soedinglab / plass

sensitive and precise assembly of short sequencing reads
https://plass.mmseqs.com
GNU General Public License v3.0

Paired read prediction - mergereads failed #38

Open dnolin13 opened 2 years ago

dnolin13 commented 2 years ago

Expected Behavior

Hello, I am trying to run PLASS on a curated set of marine viral metagenomic reads. I have two read files, but when I run PLASS on them I get the following error:

Start merging reads.
Segmentation fault (core dumped)
Error: mergereads failed
deactivate does not accept arguments
remainder_args: ['PLASS']

Current Behavior

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

Here is the script I am using for the plass assembler:

conda activate PLASS

/home/delaney/miniconda3/envs/PLASS/bin/plass assemble /home/delaney/5YV/test2/02-kraken2-viruses-only/extracted1.fq /home/delaney/5YV/test2/02-kraken2-viruses-only/extracted2.fq assembly.fas tmp

conda deactivate PLASS

Plass Output (for bugs)

Please make sure to also post the complete output of Plass. You can use gist.github.com for large output.

Include only extendable true
Skip repeating k-mers true
Min codons in orf 45
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Protein Filter Threshold 0.2
Filter Proteins 1
Search iterations 12
Delete temporary files incremental 1
Remove temporary files false
MPI runner
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1

PAIRED END MODE
mergereads /home/delaney/5YV/test2/02-kraken2-viruses-only/extracted1.fq /home/delaney/5YV/test2/02-kraken2-viruses-only/extracted2.fq /home/delaney/5YV/test2/scripts/tmp/1996441830643183315/nucl_reads -v 3

Start merging reads.
Segmentation fault (core dumped)
Error: mergereads failed
deactivate does not accept arguments
remainder_args: ['PLASS']

Context

Providing context helps us come up with a solution and improve our documentation for the future.

These are viral metagenomic reads sequenced on a NovaSeq. They were identified as viral using kraken2, and the viral reads from my dataset were then written to the two files extracted1.fq and extracted2.fq (forward and reverse).

Your Environment

I am running this in a conda environment on a Linux machine, where I installed Plass using bioconda.

AnnSeidel commented 2 years ago

Is it possible to provide us with your input reads? My first guess would be a problem with the quality strings in the fastq files. Which Plass version are you using?

dnolin13 commented 2 years ago

Sure, I can provide a few of the reads if that would help. I had to attach the .fq files as txt files to get them here.

In terms of the version of plass, I'm not entirely sure, but I downloaded it last week using the bioconda install. Thanks for the help!

Attachments: subsetForGithubRev.txt, subsetForGithub.txt

AnnSeidel commented 2 years ago

You mentioned .fq files, but the read files you provided are in fasta format. Plass uses the FLASH code to merge paired-end reads in its first step, and FLASH needs the quality string from the fastq file format to merge reads. If you provide multiple input files that lack this line, the merge fails. With the current state of the code on GitHub, the same input would instead give you the error message "Invalid sequence record found".
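A quick way to confirm which format a read file really is, regardless of its extension, is to look at the first non-empty line: fasta records start with ">" and fastq records start with "@". The following is a minimal sketch of that check. The function name sniff_sequence_format is hypothetical and is not part of Plass or FLASH, and the check only inspects the first record rather than validating the whole file.

```python
def sniff_sequence_format(path):
    """Guess the format of a read file from its first non-empty line.

    Hypothetical helper, not part of Plass: returns "fasta" if the first
    record starts with ">", "fastq" if it starts with "@", else "unknown".
    """
    with open(path, "r") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "unknown"  # empty file
```

Running this on both input files before calling Plass in paired-end mode would show whether the quality lines that the merging step relies on are present at all.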

If you can get the quality strings, you can call Plass with the two paired-end files in fastq format. If not, you can provide Plass with a single file in fasta format. To do that, you can either use another tool to merge your paired-end reads beforehand (if there is one that works without the quality string) or simply concatenate your files without making use of the pairing information.
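For the concatenation route, the sketch below joins the forward and reverse fasta files into one file. The function name concatenate_reads and the file names in the usage note are hypothetical. The only care taken is to make sure every line ends with a newline, so that the last sequence of one file does not run into the first header of the next.

```python
def concatenate_reads(input_paths, output_path):
    """Write the contents of all input files, in order, to output_path.

    Hypothetical helper: pairing information is discarded, the reads are
    simply placed one after another in a single file.
    """
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path, "r") as src:
                for line in src:
                    # Add a newline if the file's last line lacks one.
                    out.write(line if line.endswith("\n") else line + "\n")
```

Assuming the two files from the original post, this could be called as concatenate_reads(["extracted1.fq", "extracted2.fq"], "combined.fasta"), and the combined file would then be passed to Plass as its single input in place of the two paired files.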