jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Running flye on fasta.gz #626

Closed megberryman closed 1 year ago

megberryman commented 1 year ago

4413_Log.zip syslog.zip

I am trying to run 1.6.0 with fasta.gz as the input files. There is an error when running flye.

Any idea what's going on? Is the error coming from having a fasta.gz as the input?

Preparing files for pair1: cat /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/fasta_files/AllCombined5000.fasta.gz > /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/data/raw_fastq/par1.fasta.gz
Running assembly with flye: perl /apps/squeezemeta/1.6.0/SqueezeMeta/lib/SqueezeMeta/assembly_flye.pl /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000 4413_5000 /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/data/raw_fastq/par1.fasta.gz /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/data/raw_fastq/
mv: cannot stat ‘/blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/data/flye/assembly.fasta’: No such file or directory
Error running command: /apps/squeezemeta/1.6.0/SqueezeMeta/bin/Flye-2.9/bin/flye --meta -o /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/data/flye --plasmids --meta --genome-size 2.6g --threads 90 --nano-raw /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/data/raw_fastq/par1.fasta.gz > /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/syslog 2>&1; mv /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/data/flye/assembly.fasta /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_5000/data/flye/contigs.fasta at /apps/squeezemeta/1.6.0/SqueezeMeta/lib/SqueezeMeta/assembly_flye.pl line 36.

jtamames commented 1 year ago

Hello. You can find this report from Flye in your syslog:

[2023-02-13 16:09:56] ERROR: The input contain reads with duplicated IDs.

That is the likely cause of the crash. Could you check whether your input files contain reads with the same ID? Best, J
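One way to run that check is to scan the FASTA headers and count how often each ID occurs. The following is a minimal sketch, not part of SqueezeMeta or Flye: the names `open_maybe_gzip` and `find_duplicate_ids` are hypothetical, and it assumes the read ID is the first whitespace-delimited token of each header line.

```python
import gzip
from collections import Counter


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # gzip files start with the two bytes 0x1f 0x8b
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def find_duplicate_ids(path):
    """Return {read_id: count} for every ID seen more than once.

    Assumption: the ID is the first whitespace-delimited token
    after '>' on each FASTA header line.
    """
    counts = Counter()
    with open_maybe_gzip(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                read_id = fields[0] if fields else ""
                counts[read_id] += 1
    return {rid: n for rid, n in counts.items() if n > 1}
```

Calling `find_duplicate_ids("par1.fasta.gz")` on the prepared input should return an empty dict if every ID is unique. Concatenating several files that share read names is one common way duplicates appear.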

megberryman commented 1 year ago

Thank you so much for your fast response. We were able to rectify that original problem but we seem to have come up with a new one.

We are attempting to run SqueezeMeta on fastqs chopped to different lengths (<=5000 bp, <=2500 bp, <=1000 bp, <=500 bp, and <=250 bp) in separate batches with the intent of comparing taxa/function outputs based on max read lengths.

However, the program runs fine with the 5000 bp and 2500 bp sets and then quits for the 1000 bp and shorter sets.

The error is occurring during flye. Is the problem with the length? Is there a workaround?

syslog_02212023.txt 4413short_Log.txt

fpusan commented 1 year ago

Hi! It may or may not be a problem with the lengths. Try running

/apps/squeezemeta/1.6.0/SqueezeMeta/bin/Flye-2.9/bin/flye --meta -o /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_1000/data/flye --plasmids --meta --genome-size 2.6g --threads 12 --nano-raw /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_1000/data/raw_fastq/par1.fasta.gz > /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_1000/syslog 2>&1; mv /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_1000/data/flye/assembly.fasta /blue/microbiology-dept/triplett-lab/ABIS_METAGENOMES/Chopped_4413/4413_1000/data/flye/contigs.fasta

to get the exact error produced by flye.
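Once the command has finished, the error lines can be pulled out of the log it wrote. This is a small sketch under the assumption that Flye marks failures with the text `ERROR:` in the log (as in the message quoted earlier in this thread); the function name `flye_errors` is hypothetical.

```python
def flye_errors(log_path):
    """Return every line of a log file that contains 'ERROR:'.

    Assumption: error lines carry the literal marker 'ERROR:',
    as in '[2023-02-13 16:09:56] ERROR: ...'.
    """
    with open(log_path, "rt", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if "ERROR:" in line]
```

For the run above, the path to pass would be the `syslog` file that the command redirects its output to.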

megberryman commented 1 year ago

syslog.txt
[2023-02-27 15:44:06] INFO: Assembled 0 disjointigs
[2023-02-27 15:44:10] INFO: Generating sequence
[2023-02-27 15:44:10] INFO: Filtering contained disjointigs
[2023-02-27 15:44:10] INFO: Contained seqs: 0
[2023-02-27 15:44:18] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2023-02-27 15:44:18] ERROR: Pipeline aborted

flye_log.txt
[2023-02-27 15:44:18] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2023-02-27 15:44:18] root: ERROR: Pipeline aborted

I presume we need to change the parameters. Any suggestions?

jtamames commented 1 year ago

Hello

I cannot see the logs (very poor connectivity) but I guess the sequencing depth is not very high. In these circumstances the assembly will be very poor. Wouldn't it be better to analyze reads directly, using sqm_longreads.pl?

Best,

J
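To get a rough idea of whether depth is the issue, one could total the bases in the input and divide by the genome size passed to Flye (`2.6g` in the commands above, i.e. 2.6e9). This is only a back-of-the-envelope sketch: the function names are hypothetical, the input is assumed to be a gzipped FASTA, and for a metagenome the result is a crude average rather than a per-organism coverage.

```python
import gzip


def fasta_lengths(path):
    """Yield the sequence length of each record in a gzipped FASTA file."""
    length = None  # None until the first header line is seen
    with gzip.open(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length


def depth_summary(path, genome_size):
    """Summarise read count, total bases, longest read and approximate depth.

    approx_depth = total bases / genome_size (a rough average only).
    """
    n_reads = total = longest = 0
    for n in fasta_lengths(path):
        n_reads += 1
        total += n
        longest = max(longest, n)
    return {
        "reads": n_reads,
        "total_bases": total,
        "longest_read": longest,
        "approx_depth": total / genome_size,
    }
```

For example, `depth_summary("par1.fasta.gz", 2.6e9)` reports the approximate depth for the genome size used in this thread. Comparing the 5000 bp and 1000 bp inputs this way would show how the two sets differ in read length and total bases.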
