vinisalazar / metaphor

Metaphor: a general-purpose workflow for assembly and binning of metagenomes
https://metaphor-workflow.readthedocs.io/
Other
38 stars 3 forks source link

mapping issue #82

Open francesco-ricci opened 3 months ago

francesco-ricci commented 3 months ago

Hi VIni,

I keep getting this error message:

[Sat Jul 13 01:49:37 2024] Error in rule concatenate_contigs: jobid: 56 input: output/assembly/megahit/coassembly/coassembly.contigs_renamed.fa output: output/mapping/cobinning/cobinning_contigs_catalogue.fna.gz log: output/logs/mapping/concatenatecontigs/cobinning.log (check log file(s) for error details) conda-env: /home/fricci/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/d2f1a35ff4b889a5c9b33f580fd56824 shell:

concatenate.py -m 1000 output/mapping/cobinning/cobinning_contigs_catalogue.fna.gz output/assembly/megahit/coassembly/coassembly.contigs_renamed.fa &> output/logs/mapping/concatenate_contigs/cobinning.log

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

MissingOutputException in rule assembly_report in file /home/fricci/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/rules/assembly.smk, line 103: Job 52 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait: output/assembly/assembly_report/avg_length.png (missing locally, parent dir contents: ) output/assembly/assembly_report/max_length.png (missing locally, parent dir contents: ) output/assembly/assembly_report/median_length.png (missing locally, parent dir contents: ) output/assembly/assembly_report/no_bp.png (missing locally, parent dir contents: ) output/assembly/assembly_report/no_contigs.png (missing locally, parent dir contents: ) output/assembly/assembly_report/n50.png (missing locally, parent dir contents: ) output/assembly/assembly_report/assembly_report.tsv (missing locally, parent dir contents: ) Removing output files of failed job concatenate_contigs since they might be corrupted: output/mapping/cobinning/cobinning_contigs_catalogue.fna.gz Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2024-07-10T173408.447387.snakemake.log WorkflowError: At least one job did not complete successfully. Traceback (most recent call last): File "/home/fricci/rp24/fra/software/miniconda3/envs/metaphor/bin/metaphor", line 10, in sys.exit(main()) ^^^^^^ File "/home/fricci/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/cli/cli.py", line 278, in main args.func(args) File "/home/fricci/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/cli/execute.py", line 109, in main retcode = run_cmd(cmd) ^^^^^^^^^^^^ File "/home/fricci/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/utils.py", line 92, in run_cmd retcode = check_call(cmd.split()) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/fricci/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['snakemake', '--snakefile', '/home/fricci/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/Snakefile', '--configfile', 'metaphor_set>

This is what the output of logs/mapping/cobinning.log looks like:

Traceback (most recent call last): File "/home/fricci/rp24/fra/software/miniconda3/envs/checkm/bin/concatenate.py", line 42, in vamb.vambtools.concatenate_fasta(filehandle, args.inpaths, minlength=args.minlength, File "/home/fricci/rp24/fra/software/miniconda3/envs/checkm/lib/python3.10/site-packages/vamb/vambtools.py", line 555, in concatenate_fasta for entry in entries: File "/home/fricci/rp24/fra/software/miniconda3/envs/checkm/lib/python3.10/site-packages/vamb/vambtools.py", line 272, in byte_iterfasta raise ValueError('Empty or outcommented file') ValueError: Empty or outcommented file

Can you help me with this? Thanks Francesco

francesco-ricci commented 3 months ago

Actually I found that for some reason the file output/assembly/megahit/coassembly/coassembly.contigs_renamed.fa was 0 bytes. I am running metaphor again, let's see if it fixes it.

I'll keep you posted!

francesco-ricci commented 3 months ago

Unfortunately, it does not work. Metaphor keeps creating an empty assembly/megahit folder (ratherthan empty, actually files are there but are 0 bytes). Can you please help me troubleshooting this Vini?

vinisalazar commented 3 months ago

Hi @francesco-ricci, this looks like a problem with the assembly. Could you please post the log of the assembly/megahit process here?

Thank you!

francesco-ricci commented 3 months ago

Hey Vini,

thanks for your help. I restarted the full pipeline, if it'll bump into this again I'll post it.

Thanks Francesco

francesco-ricci commented 3 months ago

Hi Vini,

the problem happened again. This is the assembly report:

2024-07-16 07:51:36 - MEGAHIT v1.2.9 2024-07-16 07:51:36 - Using megahitcore with POPCNT and BMI2 support 2024-07-16 07:51:36 - Convert reads to binary library 2024-07-16 07:51:36 - command /fs04/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/f98ddf6c1cb52f3d784c38b4fd62b9fd/b> 2024-07-16 07:51:36 - b'INFO sequence/io/sequence_lib.cpp : 75 - Lib 0 (/fs04/rp24/fra/analyses/coral_chemosynthesis/tandon2023/p_lutea/output/qc/filtered/SRR208> 2024-07-16 07:51:36 - b'INFO sequence/io/sequence_lib.cpp : 75 - Lib 1 (/fs04/rp24/fra/analyses/coral_chemosynthesis/tandon2023/p_lutea/output/qc/filtered/SRR208> 2024-07-16 07:51:36 - b'INFO sequence/io/sequence_lib.cpp : 75 - Lib 2 (/fs04/rp24/fra/analyses/coral_chemosynthesis/tandon2023/p_lutea/output/qc/filtered/SRR208> 2024-07-16 07:51:36 - b'INFO sequence/io/sequence_lib.cpp : 75 - Lib 3 (/fs04/rp24/fra/analyses/coral_chemosynthesis/tandon2023/p_lutea/output/qc/filtered/SRR208> 2024-07-16 07:51:36 - b'INFO sequence/io/sequence_lib.cpp : 75 - Lib 4 (/fs04/rp24/fra/analyses/coral_chemosynthesis/tandon2023/p_lutea/output/qc/filtered/SRR208> 2024-07-16 07:51:36 - b'INFO sequence/io/sequence_lib.cpp : 75 - Lib 5 (/fs04/rp24/fra/analyses/coral_chemosynthesis/tandon2023/plutea/output/qc/filtered/SRR208> 2024-07-16 07:51:36 - b'INFO utils/utils.h : 152 - Real: 0.0160\tuser: 0.0030\tsys: 0.0030\tmaxrss: 13076' 2024-07-16 07:51:36 - k-max reset to: 127 2024-07-16 07:51:36 - Start assembly. Number of CPU threads 128 2024-07-16 07:51:36 - k list: 27,37,47,57,67,77,87,97,107,117,127 2024-07-16 07:51:36 - Memory used: 1947211993497 2024-07-16 07:51:36 - Extracting solid (k+1)-mers and building sdbg for k = 27 2024-07-16 07:51:36 - command /fs04/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/f98ddf6c1cb52f3d784c38b4fd62b9fd/b> 2024-07-16 07:51:36 - b'INFO sequence/io/sequence_lib.cpp : 115 - Before reading, sizeof seq_package: 372' 2024-07-16 07:51:36 - b'INFO sequence/io/sequence_lib.cpp : 117 - After reading, sizeof seq_package: 372' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s1.cpp : 103 - 8 reads, 150 max read length, 1200 total bases' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s1.cpp : 109 - 2 words per substring' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s1.cpp : 123 - Number of files for mercy candidate reads: 1' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 148 - Preparing data...' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s2.cpp : 102 - 2 words per substring, words per dummy node ($v): 2' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 153 - Preparing data... Done. Time elapsed: 0.0358' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 156 - Preparing partitions and calculating bucket sizes...' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 80 - Minimum memory required: 496 bytes' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 138 - Lv1 items: 266, Lv2 items: 384' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 139 - Memory of derived class: 412, Memory for Lv1+Lv2: 4136' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 165 - Preparing partitions and calculating bucket sizes... Done. Time elapsed: 0.0396' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 172 - Start main loop...' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 185 - Lv1 scanning from bucket 0 to 517' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 192 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0016' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 198 - Lv1 fetching & sorting done. Time elapsed: 0.0018' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 185 - Lv1 scanning from bucket 517 to 1295' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 192 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0004' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 198 - Lv1 fetching & sorting done. Time elapsed: 0.0009' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 185 - Lv1 scanning from bucket 1295 to 65536' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 192 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0135' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 198 - Lv1 fetching & sorting done. Time elapsed: 0.0026' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 204 - Main loop done. Time elapsed: 0.0215' 2024-07-16 07:51:36 - b'INFO sorting/base_engine.cpp : 207 - Postprocessing...' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s2.cpp : 618 - Number of $ A C G T A- C- G- T-:' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s2.cpp : 619 - 16 498 507 491 488 0 0 0 0' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s2.cpp : 624 - Total number of edges: 2000' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s2.cpp : 625 - Total number of ONEs: 1984' 2024-07-16 07:51:36 - b'INFO sorting/read_to_sdbg_s2.cpp : 626 - Total number of $v edges: 16' 2024-07-16 07:51:36 - b'INFO sorting/baseengine.cpp : 210 - Postprocess done. Time elapsed: 0.0781' 2024-07-16 07:51:36 - b'INFO utils/utils.h : 152 - Real: 0.1886\tuser: 3.8188\tsys: 1.2315\tmaxrss: 495820' 2024-07-16 07:51:36 - Assemble contigs from SdBG for k = 27 2024-07-16 07:51:36 - command /fs04/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/f98ddf6c1cb52f3d784c38b4fd62b9fd/b> 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 129 - Loading succinct de Bruijn graph: /fs04/rp24/fra/analyses/coral_chemosynthesis/tandon2023/p_lute> 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 133 - Number of Edges: 2000; K value: 27' 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 140 - Number of CPU threads: 128' 2024-07-16 07:51:36 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 2; Accumulated tips removed: 0; time elapsed: 0.0011' 2024-07-16 07:51:36 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 4; Accumulated tips removed: 0; time elapsed: 0.0001' 2024-07-16 07:51:36 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 8; Accumulated tips removed: 0; time elapsed: 0.0001' 2024-07-16 07:51:36 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 16; Accumulated tips removed: 0; time elapsed: 0.0001' 2024-07-16 07:51:36 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 32; Accumulated tips removed: 0; time elapsed: 0.0001' 2024-07-16 07:51:36 - b'INFO assembly/sdbg_pruning.cpp : 169 - Removing tips with length less than 54; Accumulated tips removed: 0; time elapsed: 0.0001' 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 158 - Tips removal done! Time elapsed(sec): 0.008' 2024-07-16 07:51:36 - b'INFO assembly/unitig_graph.cpp : 83 - Graph size without loops: 8, palindrome: 0' 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 166 - unitig graph size: 8, time for building: 0.003' 2024-07-16 07:51:36 - b'INFO assembly/contig_stat.h : 40 - Max: 150, Min: 150, N50: 150, number contigs: 8, number isolated: 8, number looped: 0, total siz> 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 184 - Graph cleaning round 1' 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 200 - Number of bubbles removed: 0, Time elapsed(sec): 0.002' 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 210 - Number of complex bubbles removed: 0, Time elapsed(sec): 0.000210' 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 221 - Number unitigs disconnected: 0, time: 0.000' 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 245 - Unitigs removed in excessive pruning: 0, time: 0.000' 2024-07-16 07:51:36 - b'INFO assembly/contig_stat.h : 40 - Max: 150, Min: 150, N50: 150, number contigs: 8, number isolated: 8, number looped: 0, total siz> 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 267 - Time to output: 0.001873' 2024-07-16 07:51:36 - b'INFO main_assemble.cpp : 286 - Number of local low depth unitigs removed: 0, complex bubbles removed: 0, time: 0.000456' 2024-07-16 07:51:36 - b'INFO assembly/contigstat.h : 40 - Max: 150, Min: 150, N50: 150, number contigs: 8, number isolated: 8, number looped: 0, total siz> 2024-07-16 07:51:36 - b'INFO utils/utils.h : 152 - Real: 0.1420\tuser: 2.0403\tsys: 0.5743\tmaxrss: 21784' 2024-07-16 07:51:37 - Local assembly for k = 27 2024-07-16 07:51:37 - command /fs04/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/f98ddf6c1cb52f3d784c38b4fd62b9fd/b> 2024-07-16 07:51:37 - b'INFO localasm/hash_mapper.cpp : 99 - Number of contigs: 0, index size: 0' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 315 - Hash mapper construction time elapsed: 0.004956' 2024-07-16 07:51:37 - b'INFO sequence/io/sequence_lib.cpp : 115 - Before reading, sizeof seq_package: 372' 2024-07-16 07:51:37 - b'INFO sequence/io/sequence_lib.cpp : 117 - After reading, sizeof seq_package: 372' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 324 - Read lib time elapsed: 0.000929' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 133 - Lib 0, insert size: 0.00 sd: -nan' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 133 - Lib 1, insert size: 0.00 sd: -nan' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 133 - Lib 2, insert size: 0.00 sd: -nan' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 133 - Lib 3, insert size: 0.00 sd: -nan' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 133 - Lib 4, insert size: 0.00 sd: -nan' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 133 - Lib 5, insert size: 0.00 sd: -nan' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 330 - Insert size estimation time elapsed: 0.001655' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 218 - Lib 0: total 0 reads, aligned 0, added 0 reads to local assembly' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 218 - Lib 1: total 0 reads, aligned 0, added 0 reads to local assembly' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 218 - Lib 2: total 2 reads, aligned 0, added 0 reads to local assembly' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 218 - Lib 3: total 2 reads, aligned 0, added 0 reads to local assembly' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 218 - Lib 4: total 2 reads, aligned 0, added 0 reads to local assembly' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 218 - Lib 5: total 2 reads, aligned 0, added 0 reads to local assembly' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 338 - Mapping time elapsed: 0.002625' 2024-07-16 07:51:37 - b'INFO localasm/local_assemble.cpp : 232 - Minimum number of reads to do local assembly: 0' 2024-07-16 07:51:37 - b'INFO localasm/localassemble.cpp : 346 - Local assembly time elapsed: 0.005406' 2024-07-16 07:51:37 - b'INFO utils/utils.h : 152 - Real: 0.0159\tuser: 1.1029\tsys: 0.2242\tmaxrss: 13120' 2024-07-16 07:51:37 - Extract iterative edges from k = 27 to 37 2024-07-16 07:51:37 - command /fs04/rp24/fra/software/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/f98ddf6c1cb52f3d784c38b4fd62b9fd/b> 2024-07-16 07:51:37 - b'INFO main_iterate.cpp : 190 - Selected kmer type size for k: 8' 2024-07-16 07:51:37 - b'INFO main_iterate.cpp : 123 - Selected kmer type size for next k: 12' 2024-07-16 07:51:37 - b'INFO main_iterate.cpp : 138 - Processed: 8, aligned: 0. Iterative edges: 0' 2024-07-16 07:51:37 - b'INFO main_iterate.cpp : 142 - Total: 8, aligned: 0. Iterative edges: 0' 2024-07-16 07:51:37 - b'INFO utils/utils.h : 152 - Real: 0.0124\tuser: 0.7277\tsys: 0.2075\tmaxrss: 13120' 2024-07-16 07:51:37 - Merging to output final contigs 2024-07-16 07:51:37 - 0 contigs, total 0 bp, min 0 bp, max 0 bp, avg 0 bp, N50 0 bp 2024-07-16 07:51:37 - ALL DONE. Time elapsed: 0.615160 seconds

francesco-ricci commented 3 months ago

I am running this analysis on a coral dataset using host step removal. This is the only reason why I could think the low number of aligned reads.

vinisalazar commented 3 months ago

@francesco-ricci that could very well be the cause. What are the sizes of the input files going into the assembler?