jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Error: Stopping in STEP10 -> 10.mapsamples.pl. File Sample1/intermediate/10.Sample1.mapcount is empty! #868

Closed eoinoh91 closed 2 months ago

eoinoh91 commented 3 months ago

Hi - running without assembly as I did this externally. Running into an error at step 10. Syslog, script, and file are attached. Any idea what is causing this error? Thanks!

Syslog: syslog.zip

This is my script:

#!/bin/bash

#SBATCH --job-name=SqueezeMeta_test_contigs
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
#SBATCH --mem=200G
#SBATCH --output=squeezemeta_test-%j.out
#SBATCH --error=squeezemeta_test-%j.err

# Load Conda environment
source activate SqueezeMeta

# Set environmental parameters
project_dir="MTG"
dir_data="${project_dir}/data"
dir_docs="${project_dir}/docs"
dir_results="${project_dir}/results"
dir_contigs="${dir_results}/megahit_assemblies/final_assemblies"
dir_test="${dir_results}/squeezemeta_test"
dir_clean_reads="${dir_data}/unmapped_bovine"

# Ensure directories exist
mkdir -p ${dir_test}/raw

# Sample file path
sample_file="${dir_test}/samplefile.txt"

# Contig file path
contig_file="${dir_test}/contigs/RMIC-3a.contigs.fa"

# Run SqueezeMeta
SqueezeMeta.pl -m sequential -s $sample_file -f $dir_clean_reads --extassembly $contig_file --nobins -t 128 

And this is the stdout file:

SqueezeMeta v1.6.4, July 2024 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Tue Jul 16 16:28:47 2024 in sequential mode
67 metagenomes found: Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9 Sample10 Sample11 Sample12 Sample13 Sample14 Sample15 Sample16 Sample17 Sample18 Sample19 Sample20 Sample21 Sample22 Sample23 Sample24 Sample25 Sample26 Sample27 Sample28 Sample29 Sample30 Sample31 Sample32 Sample33 Sample34 Sample35 Sample36 Sample37 Sample38 Sample39 Sample40 Sample41 Sample42 Sample43 Sample44 Sample45 Sample46 Sample47 Sample48 Sample49 Sample50 Sample51 Sample52 Sample53 Sample54 Sample55 Sample56 Sample57 Sample58 Sample59 Sample60 Sample61 Sample62 Sample63 Sample64 Sample65 Sample66 Sample67

--- SAMPLE Sample1 ---
Now creating directories
Reading configuration from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/SqueezeMeta_conf.pl
Running trimmomatic (Bolger et al 2014, Bioinformatics 30(15):2114-20) for quality filtering
Parameters:
[0 seconds]: STEP1 -> RUNNING ASSEMBLY: 01.run_all_assemblies.pl (megahit)
External assembly provided: /home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/contigs/RMIC-3a.contigs.fa. Overriding assembly
Renaming contigs in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/01.Sample1.fasta
Running prinseq for contig statistics: /data/home/AGR.GC.CA/oharaeo/miniconda3/envs/functional_env/envs/SqueezeMeta/SqueezeMeta/bin/prinseq-lite.pl -fasta /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/01.Sample1.fasta -stats_len -stats_info -stats_assembly > /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/01.Sample1.stats
Counting length of contigs
Contigs stored in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/01.Sample1.fasta
Number of contigs: 409547
[5 seconds]: STEP2 -> RNA PREDICTION: 02.rnas.pl
Running barrnap (Seeman 2014, Bioinformatics 30, 2068-9) for predicting RNAs: Bacteria Archaea Eukaryote Mitochondrial
Running RDP classifier (Wang et al 2007, Appl Environ Microbiol 73, 5261-7)
Running Aragorn (Laslett & Canback 2004, Nucleic Acids Res 31, 11-16) for tRNA/tmRNA prediction
[2 minutes, 23 seconds]: STEP3 -> ORF PREDICTION: 03.run_prodigal.pl
Running prodigal (Hyatt et al 2010, BMC Bioinformatics 11: 119) for predicting ORFs
ORFs predicted: 607997
[20 minutes, 5 seconds]: STEP4 -> HOMOLOGY SEARCHES: 04.rundiamond.pl
Setting block size for Diamond
AVAILABLE (free) RAM memory: 2986.41 Gb
We will set Diamond block size to 16 (Gb RAM/8, Max 16).
You can override this setting using the -b option when starting the project, or changing the $blocksize variable in SqueezeMeta_conf.pl
Working with taxonomy database in /data/home/AGR.GC.CA/oharaeo/databases/SqueezeMeta/db/nr.dmnd
taxa COGS Running Diamond (Buchfink et al 2015, Nat Methods 12, 59-60) for KEGG

[1 hours, 55 minutes, 3 seconds]: STEP5 -> HMMER/PFAM: 05.run_hmmer.pl
Running HMMER3 (Eddy 2009, Genome Inform 23, 205-11) for Pfam
[9 hours, 3 minutes, 34 seconds]: STEP6 -> TAXONOMIC ASSIGNMENT: 06.lca.pl
Splitting Diamond file
Starting multithread LCA in 124 threads
Creating /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/06.Sample1.fun3.tax.wranks file
Creating /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/06.Sample1.fun3.tax.noidfilter.wranks file
[9 hours, 10 minutes, 40 seconds]: STEP7 -> FUNCTIONAL ASSIGNMENT: 07.fun3assign.pl
Functional assignment for COGS KEGG PFAM
[9 hours, 11 minutes, 12 seconds]: STEP9 -> CONTIG TAX ASSIGNMENT: 09.summarycontigs3.pl
Reading /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/06.Sample1.fun3.tax.wranks
Writing output to /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/09.Sample1.contiglog
Reading /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/06.Sample1.fun3.tax.noidfilter.wranks
Writing output to /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/09.Sample1.contiglog.noidfilter
[9 hours, 12 minutes, 29 seconds]: STEP10 -> MAPPING READS: 10.mapsamples.pl
Reading samples from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/data/00.Sample1.samples
Metagenomes found: 1
Reading contig length from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/01.Sample1.lon
Reading orf info from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/03.Sample1.gff
Mapping with Bowtie2 (Langmead and Salzberg 2012, Nat Methods 9(4), 357-9)
Creating reference from contigs
Working with sample 1: Sample1
Getting raw reads
Aligning to reference with bowtie
Calculating contig coverage
Counting with sqm_counter: Opening 124 threads
25228 reads counted 26259 reads counted 28036 reads counted 27126 reads counted 36335 reads counted 29219 reads counted 34291 reads counted 26223 reads counted 25687 reads counted 37082 reads counted 21675 reads counted 23531 reads counted 27702 reads counted 24775 reads counted 27307 reads counted 22681 reads counted 28441 reads counted 26875 reads counted 29350 reads counted 36139 reads counted 32997 reads counted 22705 reads counted 26072 reads counted 28050 reads counted 23498 reads counted 21423 reads counted 27253 reads counted 23065 reads counted 25459 reads counted 25878 reads counted 31673 reads counted 22222 reads counted 25560 reads counted 28341 reads counted 24313 reads counted 31043 reads counted 24193 reads counted 32221 reads counted 24018 reads counted 26001 reads counted 31307 reads counted 23334 reads counted 25134 reads counted 23601 reads counted 26191 reads counted 21305 reads counted 27817 reads counted 24601 reads counted 23746 reads counted 24871 reads counted 31767 reads counted 25782 reads counted 27296 reads counted 27081 reads counted 26253 reads counted 28504 reads counted 20810 reads counted 28770 reads counted 28387 reads counted 25936 reads counted 30211 reads counted 25427 reads counted 43584 reads counted 28567 reads counted 24499 reads counted 29799 reads counted 26652 reads counted 25957 reads counted 23996 reads counted 30291 reads counted 23566 reads counted 23915 reads counted 27949 reads counted 25814 reads counted 28938 reads counted 27531 reads counted 28020 reads counted 22849 reads counted 33552 reads counted 22303 reads counted 27374 reads counted 36699 reads counted 30604 reads counted 28441 reads counted 35902 reads counted 33837 reads counted 24466 reads counted 24746 reads counted 27012 reads counted 28201 reads counted 27393 reads counted 34170 reads counted 28336 reads counted 24031 reads counted 23124 reads counted 25152 reads counted 23676 reads counted 23121 reads counted 30124 reads counted 28373 reads counted 25016 reads counted 25777 reads counted 27217 reads counted 25022 reads counted 25788 reads counted 22217 reads counted 30334 reads counted 29822 reads counted 23516 reads counted 24071 reads counted 30637 reads counted 28960 reads counted 27310 reads counted 29613 reads counted 24669 reads counted 38495 reads counted 31187 reads counted 24151 reads counted 31372 reads counted 27601 reads counted 24331 reads counted 42881 reads counted 25484 reads counted 23418 reads counted
Output in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/10.Sample1.mapcount
Stopping in STEP10 -> 10.mapsamples.pl. File /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/10.Sample1.mapcount is empty!

If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/syslog file (zip it first)

eoinoh91 commented 3 months ago

I am running into the same issue, despite an attempted fix. The error says the 'mapcount' file is empty, but it is not; it appears to be correctly formatted:

[image: screenshot of the mapcount file contents]

Any idea what is causing this? Note that the script is slightly changed: the test sample name changed from Sample1 to "RMIC-3A".
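A quick way to confirm from the shell that an output file really has content is the `-s` test, which is true only when the file exists and is larger than zero bytes. The sketch below is illustrative only: `report_file` is a hypothetical helper name, not part of SqueezeMeta, and the path in the example call is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper (not part of SqueezeMeta): report whether a file has content.
report_file() {
    f="$1"
    if [ -s "$f" ]; then
        # -s: the file exists and its size is greater than zero
        lines=$(wc -l < "$f" | tr -d '[:space:]')
        echo "non-empty: $lines lines"
    else
        echo "empty or missing"
    fi
}

# Example call (placeholder path; adjust to your project directory):
# report_file RMIC-3A/intermediate/10.RMIC-3A.mapcount
```

If this reports a non-empty file while the pipeline still stops with "is empty!", the check inside the pipeline is the more likely culprit than the file itself.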

fpusan commented 3 months ago

This was my fault, but it should be fixed in 1.6.4. If you don't want to update, just restart from step 11; the results should already be fine, regardless of SqueezeMeta complaining.

eoinoh91 commented 3 months ago

Thanks! I added the --restart -step 11 flag to the command, but I'm encountering a different error:

Can't open samples file (-s) in /data/00..samples. Please check that it is the correct file.

There is a file there, in that directory. Is this a related issue? I've had this error before when trying to restart jobs.

fpusan commented 3 months ago

You need to restart the individual sample that failed (so add -p Sample1 to the call). Since this would be cumbersome, you can also create a for loop in which, for each sample:

  1. You create a subset of your samples file, containing only the fastq files for that sample
  2. You run SqueezeMeta for that sample using the coassembly mode: SqueezeMeta.pl -m coassembly -s subsetted_samples_file.tsv -p sample_name -f /path/to/raw/fastqs

This will give a result similar to that of the sequential mode, but will avoid the bug in step 10. If working on a cluster, you could also launch each sample as a different job, which will give you faster results.
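The loop described above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: it assumes the samples file is tab-separated with the sample name in column 1, and the function names, the `DRY_RUN` switch and the `subset_<sample>.tsv` naming are all hypothetical. Any extra options from the original call (for example `--extassembly`, `--nobins`, `-t`) would need to be appended by hand.

```shell
#!/bin/bash
# Sketch only: one coassembly-mode run per sample, each with its own subsetted samples file.
# Assumption: the samples file is tab-separated and column 1 holds the sample name.

# Print the unique sample names in order of first appearance (skips blank and '#' lines).
list_samples() {
    awk -F'\t' 'NF && $1 !~ /^#/ && !seen[$1]++ { print $1 }' "$1"
}

# Write only the rows belonging to one sample into a new file.
subset_samples() {
    # Appending "" forces a string comparison, so names like "01" and "1" stay distinct.
    awk -F'\t' -v s="$2" '($1 "") == (s "")' "$1" > "$3"
}

# Loop over all samples. With DRY_RUN=1 (the default) the commands are only printed.
run_per_sample() {
    samples_file="$1"; fastq_dir="$2"; out_dir="${3:-.}"
    list_samples "$samples_file" | while IFS= read -r sample; do
        subset="$out_dir/subset_${sample}.tsv"
        subset_samples "$samples_file" "$sample" "$subset"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "SqueezeMeta.pl -m coassembly -s $subset -p $sample -f $fastq_dir"
        else
            # stdin is redirected so the pipeline cannot swallow the loop's input
            SqueezeMeta.pl -m coassembly -s "$subset" -p "$sample" -f "$fastq_dir" < /dev/null
        fi
    done
}

# Example (placeholder paths): print the commands first, then rerun with DRY_RUN=0.
# run_per_sample samplefile.txt /path/to/raw/fastqs
```

On a cluster, the printed commands could instead be written into one job script per sample, which matches the suggestion to launch each sample as a separate job.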

I noticed that you seem to be using v1.6.4 already; can you confirm this? The bug is supposed to be fixed there. Otherwise, I'll look into it when I get back from holidays.

eoinoh91 commented 3 months ago

Thanks - adding the sample name worked. This is just a test run on a single sample so it wasn't a problem. I'll look into different jobs for each sample on the full run.
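For anyone landing here with the same restart error, the combination of flags mentioned in this thread (`--restart`, `-step 11`, `-p <sample>`) can be put together as in the sketch below. The helper name `restart_cmd` is hypothetical, and the thread does not say whether the other options from the original call need to be repeated, so they are left out here.

```shell
#!/bin/bash
# Sketch: assemble the restart command for a single failed sample.
# Only flags quoted in this thread are used; restart_cmd is a hypothetical helper.
restart_cmd() {
    sample="$1"; step="$2"
    echo "SqueezeMeta.pl --restart -step $step -p $sample"
}

# Print the command for the sample from the first log, restarting at step 11.
restart_cmd Sample1 11
```

Printing the command first makes it easy to check the project name against the directory that actually contains the failed run before launching it.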

And yes, confirmed that this is v1.6.4.

attiszabo commented 2 months ago

Hi! I have the same issue with coassembly mode using v1.6.4. Leaving a reply here in case there is a follow-up.

fpusan commented 2 months ago

This should now be fixed in 1.6.5; let me know if you still experience problems.

eoinoh91 commented 2 months ago

I'm still encountering the same error using version 1.6.5. This is the stdout file:

SqueezeMeta v1.6.5, August 2024 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sánchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Tue Aug 13 09:52:43 2024 in sequential mode
1 metagenomes found: RMIC-3A

--- SAMPLE RMIC-3A ---
Contig tax file /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/intermediate/09.RMIC-3A.contiglog already found, skipping step 9
[0 seconds]: STEP10 -> MAPPING READS: 10.mapsamples.pl
Reading samples from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/data/00.RMIC-3A.samples
Metagenomes found: 1
Reading contig length from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/intermediate/01.RMIC-3A.lon
Reading orf info from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/results/03.RMIC-3A.gff
Mapping with Bowtie2 (Langmead and Salzberg 2012, Nat Methods 9(4), 357-9)
Creating reference from contigs
Working with sample 1: RMIC-3A
Getting raw reads
Aligning to reference with bowtie
BAM file already found in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/data/bam/RMIC-3A.RMIC-3A.bam, skipping
Calculating contig coverage
Counting with sqm_counter: Opening 124 threads
124299 reads counted 113444 reads counted 115428 reads counted 129580 reads counted 124751 reads counted 116770 reads counted 146233 reads counted 104168 reads counted 118634 reads counted 132573 reads counted 102945 reads counted 137322 reads counted 138487 reads counted 121774 reads counted 101183 reads counted 118859 reads counted 111903 reads counted 129460 reads counted 110807 reads counted 115813 reads counted 122988 reads counted 101488 reads counted 134693 reads counted 102212 reads counted 107920 reads counted 121081 reads counted 101667 reads counted 301689 reads counted 136553 reads counted 103343 reads counted 118448 reads counted 112259 reads counted 107742 reads counted 126454 reads counted 106349 reads counted 143754 reads counted 135517 reads counted 113362 reads counted 130643 reads counted 107889 reads counted 133304 reads counted 103032 reads counted 114337 reads counted 125945 reads counted 99468 reads counted 112034 reads counted 129694 reads counted 122959 reads counted 138583 reads counted 138960 reads counted 118876 reads counted 140740 reads counted 117713 reads counted 118488 reads counted 112253 reads counted 127211 reads counted 119312 reads counted 117287 reads counted 117435 reads counted 139843 reads counted 136703 reads counted 128153 reads counted 111639 reads counted 118074 reads counted 117192 reads counted 113633 reads counted 114666 reads counted 134177 reads counted 105133 reads counted 115982 reads counted 154114 reads counted 130382 reads counted 135572 reads counted 120261 reads counted 114874 reads counted 106310 reads counted 139945 reads counted 128657 reads counted 125180 reads counted 128275 reads counted 100298 reads counted 129319 reads counted 107963 reads counted 153402 reads counted 119957 reads counted 102118 reads counted 112499 reads counted 130106 reads counted 122172 reads counted 122409 reads counted 142998 reads counted 169731 reads counted 139736 reads counted 116870 reads counted 115515 reads counted 124093 reads counted 127733 reads counted 138398 reads counted 127281 reads counted 111055 reads counted 157144 reads counted 172411 reads counted 108871 reads counted 124514 reads counted 103204 reads counted 174357 reads counted 137474 reads counted 107233 reads counted 132074 reads counted 117031 reads counted 109012 reads counted 122804 reads counted 136922 reads counted 121473 reads counted 122620 reads counted 113755 reads counted 112720 reads counted 112854 reads counted 138251 reads counted 119585 reads counted 124009 reads counted 122411 reads counted 109912 reads counted 119932 reads counted
Output in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/intermediate/10.RMIC-3A.mapcount
Stopping in STEP10 -> 10.mapsamples.pl. File /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/results/10.RMIC-3A.mappingstat is empty!

If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/syslog file (zip it first)

fpusan commented 2 months ago

Can you share /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/results/10.RMIC-3A.mappingstat with me here?

eoinoh91 commented 2 months ago

Thanks for the quick reply! This is the mappingstat file:

Sample	Total reads	Mapped reads	Mapping perc	Total bases
RMIC-3A	31785036	14746863	46.40	4693972972

fpusan commented 2 months ago

There should be a # symbol before the word "Sample", is it there?
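One way to answer this without relying on a copy-paste (which can drop or alter leading characters) is to test the first line of the file directly. The snippet below is a small sketch; `starts_with_hash` is a hypothetical helper name and the path in the example is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the first line of the file begins with '#'.
starts_with_hash() {
    head -n 1 "$1" | grep -q '^#'
}

# Example (placeholder path):
# if starts_with_hash RMIC-3A/results/10.RMIC-3A.mappingstat; then
#     echo "header starts with #"
# else
#     echo "no leading #"
# fi
```

To see every byte of the header, including tabs or stray characters, `head -n 1 <file> | od -c` prints the line character by character.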

fpusan commented 2 months ago

Please upload the actual file

eoinoh91 commented 2 months ago

Thanks for all the responses. I was running into some issues following the update, so I decided to do a complete clean re-install of SqueezeMeta v1.6.5. I've re-run the same script as above to test, and the stdout file shows the error below:

stdout file:

SqueezeMeta v1.6.5, August 2024 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sánchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Wed Aug 14 09:34:21 2024 in sequential mode
1 metagenomes found: RMIC-3A

--- SAMPLE RMIC-3A ---
Can't open conf file /data/home/AGR.GC.CA/oharaeo/miniconda3/envs/functional_env/envs/SqueezeMeta/SqueezeMeta/scripts/SqueezeMeta_conf.pl

Any idea on this - should I open it as a fresh issue?

For clarity, this is the relevant part of the command:

dir_data="${project_dir}/data"
dir_docs="${project_dir}/docs"
dir_results="${project_dir}/results"
dir_contigs="${dir_results}/megahit_assemblies/final_assemblies"
dir_test="${dir_results}/squeezemeta_test"
dir_clean_reads="${dir_data}/unmapped_bovine"

# Sample file path
sample_file="${dir_test}/samplefile.txt"

# Contig file path
contig_file="${dir_test}/contigs/RMIC-3a.contigs.fa"

# Run SqueezeMeta
SqueezeMeta.pl -m sequential -s $sample_file -f $dir_clean_reads --extassembly $contig_file --nobins -t 124

fpusan commented 2 months ago

Yes, you need to use configure_nodb.pl to link the new install to the database. See the ReadMe for details
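As a rough sketch of that step: the "Can't open conf file" message above points at a missing SqueezeMeta_conf.pl in the scripts directory of the fresh install. The helper below only checks for that file and prints a suggested call. `check_conf` is a hypothetical name, and passing the database directory as the single argument to configure_nodb.pl is an assumption to verify against the ReadMe. The paths in the example call are the ones that appear in the logs in this thread.

```shell
#!/bin/bash
# Hypothetical helper: check that an install has its conf file; if not, print a
# suggested configure_nodb.pl call (argument form is an assumption, see the ReadMe).
check_conf() {
    scripts_dir="$1"; db_dir="$2"
    if [ -f "$scripts_dir/SqueezeMeta_conf.pl" ]; then
        echo "conf file found"
    else
        echo "conf file missing; try: configure_nodb.pl $db_dir"
    fi
}

# Example with the paths shown in the logs above:
# check_conf /data/home/AGR.GC.CA/oharaeo/miniconda3/envs/functional_env/envs/SqueezeMeta/SqueezeMeta/scripts \
#            /data/home/AGR.GC.CA/oharaeo/databases/SqueezeMeta/db
```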

eoinoh91 commented 2 months ago

Yes - apologies! I figured that out, and the test run has completed successfully! Thank you for your patience and help!