jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

10.mapsamples.pl. Program finished abnormally #371

linfanxiao closed this issue 3 years ago

linfanxiao commented 3 years ago

Dear developers,

I have encountered the following problem:

6094213 reads counted
  6254587 reads counted
  6414961 reads counted
  6575335 reads counted
  6735709 reads counted
  6896083 reads counted
  7056457 reads counted
  7216831 reads counted
  7377205 reads counted
  7537579 reads counted
  7697953 reads counted
  7858327 reads counted
  8018701 reads counted
  8179075 reads counted
  8339449 reads counted
  8499823 reads counted
  8660197 reads counted
  8820571 reads counted
  8980945 reads counted
  9141319 reads counted
  9301693 reads counted
  9462067 reads counted
  9622441 reads counted
  9782815 reads counted
  9943189 reads counted
Stopping in STEP10 -> 10.mapsamples.pl. Program finished abnormally

  If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues
  Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /home/ad/data1/data/reference/human/clean/human-oral/syslog file (zip it first)
Died at /home/ad/miniconda3/envs/SqueezeMeta/bin/SqueezeMeta.pl line 1367.
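To see how far the counter got before it died, the progress figures can be pulled out of the captured output. The following is a minimal sketch, not part of SqueezeMeta: `summarize_progress` is a hypothetical helper, and it assumes the progress lines have the form `<N> reads counted`, as in the output above.

```python
import re

# Hypothetical helper (not part of SqueezeMeta): assumes progress lines
# look like "  9943189 reads counted", as in the output pasted above.
PROGRESS = re.compile(r"^[ \t]*(\d+) reads counted[ \t]*$", re.MULTILINE)

def summarize_progress(log_text):
    """Return (last_count, step) from the progress lines in log_text.

    step is the difference between the last two counts, or None when
    fewer than two progress lines are present.
    """
    counts = [int(m.group(1)) for m in PROGRESS.finditer(log_text)]
    if not counts:
        return None, None
    step = counts[-1] - counts[-2] if len(counts) > 1 else None
    return counts[-1], step

excerpt = """\
  9622441 reads counted
  9782815 reads counted
  9943189 reads counted
Stopping in STEP10 -> 10.mapsamples.pl. Program finished abnormally
"""

last, step = summarize_progress(excerpt)
print(last, step)
```

For this output the last reported count is 9943189 and the step is 160374. The syslog quoted further down reports 11226246 reads for the sample, so the counter stopped before reaching the end. The step also equals 11226246 // 70, the read count divided by the configured `$numthreads`, which would be consistent with the work being split into one chunk per thread, although the thread does not confirm this.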

I then checked the syslog:

[0 seconds]: STEP10 -> 10.mapsamples.pl
Getting raw reads for human1: cp /home/ad/data1/data/reference/human/clean/human-oral/data/raw_fastq/SRR10744092_paired_1.fastq /home/ad/data1/data/reference/human/clean/human-oral/temp/human-oral.human1.current_1; cp /home/ad/data1/data/reference/human/clean/human-oral/data/raw_fastq/SRR10744092_paired_2.fastq /home/ad/data1/data/reference/human/clean/human-oral/temp/human-oral.human1.current_2; 
Aligning with bowtie: cp /home/ad/data1/data/reference/human/clean/human-oral/data/raw_fastq/SRR10744092_paired_1.fastq /home/ad/data1/data/reference/human/clean/human-oral/temp/human-oral.human1.current_1; cp /home/ad/data1/data/reference/human/clean/human-oral/data/raw_fastq/SRR10744092_paired_2.fastq /home/ad/data1/data/reference/human/clean/human-oral/temp/human-oral.human1.current_2; 
Calling sqm_counter: Sample human1, SAM /home/ad/data1/data/reference/human/clean/human-oral/data/sam/human-oral.human1.sam, Number of reads 11226246, GFF /home/ad/data1/data/reference/human/clean/human-oral/results/03.human-oral.gff
Stopping in STEP10 -> 10.mapsamples.pl. Program finished abnormally

I don't know what causes this problem. The SAM file looks like this:

@HD VN:1.0  SO:unsorted
@SQ SN:megahit_1    LN:214
@SQ SN:megahit_2    LN:201
@SQ SN:megahit_3    LN:205
@SQ SN:megahit_4    LN:244
@SQ SN:megahit_5    LN:217
@SQ SN:megahit_6    LN:321
@SQ SN:megahit_7    LN:270
@SQ SN:megahit_8    LN:283
@SQ SN:megahit_9    LN:232

and the GFF file looks like this:

##gff-version  3
# Sequence Data: seqnum=1;seqlen=214;seqhdr="megahit_1"
# Model Data: version=Prodigal.v2.6.3;run_type=Metagenomic;model="8|Bacteroides_fragilis_NCTC_9343|B|43.2|11|0";gc_cont=43.20;transl_table=11;uses_sd=0
megahit_1   Prodigal_v2.6.3 CDS 80  214 6.9 +   0   ID=megahit_1_80-214;partial=01;start_type=ATG;rbs_motif=TAA;rbs_spacer=10bp;gc_cont=0.437;conf=83.03;score=6.91;cscore=4.92;sscore=1.99;rscore=-0.20;uscore=-1.93;tscore=4.12;
# Sequence Data: seqnum=2;seqlen=201;seqhdr="megahit_2"
# Model Data: version=Prodigal.v2.6.3;run_type=Metagenomic;model="36|Ralstonia_solanacearum_PSI07|B|66.1|11|1";gc_cont=66.10;transl_table=11;uses_sd=1
megahit_2   Prodigal_v2.6.3 CDS 2   199 45.9    +   0   ID=megahit_2_2-199;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.662;conf=100.00;score=45.94;cscore=44.33;sscore=1.61;rscore=0.00;uscore=0.00;tscore=1.61;
# Sequence Data: seqnum=3;seqlen=205;seqhdr="megahit_3"
# Model Data: version=Prodigal.v2.6.3;run_type=Metagenomic;model="48|Xylella_fastidiosa_Temecula1|B|51.8|11|1";gc_cont=51.80;transl_table=11;uses_sd=1
megahit_3   Prodigal_v2.6.3 CDS 1   204 23.6    -   0   ID=megahit_3_1-204;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.475;conf=99.56;score=23.60;cscore=21.99;sscore=1.61;rscore=0.00;uscore=0.00;tscore=1.61;
# Sequence Data: seqnum=4;seqlen=244;seqhdr="megahit_4"
# Model Data: version=Prodigal.v2.6.3;run_type=Metagenomic;model="6|Anaplasma_phagocytophilum_HZ|B|41.6|11|1";gc_cont=41.60;transl_table=11;uses_sd=1
megahit_4   Prodigal_v2.6.3 CDS 2   244 33.9    -   0   ID=megahit_4_2-244;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.412;conf=99.96;score=33.91;cscore=32.30;sscore=1.61;rscore=0.00;uscore=0.00;tscore=1.61;
# Sequence Data: seqnum=5;seqlen=217;seqhdr="megahit_5"
# Model Data: version=Prodigal.v2.6.3;run_type=Metagenomic;model="8|Bacteroides_fragilis_NCTC_9343|B|43.2|11|0";gc_cont=43.20;transl_table=11;uses_sd=0
megahit_5   Prodigal_v2.6.3 CDS 2   217 15.2    -   0   ID=megahit_5_2-217;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.431;conf=97.02;score=15.16;cscore=13.55;sscore=1.61;rscore=0.00;uscore=0.00;tscore=1.61;
# Sequence Data: seqnum=6;seqlen=321;seqhdr="megahit_6"
# Model Data: version=Prodigal.v2.6.3;run_type=Metagenomic;model="0|Mycoplasma_bovis_PG45|B|29.3|4|1";gc_cont=29.30;transl_table=4;uses_sd=1
megahit_6   Prodigal_v2.6.3 CDS 1   114 10.8    +   0   ID=megahit_6_1-114;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.158;conf=92.28;score=10.79;cscore=7.57;sscore=3.22;rscore=0.00;uscore=0.00;tscore=3.22;
# Sequence Data: seqnum=7;seqlen=270;seqhdr="megahit_7"
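One quick consistency check on the two files shown above is whether every contig named in the GFF also appears in the SAM header. The sketch below is illustrative only: `sam_contigs` and `gff_contigs` are hypothetical helpers, not SqueezeMeta functions, and the sample lines are shortened copies of the excerpts above.

```python
def sam_contigs(sam_lines):
    """Collect {contig name: length} from the @SQ lines of a SAM header."""
    contigs = {}
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        fields = line.split()
        if fields and fields[0] == "@SQ":
            tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
            if "SN" in tags and "LN" in tags:
                contigs[tags["SN"]] = int(tags["LN"])
    return contigs

def gff_contigs(gff_lines):
    """Collect the sequence names (first column) of GFF feature lines."""
    names = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        names.add(line.split()[0])
    return names

# Shortened copies of the excerpts above (attribute columns truncated).
sam = [
    "@HD VN:1.0  SO:unsorted",
    "@SQ SN:megahit_1    LN:214",
    "@SQ SN:megahit_2    LN:201",
]
gff = [
    "##gff-version  3",
    '# Sequence Data: seqnum=1;seqlen=214;seqhdr="megahit_1"',
    "megahit_1   Prodigal_v2.6.3 CDS 80  214 6.9 +   0   ID=megahit_1_80-214;",
    "megahit_2   Prodigal_v2.6.3 CDS 2   199 45.9    +   0   ID=megahit_2_2-199;",
]

missing = gff_contigs(gff) - set(sam_contigs(sam))
print("contigs in GFF but not in SAM header:", sorted(missing))
```

An empty result means the names agree, as they do in the lines shown here. It says nothing about the alignment records themselves.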

and the configuration file (conf.pl):

$mode = "coassembly";

$installpath = "/home/ad/miniconda3/envs/SqueezeMeta/SqueezeMeta";
#-- Project dir (calculated dynamically on execution, DO NOT MODIFY)

use File::Basename;
use Cwd 'abs_path';
$projectdir   = abs_path(dirname(__FILE__));

#-- Generic paths

#$databasepath = "/media/disk7/fer/SqueezeMeta/db";
$databasepath = "/home/ad/data1/data/squeezemeta/database/db";
$extdatapath  = "$installpath/data";
$scriptdir    = "$installpath/scripts";   #-- Scripts directory

#-- Paths relative to the project

$projectname = "human-oral";
$datapath    = "$projectdir/data";                                       #-- Directory containing all datafiles
$resultpath  = "$projectdir/results";                                    #-- Directory for storing results
$extpath     = "$projectdir/ext_tables";                                 #-- Directory for storing tables for further analysis
$tempdir     = "$projectdir/temp";                                       #-- Temp directory
$interdir    = "$projectdir/intermediate";                               #-- Temp directory
%bindirs = ("metabat2","$resultpath/metabat2","maxbin","$resultpath/maxbin");
%dasdir      = ("DASTool","$resultpath/DAS/$projectname\_DASTool\_bins");      #-- Directory for DASTool results

#-- Result files

$mappingfile     = "$datapath/00.$projectname.samples";         #-- Mapping file (samples -> fastq)
$methodsfile     = "$projectdir/methods.txt";               #-- File listing the  methods used and their citation info
$syslogfile      = "$projectdir/syslog";                        #-- Logging file
$contigsfna      = "$resultpath/01.$projectname.fasta";         #-- Contig file from assembly
$contigslen      = "$interdir/01.$projectname.lon";             #-- Length of each contig
$rnafile         = "$resultpath/02.$projectname.rnas";          #-- RNAs from barrnap
$trnafile        = "$resultpath/02.$projectname.trnas";         #-- tRNAs from aragorn
$gff_file        = "$resultpath/03.$projectname.gff";           #-- gff file from prodigal
$aafile          = "$resultpath/03.$projectname.faa";           #-- Aminoacid sequences for genes
$ntfile          = "$resultpath/03.$projectname.fna";           #-- Nucleotide sequences for genes
$taxdiamond      = "$interdir/04.$projectname.nr.diamond";      #-- Diamond result
$cogdiamond      = "$interdir/04.$projectname.eggnog.diamond";  #-- Diamond result, COGs
$keggdiamond     = "$interdir/04.$projectname.kegg.diamond";    #-- Diamond result, KEGG
$pfamhmmer       = "$interdir/05.$projectname.pfam.hmm";        #-- Hmmer result for Pfam
$fun3tax         = "$resultpath/06.$projectname.fun3.tax";      #-- Fun3 annotations, KEGG
$fun3kegg        = "$resultpath/07.$projectname.fun3.kegg";     #-- Fun3 annotations, KEGG
$fun3cog         = "$resultpath/07.$projectname.fun3.cog";      #-- Fun3 annotation, COGs
$fun3pfam        = "$resultpath/07.$projectname.fun3.pfam";     #-- Fun3 annotation, Pfams
$gff_file_blastx = "$resultpath/08.$projectname.gff";           #-- gff file from prodigal & blastx
$fun3tax_blastx  = "$resultpath/08.$projectname.fun3.tax";      #-- Fun3 annotations prodigal & blastx, KEGG
$fun3kegg_blastx = "$resultpath/08.$projectname.fun3.kegg";     #-- Fun3 annotations prodigal & blastx, KEGG
$fun3cog_blastx  = "$resultpath/08.$projectname.fun3.cog";      #-- Fun3 annotation prodigal & blastx, COGs
$fna_blastx      = "$interdir/08.$projectname.blastx.fna";      #-- Nucleotide sequences obtained by blastx
$allorfs         = "$tempdir/09.$projectname.allorfs";          #-- From summary_contigs.pl, allorfs file
$alllog          = "$interdir/09.$projectname.contiglog";       #-- From summary_contigs.pl, contiglog file (formerly alllog file)
$mapcountfile    = "$interdir/10.$projectname.mapcount";        #-- From mapsamples.pl, rpkm and coverage counts for all samples
$contigcov       = "$interdir/10.$projectname.contigcov";       #-- From mapbamsamples.pl, coverages of  for all samples
$mappingstat     = "$resultpath/10.$projectname.mappingstat";   #-- From mapsamples.pl, mapping statistics for all samples
$mcountfile      = "$resultpath/11.$projectname.mcount";        #-- From mcount.pl, abundances of all taxa
$mergedfile      = "$resultpath/13.$projectname.orftable";      #-- Gene table file
$bintax          = "$interdir/17.$projectname.bintax";          #-- From addtax2.pl
$bincov          = "$interdir/19.$projectname.bincov";          #-- Coverage of bins, from getbins.pl
$bintable        = "$resultpath/19.$projectname.bintable";      #-- Mapping of contigs in bins, from getbins.pl
$contigsinbins   = "$interdir/19.$projectname.contigsinbins";   #-- Bin to which each contig belongs
$contigtable     = "$resultpath/20.$projectname.contigtable";   #-- From getcontigs.pl, contigs table

#-- Datafiles

$coglist   = "$extdatapath/coglist.txt";        #-- COG equivalence file (COGid -> Function -> Functional class)
$kegglist  = "$extdatapath/keggfun2.txt";       #-- KEGG equivalence file (KEGGid -> Function -> Functional class)
$pfamlist  = "$extdatapath/pfam.dat";           #-- PFAM equivalence file
$taxlist   = "$extdatapath/alltaxlist.txt";     #-- Tax equivalence file 
$nr_db     = "$databasepath/nr.dmnd";
$cog_db    = "$databasepath/eggnog";
$kegg_db   = "$databasepath/keggdb";
$lca_db    = "$databasepath/LCA_tax/taxid.db";
$bowtieref = "$datapath/$projectname.bowtie";   #-- Contigs formatted for Bowtie
$pfam_db   = "$databasepath/Pfam-A.hmm";
$mothur_r  = "$databasepath/silva.nr_v132.align";
$mothur_t  = "$databasepath/silva.nr_v132.tax";

#-- Variables

$blocksize       = NF;
$nocog           = 0;
$nokegg          = 0;
$nopfam          = 0;
$euknofilter     = 0;
$nobins          = 0;
$doublepass      = 0;
$singletons      = 0;
$cleaning        = 0;
$cleaningoptions = "";
$mapper          = "bowtie";
$mapping_options = "";

#-- External software

$metabat_soft       = "$installpath/bin/metabat2";
$maxbin_soft        = "$installpath/bin/MaxBin/run_MaxBin.pl";
$spades_soft        = "$installpath/bin/SPAdes/spades.py";
$barrnap_soft       = "$installpath/bin/barrnap";
$rdpclassifier_soft = "java -jar $installpath/bin/classifier.jar";
$bowtie2_build_soft = "$installpath/bin/bowtie2/bowtie2-build";
$bowtie2_x_soft     = "$installpath/bin/bowtie2/bowtie2";
$bwa_soft           = "$installpath/bin/bwa";
$minimap2_soft      = "$installpath/bin/minimap2";
$diamond_soft       = "$installpath/bin/diamond";
$hmmer_soft         = "$installpath/bin/hmmer/hmmsearch";
$megahit_soft       = "$installpath/bin/megahit/megahit";
$prinseq_soft       = "$installpath/bin/prinseq-lite.pl";
$prodigal_soft      = "$installpath/bin/prodigal";
$cdhit_soft         = "$installpath/bin/cd-hit-est";
$toamos_soft        = "$installpath/bin/AMOS/toAmos";
$minimus2_soft      = "$installpath/bin/AMOS/minimus2";
$checkm_soft        = "PATH=$installpath/bin:$installpath/bin/pplacer:$installpath/bin/hmmer:\$PATH $installpath/bin/checkm";
$minpath_soft       = "python3 $installpath/bin/MinPath1.4.py";
$canu_soft          = "$installpath/bin/canu/canu";
$flye_soft          = "$installpath/bin/Flye-2.8.1/bin/flye";
$trimmomatic_soft   = "java -jar $installpath/bin/trimmomatic-0.38.jar";
$dastool_soft       = "LD_LIBRARY_PATH=$installpath/lib PATH=$installpath/bin:\$PATH $installpath/bin/DAS_Tool/DAS_Tool";
$kmerdb_soft        = "LD_LIBRARY_PATH=$installpath/lib $installpath/bin/kmer-db";
$aragorn_soft       = "$installpath/bin/aragorn";
$mothur_soft        = "$installpath/bin/mothur";

#-- Options

$numthreads         = 70;
$mincontiglen       = 200;

I installed SqueezeMeta from conda. I have also attached some project files: methods.txt, progress.txt and syslog.txt.

jtamames commented 3 years ago

Hello,

It is difficult to know what is going on. The counting appears to be progressing normally, because it is creating the corresponding count.* files in the temp directory (by the way, could you take a look at any of these?). But at some point it stops.

I would like to check whether that has to do with the SAM file. Please do the following: in the 00.human-oral.samples file, which you will find in the data directory of the project, remove the entries corresponding to the first sample and run script 10 again. It will start working with the second sample, and we will see if it stops again.

Also, as a wild guess, could you decrease $numthreads in SqueezeMeta_conf.pl to 12 and run it again?

Best, J
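The first suggestion, removing the first sample's entries from the samples file, could be scripted roughly as follows. This is a sketch under assumptions: it takes the samples file to be whitespace-delimited with the sample name in the first column, `drop_first_sample` is a hypothetical helper, and the `human2` entries are invented for illustration (only `human1` and its file names appear in the syslog).

```python
def drop_first_sample(lines):
    """Return (kept_lines, dropped_sample).

    Assumes the sample name is the first whitespace-delimited column.
    Blank lines and lines starting with '#' are kept unchanged.
    """
    first = None
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        sample = stripped.split()[0]
        if first is None:
            first = sample  # the first sample seen is the one to drop
        if sample != first:
            kept.append(line)
    return kept, first

# human1 comes from the syslog; human2 is a made-up second sample.
samples = [
    "human1\tSRR10744092_paired_1.fastq\tpair1\n",
    "human1\tSRR10744092_paired_2.fastq\tpair2\n",
    "human2\tEXAMPLE_paired_1.fastq\tpair1\n",
    "human2\tEXAMPLE_paired_2.fastq\tpair2\n",
]

kept, dropped = drop_first_sample(samples)
print("dropped:", dropped)
print("".join(kept), end="")
```

Keep a backup copy of the original samples file before writing the filtered lines back, so that the first sample can be restored afterwards.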

linfanxiao commented 3 years ago

LOL... I decreased $numthreads to 12 and it works well. So weird... Thanks!!!

jtamames commented 3 years ago

Great to know it's working now. Some configuration issue in your cluster perhaps? Best, J
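The change that resolved the issue, lowering `$numthreads` from 70 to 12 in SqueezeMeta_conf.pl, can also be made programmatically. A minimal sketch, assuming the assignment appears on its own line in the form shown in the configuration file above; `set_numthreads` is a hypothetical helper, not a SqueezeMeta tool.

```python
import re

# Matches a line such as "$numthreads         = 70;" and captures the
# text before and after the number so the original spacing is preserved.
NUMTHREADS = re.compile(r"^(\$numthreads\s*=\s*)\d+(\s*;)", re.MULTILINE)

def set_numthreads(conf_text, threads):
    """Return conf_text with the $numthreads value replaced by threads."""
    new_text, n = NUMTHREADS.subn(
        lambda m: f"{m.group(1)}{threads}{m.group(2)}", conf_text, count=1
    )
    if n == 0:
        raise ValueError("no $numthreads assignment found")
    return new_text

conf = "$numthreads         = 70;\n$mincontiglen       = 200;\n"
print(set_numthreads(conf, 12), end="")
```

Editing the file by hand works just as well. The thread does not establish why 70 threads failed on this machine, so 12 is simply the value that worked here, not a general recommendation.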