mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0

Unable to compute haplogroup #82

Open Ssocarrat opened 5 years ago

Ssocarrat commented 5 years ago

Hi, I've been trying to use MToolBox for a while. I have the latest version and have reinstalled everything at least twice.

I first tried my own BAM file, solving one problem after another until I reached a point where I don't know what else to do. So I ran the example data set HG00119 to see whether it gives any errors (the log is below this text), and I get the same error as with my sample.

How can I solve this? Thanks for your help in advance!

""""bash MToolBox.sh -i HG00119.conf

setting up MToolBox environment variables... ...done

setting up MToolBox variables in config file ... ...done

HG00119 will be used as vcf file name...

Check python version... (2.7 required) OK.

Checking files to be used in MToolBox execution...

Checking mapExome parameters... OK.

Checking assembleMTgenome parameters... OK.

Checking mt-classifier parameters... OK.

Input type is fastq.
MToolBox.sh: line 184: cd: /home/juan/Desktop/MToolBox-master/MToolBox/test/HG00119_example/: No such file or directory
output files will be placed in /home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119/

EXECUTING READ MAPPING WITH MAPEXOME...

mapExome for sample SRR043366, files found: SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz
Mapping onto mtDNA...
/home/juan/Desktop/MToolBox-master/bin/gmap/bin/gsnap -D /home/juan/Desktop/MToolBox-master/gmapdb/ --gunzip -d chrM -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz > /home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119//OUT_SRR043366/outmt.sam 2> /home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119//OUT_SRR043366/logmt.txt
Extracting FASTQ from SAM...
Mapping onto complete human genome...single reads
Mapping onto complete human genome...pair reads
Reading Results...
Filtering reads...
Outfile saved on /home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119//OUT_SRR043366/OUT.sam.
Done.

SAM files post-processing...

SORTING OUT.sam FILES WITH PICARDTOOLS...

[Fri Jul 26 14:09:04 CEST 2019] net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119/OUT_SRR043366/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Fri Jul 26 14:09:04 CEST 2019] Executing as juan@juan-virtual-machine on Linux 4.15.0-38-generic amd64; OpenJDK 64-Bit Server VM 10.0.2+13-Ubuntu-1ubuntu0.18.04.2; Picard version: 1.98(1547)
INFO 2019-07-26 14:09:07 SortSam Finished reading inputs, merging and writing to output now.
[Fri Jul 26 14:09:08 CEST 2019] net.sf.picard.sam.SortSam done. Elapsed time: 0,08 minutes.
Runtime.totalMemory()=150708224
Success.

Skip Indel Realigner...
Skipping Mark Duplicates...
[Fri Jul 26 14:09:09 CEST 2019] net.sf.picard.sam.SamFormatConverter INPUT=OUT.sam.bam.marked.bam OUTPUT=OUT.sam.bam.marked.bam.marked.sam TMP_DIR=[/home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119/OUT_SRR043366/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Fri Jul 26 14:09:09 CEST 2019] Executing as juan@juan-virtual-machine on Linux 4.15.0-38-generic amd64; OpenJDK 64-Bit Server VM 10.0.2+13-Ubuntu-1ubuntu0.18.04.2; Picard version: 1.98(1547)
[Fri Jul 26 14:09:11 CEST 2019] net.sf.picard.sam.SamFormatConverter done. Elapsed time: 0,04 minutes.
Runtime.totalMemory()=48693248

ASSEMBLING MT GENOMES WITH ASSEMBLEMTGENOME...

WARNING: values of tail < 5 are deprecated and will be replaced with 5

[mpileup] 1 samples in 1 input files

Set max per-file depth to 8000
##### GENERATING VCF OUTPUT...
Reference sequence used for VCF: RCRS
##### PREDICTING HAPLOGROUPS AND ANNOTATING/PRIORITIZING VARIANTS...
Haplogroup predictions based on RSRS Phylotree build 17
Your best results file is mt_classification_best_results.csv
Unable to compute haplogroup. Exit
Parsing pathogenicity table...
Parsing variability data...
Parsing info about haplogroup-defining sites...
Traceback (most recent call last):
  File "/home/juan/Desktop/MToolBox-master/MToolBox/variants_functional_annotation.py", line 429, in <module>
    d, g, haplo, hapconto, best = data_parsing(patho_file, site_file, bestres_file, haptab_file)
  File "/home/juan/Desktop/MToolBox-master/MToolBox/variants_functional_annotation.py", line 201, in data_parsing
    htree = tree.HaplogroupTree(pickle_data=open(data_file +'/data/phylotree_r17.pickle', 'rb').read())
  File "/home/juan/Desktop/MToolBox-master/MToolBox/classifier/tree.py", line 259, in __init__
    self.deserialize(pickle_data)
  File "/home/juan/Desktop/MToolBox-master/MToolBox/classifier/tree.py", line 314, in deserialize
    self._aplo_dict = pickle.loads(data)
  File "/home/juan/Desktop/MToolBox-master/bin/anaconda/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/home/juan/Desktop/MToolBox-master/bin/anaconda/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/home/juan/Desktop/MToolBox-master/bin/anaconda/lib/python2.7/pickle.py", line 1157, in load_get
    self.append(self.memo[self.readline()[:-1]])
KeyError: '8595'
Looking for prioritized variants...
Prioritization analysis done.
Traceback (most recent call last):
  File "/home/juan/Desktop/MToolBox-master/MToolBox/summary.py", line 79, in <module>
    output_file.write(str(k)+"\t"+str(dic_cov[k])+"\t"+str(dpt)+"\t"+str(dic_haplo[k])+"\t"+str(dic_homo[k])+"\t"+str(dic_low_hetero[k])+"\t"+str(dic_high_hetero[k])+"\t"+str(dic_var[k])+"\t"+str(dic_prio[k])+"\n")
KeyError: 'SRR043366'
Analysis completed!"""
clody23 commented 5 years ago

Can you please copy-paste the HG00119.conf file you're using?

It looks like an error caused by the working directory where your input files are placed. According to your log:

/home/juan/Desktop/MToolBox-master/MToolBox/test/HG00119_example/ does not exist.
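A quick way to confirm this before rerunning could be a small check like the one below. It is only a sketch: it assumes the .conf file consists of plain KEY=VALUE lines and that the directory is set through the input_path key; read_conf and missing_dirs are hypothetical helper names, not part of MToolBox.

```python
from __future__ import print_function
import os


def read_conf(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: the config file uses shell-style assignments without
    inline comments; anything else is ignored.
    """
    settings = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Drop surrounding whitespace and optional quotes around the value.
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


def missing_dirs(settings, keys=("input_path",)):
    """Return (key, value) pairs whose value is set but is not a directory."""
    return [(key, settings[key]) for key in keys
            if settings.get(key) and not os.path.isdir(settings[key])]
```

Calling `missing_dirs(read_conf("HG00119.conf"))` should return an empty list when the directory exists; any pair it returns names a path that needs to be corrected in the config file.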

Ssocarrat commented 5 years ago

Here are the files

HG00119.txt

I have also taken a look at the log of my own sample. The input_path seems to be fine there, but I get errors similar to those of the test run.

Sample A002 LOG test.txt
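The first traceback in the test log above ends inside pickle.loads while reading phylotree_r17.pickle, so it may be worth checking whether that file can be read back at all. Below is a minimal sketch, assuming the file could be damaged or incomplete (a guess, not a confirmed diagnosis); check_pickle is a hypothetical helper, not part of MToolBox.

```python
from __future__ import print_function
import pickle


def check_pickle(path):
    """Try to load a pickle file and return (ok, message).

    Only load files you trust: unpickling can execute arbitrary code.
    """
    try:
        with open(path, "rb") as handle:
            pickle.loads(handle.read())
    except Exception as exc:
        # Any exception here means the file could not be read back,
        # for example because it is truncated or otherwise altered.
        return False, "{0}: {1}".format(type(exc).__name__, exc)
    return True, "loaded without errors"
```

It would need to be run with the same Python 2.7 interpreter that MToolBox uses, and probably from the MToolBox directory, since the pickle may refer to classes from the classifier package (an assumption). If the check fails on the phylotree_r17.pickle file named in the traceback, fetching a fresh copy of that file might be the next thing to try.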

Amokelani commented 12 months ago

Hi, I am trying to run MToolBox on genomes, but I seem to be getting the same error as mentioned above: Unable to compute haplogroup. I did not get this error for the example samples. Does MToolBox only work on exome samples? For genomes, do you have to specify particular parameters?