Confurious closed this issue 5 years ago
I have the same problem and I'm trying to investigate it, since the maintainers seem to have disappeared, both here and on the Google group.
Hi guys,
sorry for the late reply. The project is still active, although unfortunately we can't put much time into it. We are working on a major refactoring of the code; in the meantime we'll patch this specific error, which seems pretty annoying.
Sorry again and thanks for using MToolBox and for your understanding.
Best,
Domenico
Hi guys,
can you please share the BAM file that generates the error? That would help us debug.
Many thanks
Best regards, Claudia
Hi, thank you for your reply. I've uploaded the BAM file and configuration files to https://bitbucket.org/l0ka/mtoolbox_data/src/master/
The BAM file was extracted from a WGS file using BAMQL (https://github.com/BoutrosLaboratory/bamql) with the command below (I get the same error even if I perform the extraction with MToolBox, setting MitoExtraction=true):
bamql -I -o test.bam -f input.bam '(chr(M) & mate_chr(M)) | (chr(Y) & after(59000000) & mate_chr(M))'
Then I checked the resulting BAM with Picard ValidateSamFile and everything was OK, according to this tool.
I'm using MToolBox on a Linux server (Ubuntu 16.04.3 with Linux 4.4.0-101-generic) and with the default versions of its tools (I've performed the full installation with default parameters).
EDIT: I found that the problem is related to GMAP-GSNAP. I updated it to the latest version (2018-07-04, from http://research-pub.gene.com/gmap/) and now MToolBox works as expected. However, I don't know why the problem affected only some of my BAM files.
Hi,
we just pushed a change to the mtVariantCaller.py script that should fix this.
Can you (@Confurious, @l0ka) please do a git pull, test MToolBox on your files again, and let us know if this works?
Many thanks Claudia
I downloaded and installed MToolBox again, on another machine. Using the same BAM file, I now get this:
##### SORTING OUT.sam FILES WITH PICARDTOOLS...
[Sat Jan 26 12:27:57 CET 2019] net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/test/OUT_test/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
INFO 2019-01-26 12:27:57 SortSam Finished reading inputs, merging and writing to output now.
[Sat Jan 26 12:27:57 CET 2019] net.sf.picard.sam.SortSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=56623104
Success.
##### REALIGNING KNOWN INDELS WITH GATK INDELREALIGNER...
Realigning known indels for file OUT_test/OUT.sam.bam using /mtoolbox_test/MToolBox/MToolBox//data/MITOMAP_HMTDB_known_indels.chrM as reference...
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: IndelRealigner
##### ERROR ------------------------------------------------------------------------------------------
The last process reported an error. Exit.
And if I skip GATK I get:
##### SORTING OUT.sam FILES WITH PICARDTOOLS...
[Sat Jan 26 12:29:21 CET 2019] net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/test_noGATK/OUT_test/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
INFO 2019-01-26 12:29:22 SortSam Finished reading inputs, merging and writing to output now.
[Sat Jan 26 12:29:22 CET 2019] net.sf.picard.sam.SortSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=56623104
Success.
Skip Indel Realigner...
Skipping Mark Duplicates...
[Sat Jan 26 12:29:22 CET 2019] net.sf.picard.sam.SamFormatConverter INPUT=OUT.sam.bam.marked.bam OUTPUT=OUT.sam.bam.marked.bam.marked.sam TMP_DIR=[/test_noGATK/OUT_test/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sat Jan 26 12:29:22 CET 2019] net.sf.picard.sam.SamFormatConverter done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=56623104
##### ASSEMBLING MT GENOMES WITH ASSEMBLEMTGENOME...
WARNING: values of tail < 5 are deprecated and will be replaced with 5
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
##### GENERATING VCF OUTPUT...
Reference sequence used for VCF: RCRS
##### PREDICTING HAPLOGROUPS AND ANNOTATING/PRIORITIZING VARIANTS...
Haplogroup predictions based on RSRS Phylotree build 17
Your best results file is mt_classification_best_results.csv
Loading contig sequences from file test-contigs.fasta
Unable to compute haplogroup. ExitParsing pathogenicity table...
Parsing variability data...
Parsing info about haplogroup-defining sites...
Parsing info about haplogroup assignments...
No annotation.csv found. Exit
I obtain the same results also using your sim_data:
##### SORTING OUT.sam FILES WITH PICARDTOOLS...
[Sat Jan 26 13:16:51 CET 2019] net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/MToolBox/test/sim_data/OUT_simulation100X/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
INFO 2019-01-26 13:16:51 SortSam Finished reading inputs, merging and writing to output now.
[Sat Jan 26 13:16:51 CET 2019] net.sf.picard.sam.SortSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=56623104
Success.
Skip Indel Realigner...
Skipping Mark Duplicates...
[Sat Jan 26 13:16:51 CET 2019] net.sf.picard.sam.SamFormatConverter INPUT=OUT.sam.bam.marked.bam OUTPUT=OUT.sam.bam.marked.bam.marked.sam TMP_DIR=[/MToolBox/test/sim_data/OUT_simulation100X/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sat Jan 26 13:16:52 CET 2019] net.sf.picard.sam.SamFormatConverter done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=56623104
##### ASSEMBLING MT GENOMES WITH ASSEMBLEMTGENOME...
WARNING: values of tail < 5 are deprecated and will be replaced with 5
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
##### GENERATING VCF OUTPUT...
Reference sequence used for VCF: RCRS
##### PREDICTING HAPLOGROUPS AND ANNOTATING/PRIORITIZING VARIANTS...
Haplogroup predictions based on RSRS Phylotree build 17
Your best results file is mt_classification_best_results.csv
Loading contig sequences from file simulation100X-contigs.fasta
Unable to compute haplogroup. ExitParsing pathogenicity table...
Parsing variability data...
Parsing info about haplogroup-defining sites...
Parsing info about haplogroup assignments...
No annotation.csv found. Exit
The configuration file is the following:
mtdb_fasta=chrM.fa
hg19_fasta=hg19RCRS.fa
mtdb=chrM
humandb=hg19RCRS
input_path=/MToolBox/test/sim_data/
output_name=/MToolBox/test/sim_data
list=test_list.lst
input_type=fastq
ref=RCRS
UseMarkDuplicates=false
UseIndelRealigner=false
MitoExtraction=false
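For readers who want to double-check which optional steps a configuration like the one above turns off, here is a minimal sketch in Python. It only assumes the file consists of plain key=value lines; `parse_config` and `is_enabled` are hypothetical helper names for illustration, not part of MToolBox, and MToolBox itself may read the file differently.

```python
def parse_config(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line, skip it
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def is_enabled(settings, key):
    """Treat a missing key as disabled (an assumption made for this sketch)."""
    return settings.get(key, "false").lower() == "true"


# The configuration posted above, copied verbatim.
CONFIG = """\
mtdb_fasta=chrM.fa
hg19_fasta=hg19RCRS.fa
mtdb=chrM
humandb=hg19RCRS
input_path=/MToolBox/test/sim_data/
output_name=/MToolBox/test/sim_data
list=test_list.lst
input_type=fastq
ref=RCRS
UseMarkDuplicates=false
UseIndelRealigner=false
MitoExtraction=false
"""

settings = parse_config(CONFIG)
optional_steps = ("UseMarkDuplicates", "UseIndelRealigner", "MitoExtraction")
disabled = [step for step in optional_steps if not is_enabled(settings, step)]
print("Disabled optional steps:", ", ".join(disabled))
```

With this configuration all three optional steps are reported as disabled, which matches the "Skip Indel Realigner..." and "Skipping Mark Duplicates..." lines in the log.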
Dear @l0ka,
the analysis on sim_data is expected to fail at haplogroup prediction because those are simulated sequences; it is therefore likely that your MToolBox run on sim_data was actually successful.
Could you please test the latest MToolBox version - that you can get by doing a git pull in your local repo (no need to re-install) - on the MToolBox HG00119 example? You can find instructions here: https://github.com/mitoNGS/MToolBox/blob/master/test/HG00119_example/run_mtoolbox_on_test_file.md.
Moreover, I tested the latest fix with the BAM file you shared and MToolBox ran smoothly, so I cannot figure out where your error is coming from. These are some of the MToolBox files generated from your test.bam with this latest fix:
log_mtoolbox.txt logassemble.txt logmt.txt test.vcf.zip
Also, please note that you can now assign a name to your VCF file by specifying it in your configuration file with the vcf_name argument.
Best regards, Claudia
I tested the latest MToolBox version (by doing git pull in my local repo) on the same server where I was initially experiencing the errors. Both tests (on HG00119 and on my BAM file) ran smoothly.
Thank you for the support!
I tested MToolBox on almost 700 samples without problems. Thank you again.
Great to hear this!!! Thank you and stay tuned for future updates! Just drop a line for any comment/suggestion!
Hello, I see at least two people encountering the same issue, which seems to be a problem with the median function defined in the mtVariantCaller.py script. Can anyone help? Thanks
ASSEMBLING MT GENOMES WITH ASSEMBLEMTGENOME...
WARNING: values of tail < 5 are deprecated and will be replaced with 5
[bam_sort_core] merging from 6 files...
[bam_translate] RG tag "sample" on read "A00509:9:HF2LVDMXX:1:1104:15076:9972" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "sample" on read "A00509:9:HF2LVDMXX:1:2222:27968:7122" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "sample" on read "A00509:9:HF2LVDMXX:1:1156:6885:20901" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "sample" on read "A00509:9:HF2LVDMXX:1:1365:29279:35164" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "sample" on read "A00509:9:HF2LVDMXX:1:1457:21531:2988" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "sample" on read "A00509:9:HF2LVDMXX:1:1244:12102:15029" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[mpileup] 1 samples in 1 input files
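Since the comment above points at the median function, here is a minimal sketch of a median helper that tolerates empty input. The actual code in mtVariantCaller.py and the exact failure are not shown in this thread, so this is not the project's implementation or its fix; the assumption that an empty list is the trigger is only a guess for illustration.

```python
def median(values):
    """Return the median of a sequence of numbers, or None if it is empty."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        # Assumed failure case: nothing to summarise (e.g. no quality values).
        return None
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even length: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2.0


print(median([30, 20, 40]))
print(median([]))
```

Returning None forces the caller to decide what an undefined median means (skip the site, report a missing value) instead of failing deep inside the helper.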