ngs-fzb / MTBseq_source

MTBseq is an automated pipeline for mapping, variant calling and detection of resistance mediating and phylogenetic variants from illumina whole genome sequence data of Mycobacterium tuberculosis complex isolates.

Variant Calling with M bovis #57

Closed: peflanag closed this issue 3 years ago

peflanag commented 4 years ago

Hi,

I placed an M. bovis FASTA reference file in the ref path and ran MTBseq with the following command:

MTBseq --step TBfull --ref Mbovis_AF2122_97 --distance 5 --threads 1

The program ran until it failed with this error:

[2020-09-07 18:20:11] Start variant calling...
[2020-09-07 18:20:11] Start parsing Mbovis_AF2122_97.fasta...
[2020-09-07 18:20:11] Finished parsing Mbovis_AF2122_97.fasta!
[2020-09-07 18:20:11] Parsing Mbovis_AF2122_97_genes.txt...
[2020-09-07 18:20:11] Can't open Mbovis_AF2122_97_genes.txt: TBtools line: 703 1 at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 703.

My question is, do I need to create some form of _genes.txt file? I didn't see that in the manual. It only mentioned a reference FASTA and an annotation file. I didn't add an annotation file since I am not worried about annotations. I have noticed that MTBseq seems to have created additional files based on the M. bovis reference, but no .txt file.

Cheers, P

TaKohl commented 4 years ago

Dear Peter,

As a short detour, I would strongly recommend keeping the default H37Rv reference genome for all M. tuberculosis complex samples. The genomes within the M. tuberculosis complex are so similar that NGS reads from M. bovis samples will cover >99% of the H37Rv reference genome. Switching to an M. bovis genome is therefore not necessary to improve resolution. Staying with H37Rv also lets you keep the predefined annotation information on repetitive regions, resistance-associated genes, and specific mutations.

If you want to switch to another reference genome, you also need to create the _genes.txt file. I agree that the relevant part of the manual could be read as saying that this file is optional: "In order for MTBseq to provide gene annotations, a respective annotation file with the extension _genes.txt needs to be placed in the same directory. For file formatting, follow the example of the existing annotation files, e.g. in the M._tuberculosis_H37Rv_2015-11-13_genes.txt file." However, it is sufficient to create an empty text file if you do not need the gene annotations. In that case, MTBseq will of course not calculate amino acid exchanges either.
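The empty-file workaround described above could look roughly like the sketch below. It assumes the annotation file must be named after the reference, i.e. Mbovis_AF2122_97_genes.txt, which is the name shown in the error message. REF_DIR is a hypothetical variable standing in for the directory that holds the reference FASTA; it is not an MTBseq setting, and the default used here is only a scratch directory for demonstration.

```shell
# Assumption: REF_DIR is the directory that holds the reference FASTA
# (the "ref path" from the question). Set it to match your installation;
# the default below is only a scratch directory for demonstration.
REF_DIR="${REF_DIR:-./mtbseq_ref_demo}"
REF_NAME="Mbovis_AF2122_97"   # the value passed to --ref in the question

# Harmless if the directory already exists
mkdir -p "$REF_DIR"

# touch creates the file if it is missing and leaves existing content untouched
touch "$REF_DIR/${REF_NAME}_genes.txt"

# Confirm that the file is now in place
ls -l "$REF_DIR/${REF_NAME}_genes.txt"
```

With the file in place, the original command (MTBseq --step TBfull --ref Mbovis_AF2122_97 --distance 5 --threads 1) can be run again. Using touch rather than a redirect means an annotation file that already exists is not emptied by accident.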

For clarification, the additional files that MTBseq created are used for the reference mapping process.

best wishes, Thomas

peflanag commented 4 years ago

Hi Thomas,

Cheers for the comment and explanation on this!