jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect ressistance and strain type from WGS data
GNU General Public License v3.0
102 stars 42 forks source link

Wrong Results for ERR133900, ERR137282 #197

Closed aniruddh-1 closed 2 years ago

aniruddh-1 commented 2 years ago

hey @jodyphelan when I ran these SRA Accession Number on TB-Profiler_v4.0.1, v4.0.1, v4.0.2, v4.0.3 in conda Package and Docker Container it is showing that it is sensitive for all drugs, while these Accession Numbers are MDR, I confirmed it using mykrobe_v0.10.0 and also the web version of TB-Profiler(https://tbdr.lshtm.ac.uk/).

These are results in tb-profiler.txt when I use tb-profiler collate :

sample  main_lineage    sub_lineage DR_type MDR XDR num_dr_variants num_other_variants  rifampicin  isoniazid   pyrazinamide    ethambutol  streptomycin    fluoroquinolones    moxifloxacin    ofloxacin   levofloxacin    ciprofloxacin   aminoglycosides amikacin    kanamycin   capreomycin ethionamide para-aminosalicylic_acid    cycloserine linezolid   bedaquiline clofazimine delamanid
ERR133900   lineage2    lineage2.2  Sensitive           0   0   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
ERR137282           Sensitive           0   0   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -

Can you please look over the package and tell why is it giving wrong results? I tried it both in ubuntu 18.04, ubuntu 20.04 but it is still giving wrong results.

jodyphelan commented 2 years ago

Hi @aniruddh-1

Could you let me know what the command you are using? I've tried it with v4.0.3 and after using collate this is what I get

sample  main_lineage    sub_lineage DR_type num_dr_variants num_other_variants  rifampicin  isoniazid   pyrazinamide    ethambutol  streptomycin    fluoroquinolones    moxifloxacin    ofloxacin   levofloxacin    ciprofloxacin   aminoglycosides amikacin    kanamycin   capreomycin ethionamide para-aminosalicylic_acid    cycloserine linezolid   bedaquiline clofazimine delamanid
ERR133900   lineage2    lineage2.2.1    Other   5   29  -   katG_p.Ser315Thr    pncA_p.Val7Gly  -   rpsL_p.Lys43Arg gyrA_p.Ala74Ser gyrA_p.Ala74Ser gyrA_p.Ala74Ser gyrA_p.Ala74Ser gyrA_p.Ala74Ser rrs_n.1401A>G   rrs_n.1401A>G   rrs_n.1401A>G   rrs_n.1401A>G   --  -   -   -   -   -
ERR137282   lineage2    lineage2.2.1    MDR 6   25  rpoB_p.Ser450Leu    katG_p.Ser315Thr    -   embB_p.Asn296His    rpsL_p.Lys43Arg -   -   -   -   -   rrs_n.1401A>G   rrs_n.1401A>G   rrs_n.1401A>G   rrs_n.1401A>G   ethA_c.110delA  -   -   -   -   -   -
aniruddh-1 commented 2 years ago

this command, same for also ERR137282: tb-profiler profile -1 ERR133900_1.fastq.gz -2 ERR133900_2.fastq.gz -p ERR133900

can you test this conda package in completely new ubuntu 18 or 20 environment? also please check if this is giving same results in docker container, because I am not getting the results which you got in your machine.

aniruddh-1 commented 2 years ago

This is in ubutnu:20.04, conda 4.11.0, TB-Profiler_v4.0.3

(base) root:/home/ubuntu# tb-profiler profile -1 ERR133900_1.fastq.gz -2 ERR133900_2.fastq.gz -p ERR133900
Using gff file: /root/anaconda3/share/tbprofiler/tbdb.gff
Using ref file: /root/anaconda3/share/tbprofiler/tbdb.fasta
Using barcode file: /root/anaconda3/share/tbprofiler/tbdb.barcode.bed
Using bed file: /root/anaconda3/share/tbprofiler/tbdb.bed
Using json_db file: /root/anaconda3/share/tbprofiler/tbdb.dr.json
Using version file: /root/anaconda3/share/tbprofiler/tbdb.version.json
Using variables file: /root/anaconda3/share/tbprofiler/tbdb.variables.json

Running command:
set -u pipefail; trimmomatic PE -threads 1 -phred33 ERR133900_1.fastq.gz ERR133900_2.fastq.gz -baseout ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36

Running command:
set -u pipefail; cat ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af_1U ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af_2U > ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af_TU

Running command:
set -u pipefail; rm ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af_1U ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af_2U

Running command:
set -u pipefail; bwa mem -t 1 -c 100 -R '@RG\tID:ERR133900\tSM:ERR133900\tPL:illumina' -M -T 50 /root/anaconda3/share/tbprofiler/tbdb.fasta ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af_1P ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af_2P | samtools sort -@ 1 -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.pair.bam -

Running command:
set -u pipefail; bwa mem -t 1 -c 100 -R '@RG\tID:ERR133900\tSM:ERR133900\tPL:illumina' -M -T 50 /root/anaconda3/share/tbprofiler/tbdb.fasta ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af_TU | samtools sort -@ 1 -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.single.bam -

Running command:
set -u pipefail; samtools merge -@ 1 -f ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.unsort.bam ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.pair.bam ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.single.bam

Running command:
set -u pipefail; samtools sort -m 768M -n -@ 1  ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.unsort.bam | samtools fixmate -@ 1 -m - - | samtools sort -m 768M -@ 1 - | samtools markdup -@ 1 - ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam

Running command:
set -u pipefail; rm ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.single.bam ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.pair.bam ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.unsort.bam

Running command:
set -u pipefail; samtools index -@ 1 ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam
Using ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam

Please ensure that this BAM was made using the same reference as in the database.
If you are not sure what reference was used it is best to remap the reads.

Running command:
set -u pipefail; cat /root/anaconda3/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | parallel -j 1 --col-sep " " "samtools view -T /root/anaconda3/share/tbprofiler/tbdb.fasta -h ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam {1} | samclip --ref /root/anaconda3/share/tbprofiler/tbdb.fasta | samtools view -b > ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.{2}.tmp.bam && samtools index ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.{2}.tmp.bam && freebayes -f /root/anaconda3/share/tbprofiler/tbdb.fasta -r {1} --haplotype-length -1  ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.{2}.tmp.bam | bcftools view -c 1 | bcftools norm -f /root/anaconda3/share/tbprofiler/tbdb.fasta | bcftools filter -t {1} -e 'FMT/DP<10' -S . -Oz -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.{2}.vcf.gz"

Running command:
set -u pipefail; cat /root/anaconda3/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | parallel -j 1 --col-sep " " "bcftools index  ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.{2}.vcf.gz"

Running command:
set -u pipefail; bcftools concat -aD `cat /root/anaconda3/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | awk '{print "./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af."$2".vcf.gz"}'` | bcftools view -c1 -a -Oz -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.vcf.gz

Running command:
set -u pipefail; rm `cat /root/anaconda3/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | awk '{print "./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af."$2".vcf.gz*"}'`

Running command:
set -u pipefail; rm `cat /root/anaconda3/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | awk '{print "./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af."$2".tmp.bam*"}'`

Running command:
set -u pipefail; bcftools index --threads 1 -ft ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.vcf.gz

Running command:
set -u pipefail; bcftools query -l ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.vcf.gz

Running command:
set -u pipefail; bcftools view -v snps ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.vcf.gz | combine_vcf_variants.py --ref /root/anaconda3/share/tbprofiler/tbdb.fasta --gff /root/anaconda3/share/tbprofiler/tbdb.gff | rename_vcf_chrom.py --source Chromosome --target Chromosome | snpEff -Djava.net.useSystemProxies=true ann -noStats Mycobacterium_tuberculosis_h37rv - | rename_vcf_chrom.py --source Chromosome --target Chromosome > e337c8f8-c9e5-444e-912a-b60af2440e3e.vcf

Running command:
set -u pipefail; bcftools view -v indels ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.vcf.gz | rename_vcf_chrom.py --source Chromosome --target Chromosome | snpEff -Djava.net.useSystemProxies=true ann -noStats Mycobacterium_tuberculosis_h37rv - | rename_vcf_chrom.py --source Chromosome --target Chromosome > 820a17bb-5f74-417a-a70c-1271211f8e1b.vcf

Running command:
set -u pipefail; bcftools concat e337c8f8-c9e5-444e-912a-b60af2440e3e.vcf 820a17bb-5f74-417a-a70c-1271211f8e1b.vcf | bcftools sort -Oz -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.csq.vcf.gz
Removing e337c8f8-c9e5-444e-912a-b60af2440e3e.vcf
Removing 820a17bb-5f74-417a-a70c-1271211f8e1b.vcf

Running command:
set -u pipefail; bcftools index --threads 1 -ft ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.csq.vcf.gz

Running command:
set -u pipefail; bcftools query -l ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.csq.vcf.gz

Running command:
set -u pipefail; bcftools query -u -f '%CHROM\t%POS\t%REF\t%ALT\t%ANN\t[%AD]\n' ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.targets.csq.vcf.gz

Running command:
set -u pipefail; samtools flagstat -O json ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam > 998b3336-66e9-4f9e-be2e-fa4628e0261b

Running command:
set -u pipefail; bedtools genomecov -ibam ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam

Running command:
set -u pipefail; samtools view -Mb -L /root/anaconda3/share/tbprofiler/tbdb.bed ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam | bedtools coverage -a /root/anaconda3/share/tbprofiler/tbdb.bed -b - -d -sorted

Running command:
set -u pipefail; samtools view -Mb -L /root/anaconda3/share/tbprofiler/tbdb.barcode.bed ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam > ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.tmp.bam

Running command:
set -u pipefail; samtools index ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.tmp.bam

Running command:
set -u pipefail; freebayes -f /root/anaconda3/share/tbprofiler/tbdb.fasta -t /root/anaconda3/share/tbprofiler/tbdb.barcode.bed ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.tmp.bam --haplotype-length -1 | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT\t%AD]\n'

Running command:
set -u pipefail; bedtools getfasta -fi /root/anaconda3/share/tbprofiler/tbdb.fasta -bed /root/anaconda3/share/tbprofiler/tbdb.barcode.bed

Running command:
set -u pipefail; samtools view -Mb -L /root/anaconda3/share/tbprofiler/tbdb.barcode.bed ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam | bedtools coverage -a /root/anaconda3/share/tbprofiler/tbdb.barcode.bed -b - -d -sorted

Running command:
set -u pipefail; delly call -t DEL -g /root/anaconda3/share/tbprofiler/tbdb.fasta ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.bam -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.bcf

Running command:
set -u pipefail; bcftools query -l ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.bcf

Running command:
set -u pipefail; bcftools view -c 2  ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.bcf | bcftools view -e '(INFO/END-POS)>=100000' -Oz -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.tmp.delly.vcf.gz

Running command:
set -u pipefail; bcftools index ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.tmp.delly.vcf.gz

Running command:
set -u pipefail; bcftools view -R /root/anaconda3/share/tbprofiler/tbdb.bed ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.tmp.delly.vcf.gz -Oz -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.vcf.gz

Running command:
set -u pipefail; bcftools index --threads 1 -ft ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.vcf.gz

Running command:
set -u pipefail; bcftools query -l ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.vcf.gz

Running command:
set -u pipefail; bcftools view ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.vcf.gz | rename_vcf_chrom.py --source Chromosome --target Chromosome | snpEff -Djava.net.useSystemProxies=true ann -noStats Mycobacterium_tuberculosis_h37rv - | rename_vcf_chrom.py --source Chromosome --target Chromosome | bcftools view -Oz -o ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.csq.vcf.gz

Running command:
set -u pipefail; bcftools index --threads 1 -ft ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.csq.vcf.gz

Running command:
set -u pipefail; bcftools query -l ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.csq.vcf.gz

Running command:
set -u pipefail; bcftools query -u -f '%CHROM\t%POS\t%REF\t%ALT\t%ANN\t[%AD]\n' ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af.delly.csq.vcf.gz

Running command:
set -u pipefail; rm ./17ba8a8e-bf08-4e98-b7aa-1c67e302a0af*

Profiling finished sucessfully!
(base) root:/home/ubuntu# tb-profiler collate
Using gff file: /root/anaconda3/share/tbprofiler/tbdb.gff
Using ref file: /root/anaconda3/share/tbprofiler/tbdb.fasta
Using barcode file: /root/anaconda3/share/tbprofiler/tbdb.barcode.bed
Using bed file: /root/anaconda3/share/tbprofiler/tbdb.bed
Using json_db file: /root/anaconda3/share/tbprofiler/tbdb.dr.json
Using version file: /root/anaconda3/share/tbprofiler/tbdb.version.json
Using variables file: /root/anaconda3/share/tbprofiler/tbdb.variables.json
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 198.96it/s]
(base) root:/home/ubuntu# cat tbprofiler.txt
sample  main_lineage    sub_lineage     DR_type num_dr_variants num_other_variants      rifampicin      isoniazid       pyrazinamide    ethambutol      streptomycin    fluoroquinolones    moxifloxacin     ofloxacin       levofloxacin    ciprofloxacin   aminoglycosides amikacin        kanamycin       capreomycin     ethionamide     para-aminosalicylic_acid        cycloserine linezolid        bedaquiline     clofazimine     delamanid
ERR133900       lineage2;lineage4       lineage4.9;lineage2.2   Sensitive       0       0       -       -       -       -       -       -       -       -       -       -       -       -   --       -       -       -       -       -       -       -
aniruddh-1 commented 2 years ago

Hey @jodyphelan Is it giving any resistance in ubuntu machine or in the Docker container? I am getting resistance results only in Mac laptop, why is it giving different results in different OS?

jodyphelan commented 2 years ago

Hi @aniruddh-1

My tests were perfomed on ubuntu 18.04. Everything with the commands looks ok. Could you perhaps run md5sum on the fastq files? Just so we can check the input data is the same.

aniruddh-1 commented 2 years ago
(base) root:/home/ubuntu# md5sum ERR133900_1.fastq.gz
5e4f52204143c09018991603c421cdfe  ERR133900_1.fastq.gz
(base) root:/home/ubuntu# md5sum ERR133900_2.fastq.gz
2d7f298f8f61d578117b8c530ff7a786  ERR133900_2.fastq.gz
(base) root:/home/ubuntu#
aniruddh-1 commented 2 years ago

Hi @jodyphelan

Also these are Mykrobe_v0.10.0 results:

(base) root:/home/ubuntu# mykrobe predict -S tb -s ERR133900 -i ERR133900_1.fastq.gz ERR133900_2.fastq.gz -o ERR133900.csv
[mykrobe 2021-11-26T14:53:55 INFO] Start runnning mykrobe predict. Command line: /usr/local/bin/mykrobe predict -S tb -s ERR133900 -i ERR133900_1.fastq.gz ERR133900_2.fastq.gz -o ERR133900.csv
[mykrobe 2021-11-26T14:53:55 INFO] Running mykrobe predict using species tb, and panel version 20201014
[mykrobe 2021-11-26T14:53:55 INFO] Run command: /usr/local/lib/python3.8/dist-packages/mykrobe/cortex/mccortex31 geno -t 1 -m 1GB -k 21 -o /tmp/tmp1kgny5wg/ERR133900-21_tb-species-170421-tb-hunt-probe-set-jan-03-2019-tb.covgs -I mykrobe/data/skeletons/tb-species-170421-tb-hunt-probe-set-jan-03-2019-tb_21.ctx -s ERR133900-21 -1 ERR133900_1.fastq.gz -1 ERR133900_2.fastq.gz -c /usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz -c /usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz -c /usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz /tmp/tmp1kgny5wg/ERR133900-21_tb-species-170421-tb-hunt-probe-set-jan-03-2019-tb.ctx
[mykrobe 2021-11-26T14:56:52 WARNING] Setting conf_percent_cutoff to 90 (was not specified, and --ont flag was not used)
[mykrobe 2021-11-26T14:56:53 INFO] Progress: finished making AMR predictions
[mykrobe 2021-11-26T14:56:53 INFO] Progress: writing output
[mykrobe 2021-11-26T14:56:53 INFO] Progress: finished
(base) root:/home/ubuntu# cat ERR133900.csv
"sample","drug","susceptibility","variants (dna_variant-AA_variant:ref_kmer_count:alt_kmer_count:conf) [use --format json for more info]","genes (prot_mut-ref_mut:percent_covg:depth) [use --format json for more info]","mykrobe_version","files","probe_sets","genotype_model","kmer_size","phylo_group","species","lineage","phylo_group_per_covg","species_per_covg","lineage_per_covg","phylo_group_depth","species_depth","lineage_depth"
"ERR133900","Amikacin","R","rrs_A1401X-A1473246G:20:3455:14063","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Capreomycin","R","rrs_A1401X-A1473246G:20:3455:14063","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Ciprofloxacin","R","gyrA_A74S-GCC7521TCC:71:2627:10465","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Ethambutol","S","","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Isoniazid","R","katG_S315T-GCT2155167GGT:0:3491:21075","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Kanamycin","R","rrs_A1401X-A1473246G:20:3455:14063","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Moxifloxacin","R","gyrA_A74S-GCC7521TCC:71:2627:10465","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Ofloxacin","R","gyrA_A74S-GCC7521TCC:71:2627:10465","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Pyrazinamide","R","pncA_V7G-GAC2289221GCC:0:3179:19782","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Rifampicin","S","","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
"ERR133900","Streptomycin","R","rpsL_K43R-AAG781686AGG:0:3567:21390","","v0.10.0","ERR133900_1.fastq.gz;ERR133900_2.fastq.gz","/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-species-170421.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb-hunt-probe-set-jan-03-2019.fasta.gz;/usr/local/lib/python3.8/dist-packages/mykrobe/data/tb/tb.lineage.20200930.probes.fa.gz","kmer_count","21","Mycobacterium_tuberculosis_complex","Mycobacterium_tuberculosis","lineage2.2.9","99.737","98.81","NA","164","155.0","NA"
(base) root:/home/ubuntu#

Also did you tested it in the container?(https://quay.io/repository/biocontainers/tb-profiler?tab=tags)

jodyphelan commented 2 years ago

Interesting, I am getting a different md5sum. Might be worthwhile seeing if your fastQ file is not corrupted in any way (although it is interesting that mykrobe and the web-tb-profiler worked).

7f1ec3e54ae98bc52015c92883dbf30c  ERR133900/ERR133900_1.fastq.gz
4ec76989bd291f8997ae80991b34f0c9  ERR133900/ERR133900_2.fastq.gz

Could you attach the json result file for ERR133900?

Mykrobe results seem to match up with what I am getting. I haven't tried the docker version yet. Thanks

aniruddh-1 commented 2 years ago

Hey @jodyphelan

I have attached fastq files as well as results files generated with TB-Profiler. Link: https://drive.google.com/drive/folders/1O4YY6MqtTWXBi1btekh5pOd3AF3eyKDK?usp=sharing

Thanks

aniruddh-1 commented 2 years ago

Hey @jodyphelan

md5sum may be different because I downloaded and processed these files using SRA Toolkit.

I also ran these files in the TB-Profiler docker, here is the log file:

(base) root:/home/ubuntu# ls
ERR133900_1.fastq.gz  bam       tbprofiler.dr.indiv.itol.txt  tbprofiler.json              tbprofiler.txt           vcf
ERR133900_2.fastq.gz  results  tbprofiler.dr.itol.txt        tbprofiler.lineage.itol.txt  tbprofiler.variants.txt
(base) root:/home/ubuntu# docker images
REPOSITORY                          TAG                     IMAGE ID       CREATED      SIZE
quay.io/biocontainers/tb-profiler   4.0.3--pypyh5e36f6f_0   885c62b39619   5 days ago   1.41GB
(base) root:/home/ubuntu# docker run -t -d 885c62b39619
e5060fbb2ad1787ea32d269ebbdff7ab08c3f27c585f089c64cc6ec854ef30e1
(base) root:/home/ubuntu# docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED         STATUS         PORTS     NAMES
e5060fbb2ad1   885c62b39619   "/usr/local/env-exec…"   4 seconds ago   Up 3 seconds             exciting_meninsky
(base) root:/home/ubuntu# docker cp ERR133900_1.fastq.gz exciting_meninsky:/mnt/
(base) root:/home/ubuntu# docker cp ERR133900_2.fastq.gz exciting_meninsky:/mnt/
(base) root:/home/ubuntu# docker exec -it exciting_meninsky bash
root@e5060fbb2ad1:/# ls
bin      boot     dev      etc      home     lib      lib64    linuxrc  media    mnt      opt      proc     root     run      sbin     srv      sys      tmp      usr      var
root@e5060fbb2ad1:/# cd mnt
root@e5060fbb2ad1:/mnt# ls
ERR133900_1.fastq.gz  ERR133900_2.fastq.gz
root@e5060fbb2ad1:/mnt# tb-profiler --version
TBProfiler version 4.0.3
root@e5060fbb2ad1:/mnt# md5sum ERR133900_1.fastq.gz
5e4f52204143c09018991603c421cdfe  ERR133900_1.fastq.gz
root@e5060fbb2ad1:/mnt# md5sum ERR133900_2.fastq.gz
2d7f298f8f61d578117b8c530ff7a786  ERR133900_2.fastq.gz
root@e5060fbb2ad1:/mnt# tb-profiler profile -1 ERR133900_1.fastq.gz -2 ERR133900_2.fastq.gz -p ERR133900
Using gff file: /usr/local/share/tbprofiler/tbdb.gff
Using ref file: /usr/local/share/tbprofiler/tbdb.fasta
Using barcode file: /usr/local/share/tbprofiler/tbdb.barcode.bed
Using bed file: /usr/local/share/tbprofiler/tbdb.bed
Using json_db file: /usr/local/share/tbprofiler/tbdb.dr.json
Using version file: /usr/local/share/tbprofiler/tbdb.version.json
Using variables file: /usr/local/share/tbprofiler/tbdb.variables.json

Running command:
set -u pipefail; trimmomatic PE -threads 1 -phred33 ERR133900_1.fastq.gz ERR133900_2.fastq.gz -baseout ./012481b9-a973-47f6-8973-e48d05fd03d9 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36

Running command:
set -u pipefail; cat ./012481b9-a973-47f6-8973-e48d05fd03d9_1U ./012481b9-a973-47f6-8973-e48d05fd03d9_2U > ./012481b9-a973-47f6-8973-e48d05fd03d9_TU

Running command:
set -u pipefail; rm ./012481b9-a973-47f6-8973-e48d05fd03d9_1U ./012481b9-a973-47f6-8973-e48d05fd03d9_2U

Running command:
set -u pipefail; bwa mem -t 1 -c 100 -R '@RG\tID:ERR133900\tSM:ERR133900\tPL:illumina' -M -T 50 /usr/local/share/tbprofiler/tbdb.fasta ./012481b9-a973-47f6-8973-e48d05fd03d9_1P ./012481b9-a973-47f6-8973-e48d05fd03d9_2P | samtools sort -@ 1 -o ./012481b9-a973-47f6-8973-e48d05fd03d9.pair.bam -

Running command:
set -u pipefail; bwa mem -t 1 -c 100 -R '@RG\tID:ERR133900\tSM:ERR133900\tPL:illumina' -M -T 50 /usr/local/share/tbprofiler/tbdb.fasta ./012481b9-a973-47f6-8973-e48d05fd03d9_TU | samtools sort -@ 1 -o ./012481b9-a973-47f6-8973-e48d05fd03d9.single.bam -

Running command:
set -u pipefail; samtools merge -@ 1 -f ./012481b9-a973-47f6-8973-e48d05fd03d9.unsort.bam ./012481b9-a973-47f6-8973-e48d05fd03d9.pair.bam ./012481b9-a973-47f6-8973-e48d05fd03d9.single.bam

Running command:
set -u pipefail; samtools sort -m 768M -n -@ 1  ./012481b9-a973-47f6-8973-e48d05fd03d9.unsort.bam | samtools fixmate -@ 1 -m - - | samtools sort -m 768M -@ 1 - | samtools markdup -@ 1 - ./012481b9-a973-47f6-8973-e48d05fd03d9.bam

Running command:
set -u pipefail; rm ./012481b9-a973-47f6-8973-e48d05fd03d9.single.bam ./012481b9-a973-47f6-8973-e48d05fd03d9.pair.bam ./012481b9-a973-47f6-8973-e48d05fd03d9.unsort.bam

Running command:
set -u pipefail; samtools index -@ 1 ./012481b9-a973-47f6-8973-e48d05fd03d9.bam
Using ./012481b9-a973-47f6-8973-e48d05fd03d9.bam

Please ensure that this BAM was made using the same reference as in the database.
If you are not sure what reference was used it is best to remap the reads.

Running command:
set -u pipefail; cat /usr/local/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | parallel -j 1 --col-sep " " "samtools view -T /usr/local/share/tbprofiler/tbdb.fasta -h ./012481b9-a973-47f6-8973-e48d05fd03d9.bam {1} | samclip --ref /usr/local/share/tbprofiler/tbdb.fasta | samtools view -b > ./012481b9-a973-47f6-8973-e48d05fd03d9.{2}.tmp.bam && samtools index ./012481b9-a973-47f6-8973-e48d05fd03d9.{2}.tmp.bam && freebayes -f /usr/local/share/tbprofiler/tbdb.fasta -r {1} --haplotype-length -1  ./012481b9-a973-47f6-8973-e48d05fd03d9.{2}.tmp.bam | bcftools view -c 1 | bcftools norm -f /usr/local/share/tbprofiler/tbdb.fasta | bcftools filter -t {1} -e 'FMT/DP<10' -S . -Oz -o ./012481b9-a973-47f6-8973-e48d05fd03d9.{2}.vcf.gz"

Running command:
set -u pipefail; cat /usr/local/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | parallel -j 1 --col-sep " " "bcftools index  ./012481b9-a973-47f6-8973-e48d05fd03d9.{2}.vcf.gz"

Running command:
set -u pipefail; bcftools concat -aD `cat /usr/local/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | awk '{print "./012481b9-a973-47f6-8973-e48d05fd03d9."$2".vcf.gz"}'` | bcftools view -c1 -a -Oz -o ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.vcf.gz

Running command:
set -u pipefail; rm `cat /usr/local/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | awk '{print "./012481b9-a973-47f6-8973-e48d05fd03d9."$2".vcf.gz*"}'`

Running command:
set -u pipefail; rm `cat /usr/local/share/tbprofiler/tbdb.bed | awk '{print $1":"$2"-"$3" "$1"_"$2"_"$3}' | awk '{print "./012481b9-a973-47f6-8973-e48d05fd03d9."$2".tmp.bam*"}'`

Running command:
set -u pipefail; bcftools index --threads 1 -ft ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.vcf.gz

Running command:
set -u pipefail; bcftools query -l ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.vcf.gz

Running command:
set -u pipefail; bcftools view -v snps ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.vcf.gz | combine_vcf_variants.py --ref /usr/local/share/tbprofiler/tbdb.fasta --gff /usr/local/share/tbprofiler/tbdb.gff | rename_vcf_chrom.py --source Chromosome --target Chromosome | snpEff -Djava.net.useSystemProxies=true ann -noStats Mycobacterium_tuberculosis_h37rv - | rename_vcf_chrom.py --source Chromosome --target Chromosome > 86a9b14b-c124-4cae-bab9-248a7d15ae2b.vcf

Running command:
set -u pipefail; bcftools view -v indels ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.vcf.gz | rename_vcf_chrom.py --source Chromosome --target Chromosome | snpEff -Djava.net.useSystemProxies=true ann -noStats Mycobacterium_tuberculosis_h37rv - | rename_vcf_chrom.py --source Chromosome --target Chromosome > 17c0a779-f3ad-4c03-8135-549f43e65720.vcf

Running command:
set -u pipefail; bcftools concat 86a9b14b-c124-4cae-bab9-248a7d15ae2b.vcf 17c0a779-f3ad-4c03-8135-549f43e65720.vcf | bcftools sort -Oz -o ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.csq.vcf.gz
Removing 86a9b14b-c124-4cae-bab9-248a7d15ae2b.vcf
Removing 17c0a779-f3ad-4c03-8135-549f43e65720.vcf

Running command:
set -u pipefail; bcftools index --threads 1 -ft ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.csq.vcf.gz

Running command:
set -u pipefail; bcftools query -l ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.csq.vcf.gz

Running command:
set -u pipefail; bcftools query -u -f '%CHROM\t%POS\t%REF\t%ALT\t%ANN\t[%AD]\n' ./012481b9-a973-47f6-8973-e48d05fd03d9.targets.csq.vcf.gz

Running command:
set -u pipefail; samtools flagstat -O json ./012481b9-a973-47f6-8973-e48d05fd03d9.bam > 09c6a077-ee4a-450d-9c19-62c7ba7342b7

Running command:
set -u pipefail; bedtools genomecov -ibam ./012481b9-a973-47f6-8973-e48d05fd03d9.bam

Running command:
set -u pipefail; samtools view -Mb -L /usr/local/share/tbprofiler/tbdb.bed ./012481b9-a973-47f6-8973-e48d05fd03d9.bam | bedtools coverage -a /usr/local/share/tbprofiler/tbdb.bed -b - -d -sorted

Running command:
set -u pipefail; samtools view -Mb -L /usr/local/share/tbprofiler/tbdb.barcode.bed ./012481b9-a973-47f6-8973-e48d05fd03d9.bam > ./012481b9-a973-47f6-8973-e48d05fd03d9.tmp.bam

Running command:
set -u pipefail; samtools index ./012481b9-a973-47f6-8973-e48d05fd03d9.tmp.bam

Running command:
set -u pipefail; freebayes -f /usr/local/share/tbprofiler/tbdb.fasta -t /usr/local/share/tbprofiler/tbdb.barcode.bed ./012481b9-a973-47f6-8973-e48d05fd03d9.tmp.bam --haplotype-length -1 | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT\t%AD]\n'

Running command:
set -u pipefail; bedtools getfasta -fi /usr/local/share/tbprofiler/tbdb.fasta -bed /usr/local/share/tbprofiler/tbdb.barcode.bed

Running command:
set -u pipefail; samtools view -Mb -L /usr/local/share/tbprofiler/tbdb.barcode.bed ./012481b9-a973-47f6-8973-e48d05fd03d9.bam | bedtools coverage -a /usr/local/share/tbprofiler/tbdb.barcode.bed -b - -d -sorted

Running command:
set -u pipefail; delly call -t DEL -g /usr/local/share/tbprofiler/tbdb.fasta ./012481b9-a973-47f6-8973-e48d05fd03d9.bam -o ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.bcf

Running command:
set -u pipefail; bcftools query -l ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.bcf

Running command:
set -u pipefail; bcftools view -c 2  ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.bcf | bcftools view -e '(INFO/END-POS)>=100000' -Oz -o ./012481b9-a973-47f6-8973-e48d05fd03d9.tmp.delly.vcf.gz

Running command:
set -u pipefail; bcftools index ./012481b9-a973-47f6-8973-e48d05fd03d9.tmp.delly.vcf.gz

Running command:
set -u pipefail; bcftools view -R /usr/local/share/tbprofiler/tbdb.bed ./012481b9-a973-47f6-8973-e48d05fd03d9.tmp.delly.vcf.gz -Oz -o ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.vcf.gz

Running command:
set -u pipefail; bcftools index --threads 1 -ft ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.vcf.gz

Running command:
set -u pipefail; bcftools query -l ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.vcf.gz

Running command:
set -u pipefail; bcftools view ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.vcf.gz | rename_vcf_chrom.py --source Chromosome --target Chromosome | snpEff -Djava.net.useSystemProxies=true ann -noStats Mycobacterium_tuberculosis_h37rv - | rename_vcf_chrom.py --source Chromosome --target Chromosome | bcftools view -Oz -o ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.csq.vcf.gz

Running command:
set -u pipefail; bcftools index --threads 1 -ft ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.csq.vcf.gz

Running command:
set -u pipefail; bcftools query -l ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.csq.vcf.gz

Running command:
set -u pipefail; bcftools query -u -f '%CHROM\t%POS\t%REF\t%ALT\t%ANN\t[%AD]\n' ./012481b9-a973-47f6-8973-e48d05fd03d9.delly.csq.vcf.gz

Running command:
set -u pipefail; rm ./012481b9-a973-47f6-8973-e48d05fd03d9*

Profiling finished sucessfully!
root@e5060fbb2ad1:/mnt# ls
ERR133900_1.fastq.gz  ERR133900_2.fastq.gz  bam                   results               vcf
root@e5060fbb2ad1:/mnt# tb-profiler collate
Using gff file: /usr/local/share/tbprofiler/tbdb.gff
Using ref file: /usr/local/share/tbprofiler/tbdb.fasta
Using barcode file: /usr/local/share/tbprofiler/tbdb.barcode.bed
Using bed file: /usr/local/share/tbprofiler/tbdb.bed
Using json_db file: /usr/local/share/tbprofiler/tbdb.dr.json
Using version file: /usr/local/share/tbprofiler/tbdb.version.json
Using variables file: /usr/local/share/tbprofiler/tbdb.variables.json
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 213.80it/s]
root@e5060fbb2ad1:/mnt# ls
ERR133900_1.fastq.gz          bam                           tbprofiler.dr.indiv.itol.txt  tbprofiler.json               tbprofiler.txt                vcf
ERR133900_2.fastq.gz          results                       tbprofiler.dr.itol.txt        tbprofiler.lineage.itol.txt   tbprofiler.variants.txt
root@e5060fbb2ad1:/mnt# cat tbprofiler.txt
sample  main_lineage    sub_lineage     DR_type num_dr_variants num_other_variants      rifampicin      isoniazid       pyrazinamide    ethambutol      streptomycin    fluoroquinolones    moxifloxacin     ofloxacin       levofloxacin    ciprofloxacin   aminoglycosides amikacin        kanamycin       capreomycin     ethionamide     para-aminosalicylic_acid        cycloserine linezolid        bedaquiline     clofazimine     delamanid
ERR133900       lineage2;lineage4       lineage4.9;lineage2.2   Sensitive       0       0       -       -       -       -       -       -       -       -       -       -       -       -   --       -       -       -       -       -       -       -
root@e5060fbb2ad1:/mnt#
jodyphelan commented 2 years ago

Thanks for sending all the files through. I quick checked and it looks like the coverage is quite a lot lower in your bam file than expected. I then checked the mapping command and it seems like bwa has some issues with the formatting of the read names:

(tb-profiler) jody@server:/TBP$ bwa mem ~/refgenome/MTB-h37rv_asm19595v2-eg18.fa ERR133900_1.fastq.gz ERR133900_2.fastq.gz | samtools sort -@ 50 -o ./2cb81667-3e01-44f0-b3dc-587f096c7d1d.single.bam -
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 100000 sequences (10000000 bp)...
[M::process] read 100000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (5, 49775, 2, 1)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (205, 244, 301)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (13, 493)
[M::mem_pestat] mean and std.dev: (257.16, 71.47)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 589)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "ERR133900.1.1", "ERR133900.1.2"

The mapping process seems to throw and error and terminates early. Strangely enough this doesn't seem to be picked up and the pipeline continues. I'll try improve the error caching. What is the command you used to download the reads?

aniruddh-1 commented 2 years ago

I am using prefetch and fastq-dump commands from SRA Toolkit. Here are commands in which sra_ids.txt have entries of SRA Accession Numbers, which prefetch command uses it to download those IDs from the EBI server :

prefetch --option file sra_ids.txt                                    # downloads the .sra files from the txt entries
fastq-dump -I --spilt-files ERR133900.sra                             # split files into fastq:  _1.fastq, _2.fastq
gzip ERR133900_1.fastq                                                # zip both files
gzip ERR133900_2.fastq

Also if the files are corrupted then why it ran on Mac system with giving mutations while not in ubuntu and TB-Profiler docker? And also it is strange that how Mykrobe gave the resistance status while not TB-profiler even for this low read files ?

jodyphelan commented 2 years ago

Ok looks like if you drop the -I parameter from the fastq-dump line it will work fine. I'm not sure why it worked for you with on macOS as it gives an identical result for me with both macOS and Linux. Mykrobe works in a very different way and doesn't use bwa or even the mapping approach so it does not encounter this error.

aniruddh-1 commented 2 years ago

Hey @jodyphelan

Thanks for your advice it worked ! Actually on mac the files were manually downloaded from EBI website and it was not downloaded with the SRA toolkit but the one on ubuntu was downloaded with fastq-dump -I command. I ran the processed files(got from without -I command) on TB-Profiler , and it worked!

Thank You

jodyphelan commented 2 years ago

Great! Thanks for reporting the issue. I'll improve the error catching in future versions.