jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect ressistance and strain type from WGS data
GNU General Public License v3.0
104 stars 43 forks source link

Clarification on 'qc' variants #373

Closed vrennie closed 3 months ago

vrennie commented 3 months ago

Hi @jodyphelan ,

We were running tbprofiler on the output of delly and got the following output.

We were wondering what the how the variants in 'qc' fall outside of the three categories:

dr_variants other_variants qc_fail_variants

How should we interpret these 34 qc variants?

Thanks :)

IS6110.S011.results.json

jodyphelan commented 3 months ago

The "total_variants" value relates tot he total number of variants found in the VCF. Not all of these will be eventually processed by tb-profiler. I suspect that the 34 variants are present in the VCF but didn't pass the hard-cutoff filters:

Alternatively, if there are non-deletion variants from delly they will be ignored by the pipeline.

vrennie commented 3 months ago

I don't think that it is the first as this is the command we run to avoid filtering:

tb-profiler profile \ \ --threads 4\ --vcf intermediate.vcf.gz \ --depth 0,0 --af 0,0 --strand 0 --sv_depth 0,0 --sv_af 0,0 --sv_len 100000,50000

The variants look like deletions to me, but maybe I'm missing something obvious... IS6110.S376.delly.bcf.zip

jodyphelan commented 3 months ago

I seem to be picking up one variant with the latest version. Is this the expected output?

tbprofiler.results.json

Just thought of another reason - only variants overlapping with drug resistance genes will be analysed. Do all of the variants in the vcf overlap?

vrennie commented 3 months ago

Ah I see! We are running v6.2.1 not v6.2.2. Is the latest version available in bioconda? @abhi18av do you have any idea why this might be?

Ah I understand! So "other variants" is not necessarily other variants in the genome but only those in the regions with drug resistance genes, makes sense.

Thanks Jody!

jodyphelan commented 3 months ago

Just tested with v6.2.1 - seems to be the same result for me as with the dev version.

tbprofiler.results.json

vrennie commented 3 months ago

Hmm seems we are triggering the standard tbdb, could that be the issue?

id": "IS6110.S011", "timestamp": "2024-06-25T12:34:27.799523", "pipeline": { "software_version": "6.2.1", "db_version": { "name": "tbdb", "commit": "30f8bc3", "Author": "Jody Phelan jody.phelan@lshtm.ac.uk", "Date": "Thu May 30 11:29:07 2024 +0100"

jodyphelan commented 3 months ago

Oh yeah that could be it

vrennie commented 3 months ago

@abhi18av could you have a look at this when you find a moment?

abhi18av commented 3 months ago

@vrennie @jodyphelan , okay so on our side we're using the tbdb repository yes, but our own MAGMA branch (notice the commit hash **30f8bc3**) - therefore we we'd need to maybe, tweak the database accordingly

image
vrennie commented 3 months ago

@jodyphelan @abhi18av I'm still confused as to why we would not be picking up that variant. Also concerning is that the json that Jody gave has 80 variants while ours only has 34. thoughts?

jodyphelan commented 3 months ago

It might be a chromosome naming thing? If you change the chromosome name from 'Chromosome' to 'NC-000962-3-H37Rv' on the MAGMA branch this should solve some of these issues.

Also concerning is that the json that Jody gave has 80 variants while ours only has 34. thoughts?

Not sure on this - this should reflect the number of variants in the VCF.

vrennie commented 3 months ago

@mdediegofuertes did you try this already?

@jodyphelan I assume you an analysed IS6110.S376.delly.bcf.zip, correct?

abhi18av commented 3 months ago

It might be a chromosome naming thing? If you change the chromosome name from 'Chromosome' to 'NC-000962-3-H37Rv' on the MAGMA branch this should solve some of these issues.

Mmm, @jodyphelan ,good point. We do temporarily change the name for the chromosome within the pipeline itself (to stay as close to your setup as possible)

https://github.com/TORCH-Consortium/MAGMA/blob/f2817f8c745ff4efe9fa7e3bc815d92902244105/modules/tbprofiler/vcf_profile__lofreq.nf#L19

But perhaps a better context would be which version of the pipeline (and therefore, tbprofiler) was used to generate the initial result @vrennie, just to double check that this is not from a previous version where we didn't have that snippet?

vrennie commented 3 months ago

@abhi18av it was this run, so should be the most recent:

https://cloud.seqera.io/orgs/TORCH/workspaces/XBS-Nextflow/watch/3lWc8oYvn4duaI

vrennie commented 3 months ago

@jodyphelan @abhi18av apologies I realise what the issue is, I sent a delly.bcf of one sample and a tbprofiler json of a different sample.

Apologies for the oversight. Thanks for engaging with this non-issue nonetheless!

jodyphelan commented 3 months ago

No worries! 😁