Closed vrennie closed 3 months ago
The "total_variants" value relates tot he total number of variants found in the VCF. Not all of these will be eventually processed by tb-profiler. I suspect that the 34 variants are present in the VCF but didn't pass the hard-cutoff filters:
50000 size of SV
Alternatively, if there are non-deletion variants from delly they will be ignored by the pipeline.
I don't think that it is the first as this is the command we run to avoid filtering:
tb-profiler profile \ \ --threads 4\ --vcf intermediate.vcf.gz \ --depth 0,0 --af 0,0 --strand 0 --sv_depth 0,0 --sv_af 0,0 --sv_len 100000,50000
The variants look like deletions to me, but maybe I'm missing something obvious... IS6110.S376.delly.bcf.zip
I seem to be picking up one variant with the latest version. Is this the expected output?
Just thought of another reason - only variants overlapping with drug resistance genes will be analysed. Do all of the variants in the vcf overlap?
Ah I see! We are running v6.2.1 not v6.2.2. Is the latest version available in bioconda? @abhi18av do you have any idea why this might be?
Ah I understand! So "other variants" is not necessarily other variants in the genome but only those in the regions with drug resistance genes, makes sense.
Thanks Jody!
Just tested with v6.2.1 - seems to be the same result for me as with the dev version.
Hmm seems we are triggering the standard tbdb, could that be the issue?
id": "IS6110.S011", "timestamp": "2024-06-25T12:34:27.799523", "pipeline": { "software_version": "6.2.1", "db_version": { "name": "tbdb", "commit": "30f8bc3", "Author": "Jody Phelan jody.phelan@lshtm.ac.uk", "Date": "Thu May 30 11:29:07 2024 +0100"
Oh yeah that could be it
@abhi18av could you have a look at this when you find a moment?
@vrennie @jodyphelan , okay so on our side we're using the tbdb
repository yes, but our own MAGMA
branch (notice the commit hash **30f8bc3**
) - therefore we we'd need to maybe, tweak the database accordingly
@jodyphelan @abhi18av I'm still confused as to why we would not be picking up that variant. Also concerning is that the json that Jody gave has 80 variants while ours only has 34. thoughts?
It might be a chromosome naming thing? If you change the chromosome name from 'Chromosome' to 'NC-000962-3-H37Rv' on the MAGMA branch this should solve some of these issues.
Also concerning is that the json that Jody gave has 80 variants while ours only has 34. thoughts?
Not sure on this - this should reflect the number of variants in the VCF.
@mdediegofuertes did you try this already?
@jodyphelan I assume you an analysed IS6110.S376.delly.bcf.zip, correct?
It might be a chromosome naming thing? If you change the chromosome name from 'Chromosome' to 'NC-000962-3-H37Rv' on the MAGMA branch this should solve some of these issues.
Mmm, @jodyphelan ,good point. We do temporarily change the name for the chromosome within the pipeline itself (to stay as close to your setup as possible)
But perhaps a better context would be which version of the pipeline (and therefore, tbprofiler) was used to generate the initial result @vrennie, just to double check that this is not from a previous version where we didn't have that snippet?
@abhi18av it was this run, so should be the most recent:
https://cloud.seqera.io/orgs/TORCH/workspaces/XBS-Nextflow/watch/3lWc8oYvn4duaI
@jodyphelan @abhi18av apologies I realise what the issue is, I sent a delly.bcf of one sample and a tbprofiler json of a different sample.
Apologies for the oversight. Thanks for engaging with this non-issue nonetheless!
No worries! 😁
Hi @jodyphelan ,
We were running tbprofiler on the output of delly and got the following output.
We were wondering what the how the variants in 'qc' fall outside of the three categories:
dr_variants other_variants qc_fail_variants
How should we interpret these 34 qc variants?
Thanks :)
IS6110.S011.results.json