nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

[FEATURE] Tumor mutational burden (TMB) scoring #495

Open apeltzer opened 2 years ago

apeltzer commented 2 years ago

nf-core/sarek feature request

Is your feature request related to a problem? Please describe

TMB (tumor mutational burden) is a key metric in many oncology projects, so many users of the pipeline are likely to request it at some point. It could be computed in a similar way to MSI status, which the pipeline already reports.

Describe the solution you'd like

Have some way of computing TMB and reporting it to the user, ideally in the main MultiQC table.
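For orientation, TMB is usually reported as the number of eligible somatic mutations per megabase of the region that was effectively sequenced. Below is a minimal sketch of that arithmetic, assuming the variants have already been filtered. The function name and the example numbers are hypothetical and are not taken from any tool mentioned in this thread.

```python
def tumor_mutational_burden(n_mutations: int, effective_size_bp: int) -> float:
    """Return mutations per megabase for an already filtered variant count."""
    if effective_size_bp <= 0:
        raise ValueError("effective_size_bp must be positive")
    if n_mutations < 0:
        raise ValueError("n_mutations must not be negative")
    # Convert the effective region size from base pairs to megabases.
    return n_mutations / (effective_size_bp / 1_000_000)


if __name__ == "__main__":
    # Hypothetical example: 150 retained variants over a 30 Mb target region.
    print(tumor_mutational_burden(150, 30_000_000))
```

Which variants count as eligible (coding only, non-synonymous only, minimum allele frequency and so on) and how the effective region size is derived are the parts that differ between methods, so the sketch deliberately leaves both as inputs.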

Describe alternatives you've considered


apeltzer commented 2 years ago

An easy-to-use option would be this tool: https://github.com/bioinfo-pf-curie/TMB

apeltzer commented 2 years ago

A potentially good explanation of how to do this, since it evaluates several panels plus WES that could be of interest: https://www.nature.com/articles/s41598-020-68394-4

apeltzer commented 2 years ago

potentially worth looking at too: https://github.com/bcgsc/TMBur

apeltzer commented 2 years ago

Also a good tool apparently, estimating TMB: https://github.com/bioinform/ecTMB

apeltzer commented 2 years ago

Strong tendency towards using the Curie approach because:

* straightforward output for MultiQC (TSV tables are produced automatically, so they could simply be reported as a separate table in the Sarek MultiQC report) ✔️

All these points above are sometimes hard to achieve if you have to do them yourself, so it is a good idea to rely on this instead 👍🏻

maxulysse commented 2 years ago

MultiQC module possibility even?
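Short of a dedicated module, one option is MultiQC's custom content mechanism, which picks up files ending in `_mqc.tsv` and reads `#`-prefixed header lines as configuration. Below is a minimal sketch, assuming a simple per-sample table. The column names, the section id and the helper name are hypothetical and do not reflect the actual pyTMB output.

```python
import csv
import io


def to_multiqc_custom_content(rows: list[dict[str, str]], sample_col: str = "Sample") -> str:
    """Render per-sample rows as a MultiQC custom-content table (TSV with '#' header)."""
    if not rows:
        raise ValueError("no rows to report")
    header = [
        "# id: 'tmb'",  # hypothetical section id
        "# section_name: 'Tumor mutational burden'",
        "# plot_type: 'table'",
    ]
    # Keep the sample column first, as MultiQC uses the first column as row name.
    columns = [sample_col] + [c for c in rows[0] if c != sample_col]
    out = io.StringIO()
    out.write("\n".join(header) + "\n")
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical values; write the result to e.g. tmb_mqc.tsv for MultiQC to find.
    print(to_multiqc_custom_content([{"Sample": "tumor1", "TMB": "5.0"}]))
```

A proper MultiQC module would allow the value to appear in the general statistics table, whereas this approach only yields a separate section.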

FriederikeHanssen commented 2 years ago

Hey! I started looking into this. I actually can't find the Curie tool on anaconda.org. From the README, I think they provide an environment file to set up, in which the Python scripts can then be run. In that case no biocontainer would exist either. Did you find anything else, @apeltzer?

apeltzer commented 2 years ago

No, nothing else - I suspect that they have their own conda channel where they provide everything this uses / needs.

nservant commented 2 years ago

Hi there, I have just been in touch with @FriederikeHanssen! So indeed, there is no conda repo, simply because I'm not familiar with making such a package :) But we'll be happy to do it, of course!

apeltzer commented 2 years ago

Hey everyone, I've just made a bioconda recipe for the curie tmb tool --> see here: https://github.com/bioconda/bioconda-recipes/pull/35393

@nservant you could also link to that recipe in the future, as it would allow people to directly install the tool without having to setup a separate environment, e.g.:

mamba install -c bioconda -c conda-forge tmb=1.3.0

Running the tools then works with:

pyTMB.py <parameters> (and the genome size script works the same way)

FriederikeHanssen commented 2 years ago

❤️ thank you for adding this! As soon as biocontainers is there, I'll add the module.

FriederikeHanssen commented 2 years ago

@apeltzer I started working on it. The module is in the making. To add it to Sarek, we need to add config files for each caller (except Mutect2) and then also discuss how best to make them available. It will take me a bit longer than I hoped.

tomgutman commented 2 years ago

Hello @FriederikeHanssen @apeltzer,

I am the co-developer of pyTMB; I'm working with Nicolas Servant.

Thanks a lot for your help with bioconda! I was struggling with the build.sh part.

Let me know if I can help with the config files or with testing.

FriederikeHanssen commented 1 year ago

Hi @tomgutman !

This PR has been a long time in the making because I have not had time, but also because creating the yml files seems very daunting. Do you already have some for VEP or for any of the variant callers that we use, such as Strelka and FreeBayes?

Raghu9721 commented 9 months ago

Hello @apeltzer @FriederikeHanssen

This is Raghavendra. I am currently using pyTMB.py to compute tumor mutational burden from a VEP-annotated VCF file, but I am unable to run it because there is no config.yml file specific to VEP. Please help me create one.

FriederikeHanssen commented 7 months ago

Hi @Raghu9721! Sorry, I just saw this. I am not the pyTMB tool developer, so I would ask for help from them here: https://github.com/bioinfo-pf-curie/TMB

nservant commented 7 months ago

Hi @Raghu9721, @FriederikeHanssen , Happy to help on that question, but I never used VEP before. The only thing I would need is a VCF file annotated with VEP to see how the different fields can be parsed.
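As a starting point for seeing how the fields can be parsed: in its default VCF output, VEP stores annotations in a pipe-delimited `CSQ` INFO field, with one comma-separated block per transcript, and declares the sub-field order in the `##INFO=<ID=CSQ,...>` header line. Below is a minimal sketch under that assumption. The function names, the example header and the example record are made up for illustration, and this is not a pyTMB config file.

```python
import re


def csq_fields(header_line: str) -> list[str]:
    """Extract the CSQ sub-field names from the VEP ##INFO header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' description found in CSQ header")
    return match.group(1).strip().split("|")


def parse_csq(info: str, fields: list[str]) -> list[dict[str, str]]:
    """Return one dict per annotated transcript from a record's INFO column."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            return [dict(zip(fields, block.split("|")))
                    for block in entry[len("CSQ="):].split(",")]
    return []  # record carries no VEP annotation


if __name__ == "__main__":
    # Made-up header and INFO column, shortened to four sub-fields.
    header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
              'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
    info = "DP=50;CSQ=T|missense_variant|MODERATE|TP53,T|upstream_gene_variant|MODIFIER|WRAP53"
    for annotation in parse_csq(info, csq_fields(header)):
        print(annotation["SYMBOL"], annotation["Consequence"])
```

The real sub-field list depends on the VEP options and plugins used for the run, which is why the sketch reads it from the header instead of hard-coding it.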

jbague commented 4 months ago

Hi, I am using Sarek v3.4.1 to extract VEP annotations from my samples. My question is whether there is any consensus method for calculating TMB. Is the TMB analysis tool from Institut Curie the most widely used? What is your recommendation? Thanks a lot!

bounlu commented 1 month ago

I am curious to see whether there is any progress to add TMB to the sarek pipeline.

maxulysse commented 1 month ago

I'm afraid progress on that side has been very slow.