im3sanger / dndscv

dN/dS methods to quantify selection in cancer and somatic evolution
GNU General Public License v3.0

any plans to generate covariates for hg38? #30

Open pwaltman opened 5 years ago

pwaltman commented 5 years ago

I realize that this isn't really an issue, but I'm curious if you have given thought to generating a comparable set of covariates for hg38?

im3sanger commented 5 years ago

Hello,

Thanks for your suggestion. The covariates for hg19 were generated using files from the Roadmap Epigenomics project. Unfortunately, as far as I am aware, their files are only available for hg19, although a few have been lifted over to GRCh38 and I could use those. If there is enough demand from users, I could try generating some covariates (even with the limitations above).

Users can also generate their own covariates, for example using expression data, chromatin data or even coverage metrics from their own experiments. As described in the dNdScv tutorial, covariates can be fed as a numeric matrix with one covariate per column and genes as rownames.

load("RefCDS_human_GRCh38.p12.rda")
gene_list = sapply(RefCDS, function(x) x$gene_name) # List of gene names from the GRCh38 object
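
As a rough illustration of the matrix format described above (one covariate per column, gene names as row names), the sketch below builds a covariate matrix from a small made-up table. The gene names, the `expr` data frame and its columns are all hypothetical; in practice `gene_list` would be extracted from the RefCDS object rather than typed in by hand.

```r
# Toy stand-in for the gene names extracted from RefCDS (hypothetical values)
gene_list <- c("TP53", "KRAS", "PTEN", "BRCA2")

# Hypothetical per-gene measurements, e.g. from your own experiments
expr <- data.frame(
  gene = c("KRAS", "TP53", "BRCA2", "EGFR"),
  log_expression = c(5.2, 6.8, 3.1, 4.4),
  mean_coverage = c(110, 95, 80, 120),
  stringsAsFactors = FALSE
)

# Keep only genes present in the reference, in the reference's order
shared <- gene_list[gene_list %in% expr$gene]
idx <- match(shared, expr$gene)

# Numeric matrix: one covariate per column, gene names as row names
my_covs <- as.matrix(expr[idx, c("log_expression", "mean_coverage")])
rownames(my_covs) <- shared

print(my_covs)
```

A matrix built this way could then be supplied through the `cv` argument. Which covariates are actually informative for a given dataset is a separate question that this sketch does not address.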

Also, please note that dndscv can be run on GRCh38 without covariates.

Inigo

carladosanjos commented 4 years ago

Hi Iñigo, Just letting you know that I am one of those users interested in the covariates for the GRCh38 versions. Cheers

JMarzec commented 3 years ago

Hey Inigo,

Same as above, we find your method very useful and use it quite frequently in our research and so we'd be interested in the covariates for the GRCh38 versions as well. Many thanks

skanwal commented 3 years ago

Hi Inigo @im3sanger

First, thanks for the great method. It's super useful, and we use it for driver analysis of clinical samples in our research group. We use genome build 38, so at the moment we are stuck on which covariates to use for the GRCh38 version.

It would be a great help if you could provide covariates for GRCh38.

Best, Sehrish

im3sanger commented 3 years ago

Hi all,

Thank you for your interest. I will try to create covariates for GRCh38 in the near future as more users are requesting them. In the meantime, please remember that you can still use dndscv on GRCh38 without covariates, using "cv=NULL" as an argument to dndscv and using the reference object from the link below: https://github.com/im3sanger/dndscv_data/tree/master/data
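
For anyone unsure what the covariate-free call looks like, here is a minimal sketch. `run_without_covariates` is a hypothetical wrapper written for this example, not a function from the package, and the RefCDS path is whatever file you downloaded from the link above.

```r
# Sketch: run dndscv on GRCh38 without covariates (cv = NULL).
# `run_without_covariates` is a hypothetical helper, not part of dndscv.
run_without_covariates <- function(mutations, refcds_path) {
  # dndscv works on a mutation table whose first five columns are
  # sampleID, chr, pos, ref, alt
  stopifnot(is.data.frame(mutations), ncol(mutations) >= 5)
  if (!file.exists(refcds_path)) {
    stop("RefCDS file not found: ", refcds_path)
  }
  if (!requireNamespace("dndscv", quietly = TRUE)) {
    stop("The dndscv package is not installed")
  }
  # refdb points at the downloaded RefCDS file; cv = NULL disables covariates
  dndscv::dndscv(mutations, refdb = refcds_path, cv = NULL)
}
```

With a mutation table and a downloaded reference file in hand, usage would be along the lines of `dndsout <- run_without_covariates(mutations, "RefCDS_human_GRCh38.p12.rda")`.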

You can also feed dndscv your own covariates, such as expression level or coverage per gene, as described in the tutorial below: http://htmlpreview.github.io/?http://github.com/im3sanger/dndscv/blob/master/vignettes/buildref.html

Please feel free to continue expressing your interest and I will do my best to generate covariates in the near future. I will post a note here as soon as new covariates are available.

Best wishes, Inigo

alhafidzhamdan commented 3 years ago

Hi Inigo, another one here requesting covariates for hg38 please! The tutorial listed at http://htmlpreview.github.io/?http://github.com/im3sanger/dndscv/blob/master/vignettes/buildref.html does not explain how to convert expression/epigenomic data to principal components; perhaps if you could help us with this instead? Either way, much appreciated! A

skanwal commented 3 years ago

Hi @im3sanger,

Wondering if you have had a chance to look into generating the covariates for GRCh38?

Best, Sehrish.

xtmgah commented 3 years ago

Hi All:

Any progress on developing the covariates for hg38? Thanks.

vivekruhela commented 3 years ago

I am also requesting covariates for hg38. Please let us know when they are available. Thanks.

joonan30 commented 2 years ago

Hi Inigo, I am also looking forward to hg38 covariates. This tool is amazing - please end the era of MutSigCV (which is no longer maintained and takes many hours to run).

im3sanger commented 2 years ago

Thank you everyone for the nudges. We are putting together a set of GRCh38 covariates, which we are hoping to release in the next few weeks. I will update this thread when they are available.

im3sanger commented 2 years ago

Hi everyone,

Thank you for your patience. I have uploaded a new RefCDS and new covariates for GRCh38/hg38 here: https://github.com/im3sanger/dndscv_data/tree/master/data

The new covariate file is called: covariates_hg19_hg38_epigenome_pcawg.rda.

And they can be used with the following new RefCDS files for GRCh37/hg19 and GRCh38/hg38:

RefCDS_human_GRCh38_GencodeV18_recommended.rda
RefCDS_human_hg19_GencodeV18_newcovariates.rda

So you can run dndscv on hg38 using:

library(dndscv)
load("covariates_hg19_hg38_epigenome_pcawg.rda") # Loads the covs object
# mutations is your own table of mutations
dndsout = dndscv(mutations, refdb = "RefCDS_human_GRCh38_GencodeV18_recommended.rda", cv = covs)

These covariates were developed by Federico Abascal. Big thanks to him! They were generated by combining epigenomic data (from the Roadmap Epigenomics project) and whole-genome mutation density vectors (from the PCAWG consortium), collapsed into 20 principal components. They were tested on TCGA data, at a pan-cancer level and on individual cancer types, and appear to perform generally well.
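
For readers wondering how per-gene features might be collapsed into principal components, here is a minimal sketch using base R's `prcomp` on simulated data. It is not the procedure used to build the released covariates, whose details are not given in this thread; the feature matrix, the gene names and the choice of 3 components are made up for illustration.

```r
set.seed(1)

# Simulated per-gene feature matrix: 50 hypothetical genes x 8 features
genes <- paste0("GENE", 1:50)
features <- matrix(rnorm(50 * 8), nrow = 50,
                   dimnames = list(genes, paste0("feature", 1:8)))

# PCA on centred, unit-variance features
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Keep the first few components as covariates (the released set uses 20)
n_pcs <- 3
pc_covs <- pca$x[, 1:n_pcs, drop = FALSE]

# Rows are still named by gene; columns are PC1, PC2, ...
print(dim(pc_covs))
print(head(pc_covs, 3))
```

The result has the same shape as any other covariate matrix (genes as row names, one numeric column per component), so the same approach would apply to expression or chromatin measurements from your own experiments.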

We may further refine them in the next few weeks and I will try to integrate them in the package by default, so that you can use dndscv on hg38 without downloading additional files. But I wanted to share them with you without further delay.

Please do share any feedback, positive or negative, as we are still testing them and any feedback helps.

Best, Inigo