pwaltman opened this issue 5 years ago.
Hello,
Thanks for your suggestion. The covariates for hg19 were generated using files from the Roadmap Epigenomics project. Unfortunately, as far as I am aware, their files are only available for hg19, although a few have been lifted over to GRCh38 and I could use those. If there is enough demand from users, I could try generating some GRCh38 covariates (even with the limitations above).
Users can also generate their own covariates, for example using expression data, chromatin data or even coverage metrics from their own experiments. As described in the dNdScv tutorial, covariates can be fed as a numeric matrix with one covariate per column and genes as rownames.
load("RefCDS_human_GRCh38.p12.rda") # Loads the RefCDS object
gene_list = sapply(RefCDS, function(x) x$gene_name) # List of gene names from the GRCh38 object
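To illustrate that format, here is a minimal R sketch that assembles per-gene values into such a matrix, with rows aligned to the RefCDS gene names. The helper build_covariate_matrix(), the toy genes and the values are hypothetical. Filling genes that lack data with the column median is an assumption made for this sketch, not something dndscv prescribes.

```r
# Hypothetical helper (not part of dndscv): build a numeric matrix with
# one covariate per column and genes as rownames.
build_covariate_matrix <- function(gene_list, features) {
  cols <- lapply(features, function(v) {
    x <- unname(v[gene_list])               # look up each gene; NA if absent
    x[is.na(x)] <- median(v, na.rm = TRUE)  # assumption: median imputation
    x
  })
  mat <- do.call(cbind, cols)               # one column per covariate
  rownames(mat) <- gene_list                # genes as rownames
  mat
}

# Toy inputs with made-up values (replace with real per-gene data)
toy_genes  <- c("TP53", "KRAS", "PIK3CA", "APC")
expression <- c(TP53 = 8.1, KRAS = 6.4, APC = 5.2)  # PIK3CA has no value
coverage   <- c(TP53 = 0.95, KRAS = 0.90, PIK3CA = 0.80, APC = 0.85)

cv_matrix <- build_covariate_matrix(
  toy_genes,
  list(expression = expression, coverage = coverage)
)
print(cv_matrix)
```

With real data you would pass the gene_list extracted from RefCDS instead of toy_genes, and then supply the result as the cv argument, e.g. dndscv(mutations, refdb = "RefCDS_human_GRCh38.p12.rda", cv = cv_matrix).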
Also, please note that dndscv can be run on GRCh38 without covariates.
Inigo
Hi Iñigo, just letting you know that I am one of those users interested in covariates for GRCh38. Cheers
Hey Inigo,
Same as above: we find your method very useful and use it quite frequently in our research, so we'd be interested in covariates for GRCh38 as well. Many thanks
Hi Inigo @im3sanger
First, thanks for the great method. It's super useful, and our research group uses it for driver analysis of clinical samples. We use genome build 38, so at the moment we are stuck on which covariates to use for GRCh38.
It would be a great help if you could provide covariates for GRCh38.
Best, Sehrish
Hi all,
Thank you for your interest. I will try to create covariates for GRCh38 in the near future, as more users are requesting them. In the meantime, please remember that you can still use dndscv on GRCh38 without covariates, by passing "cv=NULL" as an argument to dndscv and using the reference object from the link below: https://github.com/im3sanger/dndscv_data/tree/master/data
You can also feed dndscv your own covariates, such as expression level or coverage per gene, as described in the tutorial below: http://htmlpreview.github.io/?http://github.com/im3sanger/dndscv/blob/master/vignettes/buildref.html
Please feel free to continue expressing your interest and I will do my best to generate covariates in the near future. I will post a note here as soon as new covariates are available.
Best wishes, Inigo
Hi Inigo, another one here requesting covariates for hg38, please! The tutorial listed at http://htmlpreview.github.io/?http://github.com/im3sanger/dndscv/blob/master/vignettes/buildref.html does not explain how to convert expression/epigenomic data to principal components; perhaps you could help us with this instead? Either way, much appreciated! A
Hi @im3sanger,
I was wondering if you have had a chance to look into generating the covariates for GRCh38?
Best, Sehrish.
Hi All:
Any progress on developing the covariates for hg38? Thanks.
I am also requesting covariates for hg38. Please let us know when they are available. Thanks.
Hi Inigo, I am also looking forward to hg38 covariates. This tool is amazing; please end the era of MutSigCV (which is no longer maintained and takes very long to run).
Thank you everyone for the nudges. We are putting together a set of GRCh38 covariates, which we are hoping to release in the next few weeks. I will update this thread when they are available.
Hi everyone,
Thank you for your patience. I have uploaded a new RefCDS and new covariates for GRCh38/hg38 here: https://github.com/im3sanger/dndscv_data/tree/master/data
The new covariate file is called: covariates_hg19_hg38_epigenome_pcawg.rda.
They can be used with the following new RefCDS files for GRCh38/hg38 and GRCh37/hg19:
RefCDS_human_GRCh38_GencodeV18_recommended.rda
RefCDS_human_hg19_GencodeV18_newcovariates.rda
So you can run dndscv on hg38 using:
load("covariates_hg19_hg38_epigenome_pcawg.rda") # Loads the covs object
dndsout = dndscv(mutations, refdb = "RefCDS_human_GRCh38_GencodeV18_recommended.rda", cv = covs)
These covariates were developed by Federico Abascal. Big thanks to him! They were generated by combining epigenomic data (from the Roadmap Epigenomics project) and whole-genome mutation density vectors (from the PCAWG consortium), collapsed into 20 principal components. They were tested on TCGA data, both pan-cancer and on individual cancer types, and appear to perform well in general.
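For anyone wanting to reduce their own expression or epigenomic tracks to principal components, here is a minimal R sketch using base R's prcomp(). It is not the procedure used to build covariates_hg19_hg38_epigenome_pcawg.rda. The function name features_to_pcs(), the choice to centre and scale every feature, and the random toy data are assumptions for illustration. It expects a numeric gene-by-feature matrix with no missing values and genes as rownames.

```r
# Hypothetical helper (not part of dndscv): reduce a gene-by-feature matrix
# to its first n_pcs principal components, keeping genes as rownames.
features_to_pcs <- function(features, n_pcs = 20) {
  features <- as.matrix(features)
  # Drop constant features: they cannot be scaled to unit variance
  keep <- apply(features, 2, function(col) sd(col) > 0)
  # Centre and scale each feature so no single track dominates (assumption)
  pca <- prcomp(features[, keep, drop = FALSE], center = TRUE, scale. = TRUE)
  # Never request more components than prcomp returned
  n_pcs <- min(n_pcs, ncol(pca$x))
  pcs <- pca$x[, seq_len(n_pcs), drop = FALSE]
  rownames(pcs) <- rownames(features)
  pcs
}

# Toy data: 50 genes x 6 random features (replace with real per-gene data)
set.seed(1)
toy_features <- matrix(rnorm(50 * 6), nrow = 50,
                       dimnames = list(paste0("GENE", 1:50),
                                       paste0("feature", 1:6)))
toy_pcs <- features_to_pcs(toy_features, n_pcs = 3)
```

The returned matrix has the same shape expected of covs (genes as rownames, one component per column), so with real data it could be passed as cv in the same way, e.g. cv = features_to_pcs(my_features, n_pcs = 20), where my_features is a placeholder for your own matrix. With only a handful of features, the number of components is capped by the number of non-constant features.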
We may further refine them in the next few weeks and I will try to integrate them in the package by default, so that you can use dndscv on hg38 without downloading additional files. But I wanted to share them with you without further delay.
Please do share any feedback, positive or negative, as we are still testing them and any feedback helps.
Best, Inigo
I realize that this isn't really an issue, but I'm curious if you have given thought to generating a comparable set of covariates for hg38?