Closed: HendrinaS closed this issue 9 months ago
Hi Hendrina,
By default, dNdScv assumes that your data is mapped to GRCh37 (hg19). If your mutations are in hg38, you will need to pass the optional refdb and cv arguments so that dndscv uses the matching reference database and covariates. Please follow these steps:
load("covariates_hg19_hg38_epigenome_pcawg.rda") # Loads the covs object
dndsout = dndscv(mutations, refdb = "RefCDS_human_GRCh38_GencodeV18_recommended.rda", cv = covs)
For more information on how to run dNdScv on other species or newer assemblies you can also see this tutorial. But for hg38, the instructions above should be sufficient.
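On the chromosome naming mentioned in the question, the renaming from "chr1" to "1" can be done in one line rather than by hand. This is only a minimal sketch in base R: the toy table is made up for illustration, and it assumes the reference database uses names without the "chr" prefix, as the renaming in the question suggests.

```r
# Toy mutation table in the five-column layout used as dndscv input
# (sampleID, chr, pos, ref, mut). The values are invented for illustration.
mutations <- data.frame(
  sampleID = c("s1", "s1", "s2"),
  chr = c("chr1", "chrX", "2"),
  pos = c(1000L, 2000L, 3000L),
  ref = c("A", "C", "G"),
  mut = c("T", "G", "A"),
  stringsAsFactors = FALSE
)

# Strip a leading "chr" where present; names without the prefix are unchanged.
# Mitochondrial naming differences (e.g. chrM vs MT) are not handled here.
mutations$chr <- sub("^chr", "", as.character(mutations$chr))
```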
You also ask about removing duplicates. You can use mutations = unique(mutations) to remove duplicated rows from your data. The warning issued by dndscv tells you that two identical mutations were found in different sampleIDs. If these are genuinely independent mutations that occurred in two different patients, you can safely ignore this warning. But if your sampleIDs represent two biopsies from the same donor, identical mutations in the two biopsies most likely come from the same clone. In that case, it is preferable to collapse identical mutations per donor, which you can do by changing the sampleIDs to donorIDs and then applying the "unique" function above.
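The per-donor collapsing could look roughly like this. It is only a sketch in base R: the toy table and the donor_of lookup are hypothetical, and you would need to build the lookup from your own sample annotation.

```r
# Toy mutation table (sampleID, chr, pos, ref, mut); values are invented.
mutations <- data.frame(
  sampleID = c("biopsy1", "biopsy2", "biopsy3", "biopsy3"),
  chr = c("1", "1", "1", "2"),
  pos = c(1000L, 1000L, 1000L, 5000L),
  ref = c("A", "A", "A", "G"),
  mut = c("T", "T", "T", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: biopsy1 and biopsy2 were taken from the same donor.
donor_of <- c(biopsy1 = "donorA", biopsy2 = "donorA", biopsy3 = "donorB")

# Replace each sampleID with its donorID, then drop rows that became identical.
mutations$sampleID <- unname(donor_of[mutations$sampleID])
stopifnot(!anyNA(mutations$sampleID))  # every sample must map to a donor
mutations <- unique(mutations)
```

In this toy example the mutation shared by biopsy1 and biopsy2 is kept once for donorA, while the same mutation in donorB is kept as an independent event.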
Best, Inigo
Good day,
Could you please assist me with the following:
I’m using the dNdScv package. Initially, I ran the script and got the following error messages:
I then changed the chromosome naming style (chr1 to 1, chr2 to 2, chr3 to 3…).
Then I ran the script again; however, I am now getting additional error messages, as below.
Can the dNdScv package use data mapped to hg38 instead of GRCh37 (hg19)? Is there a script I can use for hg38 data? Is there also a script to remove duplicated mutations from my file, or will I have to do this manually (my file is big)?
Thank you for your assistance.
Hendrina