im3sanger / dndscv

dN/dS methods to quantify selection in cancer and somatic evolution
GNU General Public License v3.0

Help with trying to make a pentanucleotide substitution model #36

Closed: arvindjs closed this issue 5 years ago

arvindjs commented 5 years ago

Hi, I'm trying to make a pentanucleotide substitution model to use with the dndscv package. Is it as simple as making a 3072x4 matrix, or does something else need to be done (building RefCDS objects, etc.)? All the headers for the rest of the models have the trinucleotide substitution as the ID (key).

Also, some odd observations: in the 192-rate model, TGT>TGA has only txwnon, etc., while the rest have the entire trinucleotide change. Similarly, in the 12-rate model, T>G has only txwnon. For clarity, see the highlighted text in the screengrabs.

Is it possible to include the pentanucleotide model (mentioned in the paper) in upcoming releases, or to clarify how one could create such a model and whether new reference objects would need to be built?

Thanks, Arvind

im3sanger commented 5 years ago

Hi Arvind,

I am afraid that, at the moment, dndscv only supports substitution models contained within a trinucleotide context (including simpler ones). I may include a function to calculate global dN/dS ratios using a pentanucleotide model at some point, since this is simpler to implement, but I am not currently planning to release a function to run the dNdScv negative binomial regression model for gene-level selection using pentanucleotide contexts. While this is doable, it would require considerable changes in the way information is stored in the RefCDS objects. I would be interested to know if there is demand for this feature from multiple users, but for most applications a pentanucleotide model at the gene level is probably unnecessary.
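
For a sense of scale, the number of substitution classes at each context width can be counted directly. The sketch below is plain Python and not part of dndscv; the function name `substitution_labels` is hypothetical, and the label format only imitates the "context>mutated context" row names discussed in this thread.

```python
from itertools import product

BASES = "ACGT"

def substitution_labels(k):
    """List every 'REF>ALT' label where only the central base of a k-mer changes.

    Hypothetical helper for illustration; k is assumed to be odd.
    """
    mid = k // 2
    labels = []
    for ctx in product(BASES, repeat=k):
        ref = "".join(ctx)
        for alt in BASES:
            if alt != ref[mid]:
                labels.append(ref + ">" + ref[:mid] + alt + ref[mid + 1:])
    return labels

tri = substitution_labels(3)    # 4^3 contexts x 3 alternative bases
penta = substitution_labels(5)  # 4^5 contexts x 3 alternative bases
print(len(tri), len(penta))     # prints: 192 3072
```

The 3072 rows match the matrix size suggested in the question, which is a 16-fold increase over the 192-rate model.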

Regarding your question about TGT>TGA or T>G not having a rate parameter in the substitution model: rate parameters are expressed relative to the "t" parameter, so one of them is redundant (it is fixed to 1 and therefore absent from the table). This is common practice in traditional dN/dS models and does not affect the results.
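
To illustrate why fixing one rate to 1 changes nothing, here is a minimal Python sketch, not dndscv code. The rate values are made up for illustration, and the names `absolute`, `reference` and `relative` are hypothetical.

```python
import math

# Made-up absolute rates for three substitution classes (illustrative only).
absolute = {"C>T": 6.0, "T>C": 3.0, "T>G": 1.5}

# Choose one class as the reference; its rate plays the role of "t".
reference = "T>G"
t = absolute[reference]

# Every other class gets a rate relative to t. The reference class
# needs no parameter of its own because its relative rate is exactly 1.
relative = {k: v / t for k, v in absolute.items() if k != reference}

# Rebuild the absolute rates from t and the relative rates.
rebuilt = {k: t * r for k, r in relative.items()}
rebuilt[reference] = t * 1.0

# The two parameterisations describe the same rates.
same = all(math.isclose(rebuilt[k], absolute[k]) for k in absolute)
print(relative, same)
```

With three classes there are only two relative rates plus "t", so the reference class appears in the table with "t" alone.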

Best, Inigo