I am working on a project where we have done targeted Duplex Sequencing of C. elegans mitochondrial DNA after a mutation accumulation line experiment. I'm working with Scott Kennedy, and he suggested using dndscv to look at dN/dS. I have it working (I am happy to provide you with the C. elegans mitochondrial reference genome RefCDS object).
However, Scott and I are concerned that my sequencing coverage is somewhat non-uniform across the genome. I have attached an example from one library: although average depth is high (~5,000X per sample), you can see that it varies. I also attached a second file where I plotted the SNVs (in Geneious Prime), and it appears that a couple of coding regions where coverage was very low have no mutations at all.
Scott and I would be really interested to hear whether you think there is a way to normalize dN/dS ratios for the variable coverage.
Please don't hesitate to reach me at tcl21@duke.edu, though I'm sure other folks might be interested in this issue too.
Thanks! -Tess
Although we are communicating by email, I thought I would provide an answer here for other users.
Briefly, there are two analyses to consider:
Gene-wise dN/dS: If you are concerned about coverage variation across genes, you can input the mean coverage per gene as a covariate (the format is briefly described in the tutorial). This will be used to normalise the expected mutation rate of each gene: you should then see that the last column of the genemuts table, “exp_syn_cv”, is closer to the number of synonymous mutations in the gene (“n_syn”) than “exp_syn” is. You can still use other covariates by adding coverage as an additional column to the default covariates (just be aware of the maximum number of covariates that dNdScv uses with the default arguments; see ?dndscv for details).
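A minimal sketch, in Python, of building the per-gene coverage covariate from a per-base depth profile. The gene coordinates below are hypothetical placeholders (use the CDS coordinates from which your RefCDS object was built), and the function names are illustrative only. The assumption is that, per ?dndscv, `cv` accepts a gene-by-covariate matrix whose row names are gene names; the table written here could be read in R with `read.table(path, header = TRUE, row.names = 1)` and converted with `as.matrix()`.

```python
import numpy as np


def mean_coverage_per_gene(depth, genes):
    """Mean duplex depth over each gene's span.

    depth: 1-D array of per-base depth along the mtDNA (index 0 = position 1).
    genes: dict mapping gene name -> (start, end), 1-based inclusive coordinates.
    """
    depth = np.asarray(depth, dtype=float)
    return {name: float(depth[start - 1:end].mean())
            for name, (start, end) in genes.items()}


def write_covariate_table(cov, path):
    """Write one row per gene (gene name, mean coverage) as a tab-separated file."""
    with open(path, "w") as fh:
        fh.write("gene\tmean_coverage\n")
        for name, value in cov.items():
            fh.write(f"{name}\t{value:.2f}\n")


# Toy example with HYPOTHETICAL coordinates and depths; replace with real data.
depth = np.array([4000, 5000, 6000, 300, 200, 100])
genes = {"geneA": (1, 3), "geneB": (4, 6)}
cov = mean_coverage_per_gene(depth, genes)
print(cov)  # {'geneA': 5000.0, 'geneB': 200.0}
```

With several samples, one reasonable choice is to average each gene's mean depth across libraries before writing the table, since dndscv fits one covariate value per gene.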
Global dN/dS ratios: Using covariates will not affect the global dN/dS ratios, which could in theory still be affected by coverage biases, although I would expect these biases to be small. Since you have a very dense dataset, if you want to demonstrate as a supplementary analysis that coverage biases do not affect the global dN/dS ratios, you could downsample your mutation calls to simulate uniform coverage across the genome. For example, you could downsample your data to 1,000x by binomial sampling based on the observed VAF of each mutation, which is comparable to sampling 1,000 molecules at each site. Alternatively, it is possible in principle to weight genes by coverage in the global dN/dS calculation using the L and N matrices calculated by dndscv (you can obtain these matrices with the outmats=T argument; see ?dndscv). This assumes that the number of detected mutations scales linearly with coverage, which would not be valid for standard sequencing but may be valid for duplex sequencing of polyclonal samples. It would require a small amount of coding to run the Poisson regression outside of dndscv. Still, I would expect coverage variation to have only a small effect on global dN/dS ratios in most cases.
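The binomial downsampling can be sketched as follows, in Python and under the assumption that you have, for each mutation call, its duplex alt-allele count and the total duplex depth at that site (the function name and inputs are hypothetical). Each call is kept if at least one mutant molecule is drawn when sampling `target_depth` molecules with probability equal to the observed VAF. This only simulates uniform depth where the real depth is at least `target_depth`; sites below that depth would need separate handling (for example, excluding them).

```python
import numpy as np


def downsample_calls(alt_counts, depths, target_depth=1000, seed=0):
    """Re-sample each mutation call at a fixed depth.

    alt_counts, depths: per-call duplex alt-allele counts and total site depths.
    Returns a boolean mask, True where the call is still observed (>= 1 mutant
    molecule) after drawing `target_depth` molecules with p = VAF = alt / depth.
    """
    rng = np.random.default_rng(seed)
    alt = np.asarray(alt_counts, dtype=float)
    dp = np.asarray(depths, dtype=float)
    # VAF per call; sites with zero depth get VAF 0 and can never be re-observed
    vaf = np.divide(alt, dp, out=np.zeros_like(alt), where=dp > 0)
    vaf = np.clip(vaf, 0.0, 1.0)
    sampled_alt = rng.binomial(target_depth, vaf)
    return sampled_alt >= 1


# Toy calls: alt counts and depths for three mutations
mask = downsample_calls([1, 3, 50], [6000, 5000, 100])
print(mask)
```

The mask would then be used to subset the mutation table passed to dndscv. Repeating this with several seeds and comparing the resulting global dN/dS ratios with the full-data estimate gives a sense of how much the sampling itself varies.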
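A minimal sketch, in Python, of the coverage-weighted Poisson regression. It makes strong simplifying assumptions: only synonymous and missense impacts are modelled, a generic set of substitution classes stands in for dndscv's trinucleotide rates, and the L (sites) and N (observed mutations) arrays are assumed to have been exported from R (via outmats=T) and reshaped to (classes, impact, genes). That layout and all function names here are hypothetical. Coverage enters through the offset log(L × coverage), so each expected count is proportional to sites times coverage, which is the linearity assumption described above. The same model could be fit in R with `glm(..., family = poisson, offset = ...)`.

```python
import numpy as np


def poisson_glm_offset(X, y, offset, n_iter=100, tol=1e-10):
    """Fit log E[y] = X @ beta + offset by iteratively reweighted least squares."""
    mu = y + 0.1                       # starting values, as R's glm uses for Poisson
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta - offset + (y - mu) / mu   # working response with the offset removed
        XtW = X.T * mu                      # Poisson working weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def coverage_weighted_dnds(L, N, coverage):
    """Global missense dN/dS with coverage-weighted opportunities (simplified model).

    L, N: arrays of shape (n_classes, 2, n_genes): sites and observed mutations
          for synonymous (impact 0) and missense (impact 1) changes.
    coverage: mean coverage per gene, shape (n_genes,).
    Model: E[N[c, i, g]] = r_c * w**i * L[c, i, g] * coverage[g]
    """
    L = np.asarray(L, dtype=float)
    N = np.asarray(N, dtype=float)
    cov = np.asarray(coverage, dtype=float)
    n_classes, n_impacts, _ = L.shape
    # r_c and w are shared by all genes, so summing over genes gives the same MLE
    L_eff = (L * cov[None, None, :]).sum(axis=2)
    N_tot = N.sum(axis=2)
    rows, y, offset = [], [], []
    for c in range(n_classes):
        for i in range(n_impacts):
            if L_eff[c, i] <= 0:
                continue  # no sites of this type: contributes nothing
            x = np.zeros(n_classes + 1)
            x[c] = 1.0               # log r_c: rate of substitution class c
            x[-1] = float(i == 1)    # log w: applies to missense sites only
            rows.append(x)
            y.append(N_tot[c, i])
            offset.append(np.log(L_eff[c, i]))
    beta = poisson_glm_offset(np.array(rows), np.array(y), np.array(offset))
    return float(np.exp(beta[-1]))
```

Because a constant rescaling of coverage is absorbed into the class rates r_c, only relative coverage between genes matters; with equal coverage in every gene this simplified estimate reduces to the unweighted one.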
Attachment: N2_Geneious_graph.pdf