im3sanger / dndscv

dN/dS methods to quantify selection in cancer and somatic evolution
GNU General Public License v3.0

Apply dndscv to a Non-cancer population with high recombination #74

Closed huangl-CAU closed 2 years ago

huangl-CAU commented 2 years ago

Dear Authors, when applying dndscv to a non-cancer population with high recombination, I ran into several questions, and I wonder whether you could give me some suggestions.

Purpose & question

I am re-sequencing a plant inbred-line population with thousands of samples at a sequencing depth of about 35x, and I am trying to:

  1. Identify genes under purifying or positive selection
  2. Quantify the strength of selective pressure on each gene

My questions are:

  1. Is dndscv the right choice for this analysis?
  2. What modifications should I make to get better estimates?

Pretreatment

I tried this analysis with the following pipeline:

  1. Remove samples based on the kinship coefficient to avoid closely related samples
  2. Keep only the rare SNPs (AF < 0.01) in my dataset to mitigate the inherent conflict with the "independent mutation events" requirement (I think common SNPs are unlikely to have arisen independently)
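The second step can be sketched in base R as follows. This is only an illustration under stated assumptions: the `calls` table, its `af` column, and the toy values are hypothetical, and keeping the first carrier's `sampleID` for each variant is an arbitrary choice. The five output columns follow the mutation table layout that dndscv expects (sampleID, chr, pos, ref, mut).

```r
# Hypothetical long-format calls: one row per (sample, variant) carrying the
# alternate allele, with a precomputed population allele frequency `af`.
calls <- data.frame(
  sampleID = c("s1", "s2", "s1", "s2", "s4"),
  chr      = c("1", "1", "2", "2", "2"),
  pos      = c(100, 100, 250, 250, 300),
  ref      = c("A", "A", "C", "C", "G"),
  mut      = c("G", "G", "T", "T", "A"),
  af       = c(0.30, 0.30, 0.004, 0.004, 0.002),
  stringsAsFactors = FALSE
)

# Keep only rare variants (AF < 0.01).
rare <- calls[calls$af < 0.01, ]

# Collapse to one row per unique variant so that a SNP shared by several
# samples is not counted as several independent mutation events.
key <- paste(rare$chr, rare$pos, rare$ref, rare$mut, sep = ":")
unique_muts <- rare[!duplicated(key), c("sampleID", "chr", "pos", "ref", "mut")]

print(unique_muts)
```

Even a rare SNP can be carried by several samples in a population of thousands, which is why the deduplication is applied after the frequency filter rather than instead of it.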

Dndscv command & result

I then ran dndscv with the following command:

  dndsout = dndscv(mutations, refdb=mydb, max_muts_per_gene_per_sample = Inf, max_coding_muts_per_sample = Inf, outmats=T, cv=NULL)

The results were:

  1. Most genes (33000/37000) are under purifying selection, filtered by "qallsubs_cv<0.05 & wmis_cv<1"
  2. 2500/37000 genes are under positive selection, filtered by "qallsubs_cv<0.05 & wmis_cv>1"
  3. 2500/37000 genes are under neutral selection, filtered by "qallsubs_cv>0.05"

The nbreg$theta value is 0.688948905558218.

The global dN/dS is: [image]

The genemuts table looks like this: [image]
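For reference, the three filters above can be written as a single classification step. The sketch below uses a made-up stand-in for `dndsout$sel_cv` (gene names and values are invented) and only the column names quoted in the filters; it is not taken from the dndscv documentation.

```r
# Toy stand-in for dndsout$sel_cv; all values are hypothetical.
sel_cv <- data.frame(
  gene_name   = c("geneA", "geneB", "geneC", "geneD"),
  wmis_cv     = c(0.2, 3.1, 0.9, 1.4),
  qallsubs_cv = c(0.001, 0.01, 0.40, 0.20),
  stringsAsFactors = FALSE
)

# Significance first, then the direction of the missense dN/dS ratio.
sig <- sel_cv$qallsubs_cv < 0.05
sel_cv$class <- ifelse(!sig, "not significant",
                       ifelse(sel_cv$wmis_cv < 1, "negative", "positive"))

print(table(sel_cv$class))
```

Two caveats apply to this scheme. A gene with qallsubs_cv > 0.05 is one where neutrality could not be rejected, which is weaker than evidence of neutral evolution. Also, qallsubs_cv tests all substitution classes jointly while wmis_cv describes missense changes only, so pairing wmis_cv with qmis_cv may be a more consistent choice.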

Other small questions

  1. The correlation between the observed number of synonymous SNPs (n_syn) and the expected number (exp_syn) is modest, even for the sample data used in the tutorial:

       cor(dndsout$genemuts$n_syn,dndsout$genemuts$exp_syn)
       [1] 0.4440446

  2. Can wmis_cv be used as a measure of the strength of selective pressure?
  3. Should I calculate one-sided q-values for negative selection and positive selection separately?

Thanks a lot! HuangL

im3sanger commented 2 years ago

Hi Huang,

Thank you for your interest in dNdScv and please accept my sincere apologies for the very late response.

dNdScv was not designed for the analysis of germline mutations from a highly-recombining population so you will need to be careful and critical with the results and interpretation. In theory, some of the assumptions in dN/dS can be violated when working with polymorphism data (see this paper). However, I believe that the loss of monotonicity between dN/dS ratios and selection coefficients in that paper is expected only under extreme (biologically implausible) levels of recombination, where adjacent synonymous and non-synonymous sites segregate independently (free recombination). Whereas I think that monotonicity will not be a problem in your data (i.e. dN/dS<<1 in your data should be the result of negative selection and dN/dS>>1 the result of positive selection), you need to be careful not to assume that those dN/dS ratios directly enable you to estimate selection coefficients (they don't).

For the analyses that you suggest, I think that dNdScv could work reasonably well. You need to be careful to input unique mutations into dNdScv and avoid counting the same SNP multiple times as independent mutations. To address your other questions:

  1. The correlation between exp and obs synonymous mutations will be limited to some extent by the sparsity of the data. Your moderately low theta value also tells you that there is considerable overdispersion in your data (significant variation of the syn mutation density across genes compared to the expected values). This may be biological (e.g. background selection, true variation of the mutation rate across genes, etc) or technical (variable mutation detection sensitivity across genes).
  2. You can cautiously use wmis_cv as an estimate of the strength of negative selection given the likely monotonicity of dN/dS in the absence of free recombination, but do not make strong assumptions about dN/dS ratios and selection coefficients.
  3. Yes, you can calculate one-sided q-values.
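One possible way to obtain the one-sided q-values from point 3 is sketched below. It assumes that pmis_cv is a two-sided p-value for wmis_cv = 1 and applies the usual halving conversion based on the direction of the estimate, followed by Benjamini-Hochberg adjustment with `p.adjust`. The toy table is invented, and this conversion is an illustrative assumption rather than a procedure confirmed in this thread.

```r
# Toy stand-in for dndsout$sel_cv; all values are hypothetical.
sel_cv <- data.frame(
  gene_name = c("geneA", "geneB", "geneC", "geneD"),
  wmis_cv   = c(0.2, 3.1, 0.9, 1.4),
  pmis_cv   = c(1e-6, 1e-4, 0.60, 0.08),
  stringsAsFactors = FALSE
)

# Halve the two-sided p-value when the estimate lies on the tested side,
# otherwise use its complement.
p_neg <- ifelse(sel_cv$wmis_cv < 1, sel_cv$pmis_cv / 2, 1 - sel_cv$pmis_cv / 2)
p_pos <- ifelse(sel_cv$wmis_cv > 1, sel_cv$pmis_cv / 2, 1 - sel_cv$pmis_cv / 2)

# Adjust each direction separately for multiple testing.
sel_cv$q_neg <- p.adjust(p_neg, method = "BH")
sel_cv$q_pos <- p.adjust(p_pos, method = "BH")

print(sel_cv[, c("gene_name", "wmis_cv", "q_neg", "q_pos")])
```

Genes would then be called under negative selection when q_neg falls below the chosen threshold and under positive selection when q_pos does, with each direction tested independently.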

I hope this helps.

Best, Inigo