yolandalago / CIRCUST


Rhythmicity statistic? #4

Open lvclark opened 7 months ago

lvclark commented 7 months ago

Hi again. My collaborator and I noticed that the top rhythmic genes are output, but we aren't sure how to get the statistics for all genes that are analyzed. In particular, we want to be able to see whatever statistic was used for determining if genes are rhythmic or not, so that we can rank all genes by their rhythmicity. Is that statistic somewhere in the output? Maybe in obtainNPRefG?

yolandalago commented 7 months ago

The statistic used to determine rhythmicity is R^2, a goodness-of-fit measure detailed on page 17 of the Supporting Information (see Larriba et al. 2023). Given a rhythmicity model (ORI or FMM), R^2 ranges between 0 and 1 and can be interpreted as the proportion of variability explained by the fitted model. In our code it is used for several purposes. First, the function computeNP calculates R^2 under the ORI model to discard potentially non-rhythmic genes (R^2_ORI<0.5). The remaining genes (obtainNPRefG[[5]]) are the candidates from which the TOP rhythmic genes are selected. Then, the list of tissue-specific TOP rhythmic genes is defined, based on the FMM model predictions, as those genes that: i) are non-spiked (w>0.1); ii) have high rhythmicity (R^2_FMM>0.5); and iii) have peak phases (t_U) that together cover all the quarters of the unit circle ([0, 2π)). Next, k random selections, each containing 2/3 of the TOP genes, are drawn. For each of these sub-matrices, the CPCA solution for the temporal order estimates is recomputed. Hence, k replications (with k different orders) of the TOP genes are obtained. Finally, for each gene in the TOP, the FMM model can be fitted under each order to obtain the median R^2_FMM across orders (replications). These median values are provided as output in our atlas.
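For readers who want to rank genes themselves, here is a minimal, language-agnostic sketch in Python of the two quantities described above: R^2 as the proportion of variability explained by a fit, and the median R^2 across replications. It is not CIRCUST code. It uses the standard definition R^2 = 1 - RSS/TSS, which is an assumption here (the exact formula is the one in the Supporting Information), and it takes the fitted values as given rather than fitting ORI or FMM. The function names, gene names, and numbers are all hypothetical.

```python
from statistics import median


def r_squared(observed, fitted):
    """Standard R^2 = 1 - RSS/TSS (assumed definition, not taken from CIRCUST)."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # residual sum of squares
    tss = sum((o - mean_obs) ** 2 for o in observed)           # total sum of squares
    if tss == 0:
        return 0.0  # flat expression profile: no variability to explain
    return 1.0 - rss / tss


def rank_by_median_r2(r2_per_gene):
    """Rank genes by median R^2 across replications, highest first.

    r2_per_gene maps a gene name to a list of R^2 values, one per
    replication (order). Both the structure and the names are hypothetical.
    """
    medians = {gene: median(values) for gene, values in r2_per_gene.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Toy data only: invented R^2 values for three invented genes, k = 3 replications.
toy_r2 = {
    "geneA": [0.9, 0.8, 0.7],
    "geneB": [0.4, 0.6, 0.5],
    "geneC": [0.95, 0.6, 0.85],
}
ranking = rank_by_median_r2(toy_r2)
```

In this toy example `ranking` lists geneC first, then geneA, then geneB. With real data, the per-replication R^2 values would have to come from fitting the FMM model under each estimated order.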