simonlabcode / bakR

Other
6 stars 2 forks source link

No significant differentially-stabilized genes? #5

Closed lynnhlee closed 7 months ago

lynnhlee commented 7 months ago

Hi Isaac- First, bakR and the bam2bakr workflow are great pieces of software and really well-documented; thanks for putting these out there.

I am curious if I am making an obvious mistake in preprocessing my data such that there are no statistically significant differentially-stabilized genes? My experiment has two biological replicates plus a pair of no-4sU controls (much like the bakR vignette) - I started from fastqs through the bam2bakr workflow (using STAR as my aligner of choice), then used the cB csv as input along with an appropriately formatted dataframe with experimental metadata. All the QC looked okay - good correlation (0.92,) average fraction new was ~0.5-0.6. Using both the efficient and hybrid model, however, I get no statistically significant hits. I attached an MA plot; there are some genes with fairly large L2FC(Kdeg) and the raw p is significant, but they fail with FDR testing it seems. Am I missing something obvious, or is there just no signal in my data?

bakR_MAplot

Second minor question - after running bam2bakR, the track files generated for IGV look inverted? (ie genes on the sense strand show up on minus track, and vice versa.) I am pretty sure I have the config file correct - the library was prepared according to your lab's Timelapse-seq protocol, so I have the strandedness set to 'R' (which is consistent with the Takara kit; read 1 is antisense to the input RNA).

thanks, -Lynn

isaacvock commented 7 months ago

Hi Lynn,

Thank you for your kind words and your interest in bakR/bam2bakR!

It sounds like you have very high quality data, the only data-quality question I have is what your inferred new read mutation rates were? If this is low (< 5%, which can be caused by limited incorporation of s4U by the cell line used), that could limit bakR's ability to be confident in any apparent signal.

The MA plot that you show is a bit striking in that almost all of the L2FC(kdegs) are between -1 and 1, and the vast majority of the data lies between -0.5 and 0.5. This does suggest limited signal in your dataset, as these effect sizes are very small and consistent with the amount of variability expected for null comparisons. To calibrate your expectations for what large effect sizes in these experiments look like you could compare this to the MA plot we show in Figure 6 of our paper where we analyzed the effect of DCP2 KO (also 2 replicates), and L2FC(kdegs) get as low as -4.

I am also wondering if when you perform differential expression analysis if you see any significant expression changes? It's totally possible to have no expression changes but real stability changes (we even discuss an example of this in the bakR paper), and it is also possible to see changes in expression without changes in stability (i.e., transcriptional regulation) but a lack of differential expression would lead some further credence to the hypothesis that there are limited changes in RNA dynamics occurring in your system.

It is possible that some of these small effect sizes are real differences, and your present experiment + bakR is just too underpowered to be confident in these effects; e.g., collecting more replicates could in theory allow you to identify some of these differences. Depending on the biological phenomenon you are studying, something that can be worthwhile to generate useful hypotheses in this case (without collecting more replicates) is to do gene set enrichment analysis on the top N stability differences, just to see if any of the small but possibly real differences are enriched for genes from particular biological pathways of interest. N is some integer that I am not explicitly specifying because you could try multiple different cutoffs.

To your second question, this is a known quirk in bam2bakR when using a reverse stranded library. The tracks will be "flipped" for all genes. While I have been working to eventually fix this, it should not affect the overall interpretation of the tracks. Just remember that what appears "sense" relative to an annotation is antisense, and vice versa. I can rush a patch to this if you are in a situation where this track inversion is causing serious analysis problems.

Best, Isaac

lynnhlee commented 7 months ago

Hi Isaac- Thanks for the detailed reply. The mutation rates looked pretty reasonable - 6.5 to 7%, so that seems unlikely to be the problem. I looked at differential expression with DESeq2 as you suggested, and there are no significant DEGs either. It is (frustratingly) looking like there just aren't that many perturbations with our experimental setup. Without being too vague, we have a degron tagged RNA stabilizer which we degrade for 12 hours, then label, extract RNA etc. I would have thought that 12h (plus 4h label) would be a decent amount of time for particularly short-lived RNAs to drop out, but it seems we need to go back to the drawing board.

No worries about the track flipping issue; I don't need it patched right now. Just wanted to make sure it was expected and not operator error on my end.

thanks again for the help, -Lynn