owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License
583 stars 61 forks

Feature Request: RLE Plots #320

Open jonathjd opened 3 weeks ago

jonathjd commented 3 weeks ago

Is your feature request related to a problem? Please describe.

Right now there is no method for drawing an RLE (relative log expression) plot, even though the DeseqDataSet class already stores both the normalized counts (DeseqDataSet.layers["normed_counts"]) and the estimated size factors (DeseqDataSet.obsm["size_factors"]). RLE plots are useful for identifying technical variation and normalization issues in RNA-Seq data, and for comparing different normalization strategies. They make unwanted technical noise or batch effects in expression data easy to spot.

Describe the solution you'd like

I propose implementing an RLE plot method, similar to the plotRLE function in EDASeq for R. The plot would display boxplots of the relative log expression of genes for each sample, centered around the median per gene. The implementation would:

Describe alternatives you've considered

Implementing this by hand with standard libraries (pandas, numpy, matplotlib) is not too bad, but it would be super handy if it came built in!
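For reference, here is a minimal sketch of what such a hand-rolled version could look like with numpy and matplotlib. The function names, the log2 transform, and the pseudocount of 1 are assumptions of this sketch and are not part of PyDESeq2 or EDASeq; the input is assumed to be a samples x genes matrix of normalized counts.

```python
import numpy as np


def relative_log_expression(counts, pseudocount=1.0):
    """Return a samples x genes matrix of relative log expression values.

    counts: samples x genes array of (normalized) counts.
    pseudocount: added before taking the log so zero counts stay finite
    (an assumption of this sketch).
    """
    log_counts = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    # Center each gene (column) on its median across samples.
    gene_medians = np.median(log_counts, axis=0, keepdims=True)
    return log_counts - gene_medians


def plot_rle(counts, sample_names=None, ax=None):
    """Draw one boxplot per sample of its RLE values (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    rle = relative_log_expression(counts)
    if ax is None:
        _, ax = plt.subplots()
    # Matplotlib draws one box per column, so pass a genes x samples matrix.
    ax.boxplot(rle.T, showfliers=False)
    if sample_names is not None:
        ax.set_xticks(range(1, rle.shape[0] + 1))
        ax.set_xticklabels(sample_names, rotation=90)
    # Well-normalized samples should have boxes centered near zero.
    ax.axhline(0.0, linestyle="--", linewidth=1)
    ax.set_ylabel("Relative log expression")
    return ax
```

With a fitted DeseqDataSet, the input would presumably be the normalized-counts layer mentioned above; samples whose boxes sit away from zero or are unusually wide are the ones worth a closer look.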

Additional context

Here's an example of an RLE plot I've made that could be similar. Thanks!!

[Image: Screenshot 2024-09-27 at 4 38 23 PM]

BorisMuzellec commented 3 weeks ago

Hi @jonathjd, great suggestion!

I probably won't be able to implement this myself, but I'd love to help you open a PR.

This could be a method of the DeseqDataSet class that wraps a function in utils.py, similar to how plot_dispersions in DeseqDataSet calls make_scatter from utils.

Let me know if you need any help :).

jonathjd commented 2 weeks ago

@BorisMuzellec That sounds like a great idea, and I would be happy to work on it! Let me know what I need to do to make it happen.