owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License

Support for arbitrary design matrices and contrast vectors #213

Open grst opened 11 months ago

grst commented 11 months ago

Is your feature request related to a problem? Please describe.

Most linear models support passing designs as design matrices and contrasts as contrast vectors. This is the "smallest common denominator" for specifying designs and it's useful

Describe the solution you'd like

Additional context

Discussed at the scverse hackathon in Cambridge.

CC @const-ae @emdann

BorisMuzellec commented 11 months ago

Hi @grst @const-ae @emdann, is there a consensus regarding what would be most convenient? I'm assuming we want to use formulaic?

I won't have the bandwidth to implement this feature on my own in the next few weeks, but if anyone wants to give it a try, I'm happy to help them.

grst commented 11 months ago

I don't even think you'd need to deal with formulaic/patsy in PyDESeq2, at least initially. Either tool generates a design matrix (which advanced users could also create manually) which should be the input for PyDESeq2.
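As a minimal sketch of what a manually built design matrix and contrast vector could look like: the example below uses plain NumPy and ordinary least squares, not PyDESeq2's negative binomial model, and every name and value in it (`conditions`, `columns`, the toy expression values) is made up for illustration.

```python
import numpy as np

# Hypothetical sample metadata: one condition label per sample.
conditions = ["A", "A", "B", "B", "C", "C"]
levels = sorted(set(conditions))  # ["A", "B", "C"]; "A" acts as the reference

# Treatment-coded design matrix: an intercept column plus one indicator
# column for each non-reference level.
columns = ["intercept"] + [f"condition[{lvl}]" for lvl in levels[1:]]
design = np.array(
    [[1.0] + [float(c == lvl) for lvl in levels[1:]] for c in conditions]
)

# A contrast vector carries one weight per design-matrix column.
# Here it compares level C against level B.
contrast = np.array([0.0, -1.0, 1.0])

# Toy log-scale expression values for a single gene (illustrative only).
y = np.array([1.0, 1.2, 2.0, 2.2, 3.5, 3.7])

# Fit the linear model, then apply the contrast to the coefficients.
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
effect = float(contrast @ beta)  # estimated difference between C and B
```

Because the contrast is just a weight per column, it does not depend on how the columns are named, which is what makes this representation a convenient common denominator.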

const-ae commented 11 months ago

I agree with Gregor that the easiest change might be to simply allow some way to provide a design matrix and then just skip the step build_design_matrix at https://github.com/owkin/PyDESeq2/blob/main/pydeseq2/dds.py#L249. Of course, longer term I think it would be great to save the user from converting data + formula to a design matrix and do it internally, but in the end it's just syntactic sugar :)
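The "accept a design matrix and skip the build step" idea could be sketched roughly as follows. This is not PyDESeq2 code: `resolve_design_matrix` and `fake_builder` are hypothetical names, and the builder callback only stands in for whatever the library does internally.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


def resolve_design_matrix(
    n_samples: int,
    build_from_factors: Callable[[], Matrix],
    design_matrix: Optional[Matrix] = None,
) -> Matrix:
    """Return the user-supplied design matrix if given, else build one.

    Hypothetical helper for illustration; not part of PyDESeq2.
    """
    if design_matrix is None:
        # Default path: derive the matrix from metadata and design factors.
        return build_from_factors()
    # User path: skip the build step, but check the shape first.
    if len(design_matrix) != n_samples:
        raise ValueError("design matrix must have one row per sample")
    if len({len(row) for row in design_matrix}) != 1:
        raise ValueError("all rows must have the same number of columns")
    return design_matrix


calls: List[str] = []


def fake_builder() -> Matrix:
    # Stand-in for the internal build step; records that it was called.
    calls.append("built")
    return [[1.0, 0.0], [1.0, 1.0]]


user_matrix = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
from_user = resolve_design_matrix(3, fake_builder, user_matrix)
from_default = resolve_design_matrix(2, fake_builder)
```

In this sketch the builder runs only when no matrix is passed, so the formula-based path remains the default and the matrix path is purely opt-in.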

jeandut commented 6 months ago

PR #181 implements the ability to pass a design matrix directly. However, for now the matrix needs to follow PyDESeq2 naming conventions for further preprocessing, namely the _vs_ syntax.

jeandut commented 6 months ago

Don't hesitate to play with the branch and give feedback on its limitations.

grst commented 22 hours ago

> for now it needs to follow pydeseq2 naming conventions for further preprocessing namely the _vs_ syntax

does that mean if it doesn't follow the naming conventions it doesn't work at all, or would I just have to specify contrasts manually?

jeandut commented 21 hours ago

> > for now it needs to follow pydeseq2 naming conventions for further preprocessing namely the _vs_ syntax
>
> does that mean if it doesn't follow the naming conventions it doesn't work at all, or would I just have to specify contrasts manually?

In this PR, the design_factors are extracted from a user-given design_matrix by assuming that interactions are named following PyDESeq2 conventions. This processing is fairly straightforward and can be inspected in pydeseq2/utils.py, at the end of the process_design_factors function (if design_matrix is not None).
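To illustrate the kind of name-based extraction described above, here is a rough sketch. It is not the code from the PR: `parse_vs_column` and `ParsedColumn` are hypothetical, and the sketch assumes column names of the form `factor_level_vs_reference` in which neither the factor nor the levels contain underscores.

```python
from typing import NamedTuple, Optional


class ParsedColumn(NamedTuple):
    factor: str
    level: str
    reference: str


def parse_vs_column(name: str) -> Optional[ParsedColumn]:
    """Parse 'factor_level_vs_reference'; return None if the name does not fit.

    Hypothetical helper; assumes no underscores inside factor or level names.
    """
    left, sep, reference = name.rpartition("_vs_")
    if not sep or not reference:
        return None  # no "_vs_" marker, e.g. an intercept column
    factor, underscore, level = left.rpartition("_")
    if not underscore or not factor or not level:
        return None
    return ParsedColumn(factor, level, reference)


columns = ["intercept", "condition_B_vs_A", "group_Y_vs_X"]
parsed = {c: parse_vs_column(c) for c in columns}

# Factors recovered from the column names that match the convention.
design_factors = sorted({p.factor for p in parsed.values() if p is not None})
```

Columns that do not match the convention return `None` here, so nothing can be recovered from them by name alone. That is the limitation the question above is asking about.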

We realize that the current situation is not optimal. We are actively trying to find the best trade-off between the coverage of DESeq2/formulaic functionalities we support and merging this PR "quickly" (sorry that it has already taken so long), given the very limited bandwidth we currently have.