owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License
584 stars 62 forks source link

fix : enable vst be used "blindly" and to fit its own dispersion #268

Closed laudmt closed 5 months ago

laudmt commented 6 months ago

Reference Issue or PRs

Fix #263

What does your PR implement? Be specific.

Proposed solution :

BorisMuzellec commented 6 months ago

Thanks for the PR @laudmt!

I think it does the trick for VST, but I have a little concern regarding the fact that providing a fit_type other than None overwrites a DeseqDataSet's trend_fit_type.

Here's an example of problematic / ambiguous behavior that could occur:

dds = DeseqDataSet(counts=counts, metadata=metadata, trend_fit_type="parametric")
# Start by computing VST
dds.vst(fit_type="mean")
# Then perform DEA
dds.deseq2() # what trend_fit_type is used to compute the curve ?

In the above example, dds.deseq2() would use trend_fit_type = "mean" (because it was overwritten by vst), but the user probably intended to fit VST with a curve and then run DEA with a parametric curve.

I think we should find a way to make the choices of curve fit type for vst() and deseq2() completely independent.

laudmt commented 6 months ago

Thanks for the PR @laudmt!

I think it does the trick for VST, but I have a little concern regarding the fact that providing a fit_type other than None overwrites a DeseqDataSet's trend_fit_type.

Here's an example of problematic / ambiguous behavior that could occur:

dds = DeseqDataSet(counts=counts, metadata=metadata, trend_fit_type="parametric")
# Start by computing VST
dds.vst(fit_type="mean")
# Then perform DEA
dds.deseq2() # what trend_fit_type is used to compute the curve ?

In the above example, dds.deseq2() would use trend_fit_type = "mean" (because it was overwritten by vst), but the user probably intended to fit VST with a curve and then run DEA with a parametric curve.

I think we should find a way to make the choices of curve fit type for vst() and deseq2() completely independent.

Indeed thank you, I was missing this use case. I will then store two different fit type one for dea and one for vst and enable self.fit_X methods to take a fit_type in argument.

BorisMuzellec commented 5 months ago

Hi @laudmt, I've pushed a commit in which I implemented the solution I suggested earlier, i.e. setting two separate fit_type and vst_fit_type attributes and duplicating fields (with or without a "vst_" prefix) to avoid data leakage.

Could you have a look when you have time and tell me if that would work for you :) ?

laudmt commented 5 months ago

Thank you @BorisMuzellec I can confirm that this fix the use of pydeseq2 with multiple covariate on my side.