owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License
573 stars 60 forks source link

[BUG] UFunc Casting Error when Fitting Dispersions #275

Closed zaki-bds closed 4 months ago

zaki-bds commented 4 months ago

Describe the bug

I'm getting the following exception when running the following code, when I googled this, it seemed to be related to a numpy update. I currently have numpy 1.23.1

dds.deseq2()

dds.LFCs

stat_res = DeseqStats.DeseqStats(dds, n_cpus=8)

stat_res.summary()

To Reproduce

dds = DeseqDataSet.DeseqDataSet(counts = counts_on_off.transpose(),
                                clinical = metadata.set_index('ID'),
                                design_factors = "condition",
                                refit_cooks=False)

The above exception was the direct cause of the following exception:

UFuncTypeError                            Traceback (most recent call last)
Input In [53], in <cell line: 9>()
      2 counts_on_off.describe()
      4 dds = DeseqDataSet.DeseqDataSet(counts = counts_on_off.transpose(),
      5                                 clinical = metadata.set_index('ID'),
      6                                 design_factors = "condition",
      7                                 refit_cooks=False)
----> 9 dds.deseq2()
     11 dds.LFCs
     13 stat_res = DeseqStats.DeseqStats(dds, n_cpus=8)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydeseq2/DeseqDataSet.py:224, in DeseqDataSet.deseq2(self)
    222 self.fit_size_factors()
    223 # Fit an independent negative binomial model per gene
--> 224 self.fit_genewise_dispersions()
    225 # Fit a parameterized trend curve for dispersions, of the form
    226 # math:: f(\mu) = \alpha_1/\mu + a_0
    227 self.fit_dispersion_trend()

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydeseq2/DeseqDataSet.py:330, in DeseqDataSet.fit_genewise_dispersions(self)
    328 start = time.time()
    329 with parallel_backend("loky", inner_max_num_threads=1):
--> 330     res = Parallel(n_jobs=self.n_processes, batch_size=self.batch_size)(
    331         delayed(fit_alpha_mle)(
    332             counts=counts_nonzero[:, i],
    333             design_matrix=X,
    334             mu=mu_hat_[i, :],
    335             alpha_hat=rough_disps[i],
    336             min_disp=self.min_disp,
    337             max_disp=self.max_disp,
    338         )
    339         for i in range(num_genes)
    340     )
    341 end = time.time()
    342 print(f"... done in {end - start:.2f} seconds.\n")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable)
   1095     self._iterating = False
   1097 with self._backend.retrieval_context():
-> 1098     self.retrieve()
   1099 # Make sure that we get a last message telling us we are done
   1100 elapsed_time = time.time() - self._start_time

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
    973 try:
    974     if getattr(self._backend, 'supports_timeout', False):
--> 975         self._output.extend(job.get(timeout=self.timeout))
    976     else:
    977         self._output.extend(job.get())

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/joblib/_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout)
    564 """Wrapper for Future.result to implement the same behaviour as
    565 AsyncResults.get from multiprocessing."""
    566 try:
--> 567     return future.result(timeout=timeout)
    568 except CfTimeoutError as e:
    569     raise TimeoutError from e

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py:446, in Future.result(self, timeout)
    444     raise CancelledError()
    445 elif self._state == FINISHED:
--> 446     return self.__get_result()
    447 else:
    448     raise TimeoutError()

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

UFuncTypeError: Cannot cast ufunc 'slogdet' input from dtype('O') to dtype('float64') with casting rule 'same_kind'
BorisMuzellec commented 4 months ago

Hi @zaki-bds, it seems like your counts DataFrame doesn't have a numeric type. Can you try casting your counts like this?

dds = DeseqDataSet.DeseqDataSet(counts = counts_on_off.astype("float64").transpose(), # change this line
                                clinical = metadata.set_index('ID'),
                                design_factors = "condition",
                                refit_cooks=False)

dds.deseq2()
zaki-bds commented 4 months ago

Hi Boris!

Thank you so much for your prompt response. I tried that and it gave me the exact same error. I had also tried ".astype(int)" and that did not solve it. As well as dropping all rows with all zero counts.

I also tried rerunning one of my older workbooks that ran pydeseq2 perfectly and it gave me the same error.

However, downgrading my pandas from v2.2.0 to v.1.4.3 (as stated in your repository readme) seemed to solve the issue. So it seems like the issue arose with a newer pandas release. I may or may not be missing some packages that the newer version of pandas are dependent on, but I will look into that later. For now this stopgap solution worked for me. If I am able to run pydeseq2 with the updated version of pandas I will update here. Thank you!

BorisMuzellec commented 4 months ago

It's great that you were able to run your analysis @zaki-bds!

I'd like to understand and fix the root cause though, because in principle you shouldn't have to downgrade pandas to be able to run pydeseq2, especially since we already handled the transition to pandas 2.0...

Could you check which version of pydeseq2 you are using?

(

import pydeseq2
pydeseq2.__version__

)

zaki-bds commented 4 months ago

Of course! Thank you so much.

My current pydeseq2 version is 0.2.1. I downgraded pandas because of the package dependencies listed on the GitHub readme page but for some reason it totally crossed my mind to try and upgrade the pydeseq2 version as well. Upgrading both pydeseq2 to the latest stable version on pip (0.4.9) and pandas (2.2.2) and everything runs smoothly!

Thanks again!