owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License

Bug Report: 'n_cpus' and 'n_processes' Keywords Not Recognized in pydeseq2.dds.DeseqDataSet (Version 0.4.4) #241

Closed: chloesavignac closed this issue 7 months ago

chloesavignac commented 8 months ago

Issue Description:

The pydeseq2.dds.DeseqDataSet class in the 0.4.4 release of the PyDESeq2 package does not recognize the keywords 'n_cpus' and 'n_processes', even though both are listed in the latest release's documentation (https://pydeseq2.readthedocs.io/en/latest/api/docstrings/pydeseq2.dds.DeseqDataSet.html).

Observations:

- It appears that 'n_cpus' has replaced 'n_processes' from previous releases, but neither keyword functions as intended in version 0.4.4.
- The default behavior seems to use all available CPUs, leading to memory allocation errors when other processes need CPU resources.
- Attempts to control the number of CPUs through pydeseq2.default_inference.DefaultInference were unsuccessful (see the sketch below).
- Downgrading to a previous release (0.4.0) allowed setting 'n_cpus' to 1 in pydeseq2.dds.DeseqDataSet, which temporarily resolved the issue.
- Even on 0.4.0, parallel processing fails: memory allocation errors persist whenever 'n_cpus' is greater than 1. This happens despite ample CPU resources (256 cores), with tests run using values such as 10, 20, and 50 for 'n_cpus'.
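
For reference, a minimal sketch of the two call patterns described above, on a tiny synthetic dataset (the counts/metadata construction and the "condition" design factor are placeholders, not the actual data from the report):

```python
import numpy as np
import pandas as pd
from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference

# Placeholder data: 6 samples x 10 genes, two conditions.
rng = np.random.default_rng(0)
counts_df = pd.DataFrame(
    rng.poisson(100, size=(6, 10)),
    index=[f"sample{i}" for i in range(6)],
    columns=[f"gene{j}" for j in range(10)],
)
metadata_df = pd.DataFrame(
    {"condition": ["A", "A", "A", "B", "B", "B"]}, index=counts_df.index
)

# Per the report above, passing n_cpus directly is documented but not accepted in 0.4.4:
# dds = DeseqDataSet(counts=counts_df, metadata=metadata_df,
#                    design_factors="condition", n_cpus=1)

# The other approach attempted: pass an inference object configured with n_cpus.
inference = DefaultInference(n_cpus=1)
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata_df,
    design_factors="condition",
    inference=inference,
)
dds.deseq2()
```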

BorisMuzellec commented 8 months ago

Hi @chloesavignac, hopefully the n_cpus argument issue is fixed in version 0.4.5 (now available on PyPI). Could you please update to 0.4.5 and tell me if you're still experiencing this issue?

As for the number of CPUs used in v0.4.0, the n_cpus attribute actually controls the number of workers that are spawned at a time to fit dispersions or LFCs. Since those workers rely on numpy / scipy / sklearn functions, each of them may use several CPUs. If you want finer control over CPU usage, you might need to set environment variables such as OMP_NUM_THREADS or OPENBLAS_NUM_THREADS to limit the number of threads each worker is allowed to spawn.
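
For example (a sketch; which variables matter depends on how your numpy / scipy were built, and they must be set before those libraries are first imported):

```python
import os

# Must be set before numpy / scipy are first imported, or they have no effect.
os.environ["OMP_NUM_THREADS"] = "1"       # threads for OpenMP-backed routines
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # threads for OpenBLAS-backed numpy builds

from pydeseq2.dds import DeseqDataSet  # noqa: E402  (imported after setting the variables)
```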

EDIT: v0.4.5 is actually buggy, please wait for v0.4.6

BorisMuzellec commented 8 months ago

> EDIT: v0.4.5 is actually buggy, please wait for v0.4.6

v0.4.6 is now available on PyPI.