theislab / diffxpy

Differential expression analysis for single-cell RNA-seq data.
https://diffxpy.rtfd.io
BSD 3-Clause "New" or "Revised" License

BUG: `struct.error` when calling `de.test.wald(...)` #155

Open HypoChloremic opened 4 years ago

HypoChloremic commented 4 years ago

I was using diffxpy to find marker genes for one cluster versus all other clusters in an scRNA-seq dataset stored in `adata`. When running the following:

# Performing differential expression to find the markers for
# cluster 20, which was defined as CD33+ myeloid by HGA (Human Gene Atlas).
adata.obs['twty_All'] = [
    'group 1' if int(i) == 20 else 'group 2' for i in adata.obs['leiden']
    ]

cl20_test = de.test.wald(
    data=adata,
    formula_loc="~ 1 + twty_All",
    factor_loc_totest="twty_All"
)

The following error is produced, which I do not know how to resolve:

training location model: False
training scale model: True
iter   0: ll=23072758620.633224
iter   1: ll=23072758620.633224, converged: 0.00% (loc: 100.00%, scale update: False), in 0.00sec
Fitting 26542 dispersion models: (progress not available with multiprocessing)

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/site-packages/diffxpy/testing/tests.py", line 736, in wald
    **kwargs,
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/site-packages/diffxpy/testing/tests.py", line 244, in _fit
    **train_args
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/site-packages/batchglm/models/base/estimator.py", line 124, in train_sequence
    self.train(**d, **kwargs)
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/site-packages/batchglm/train/numpy/base_glm/estimator.py", line 112, in train
    nproc=nproc
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/site-packages/batchglm/train/numpy/base_glm/estimator.py", line 351, in b_step
    nproc=nproc
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/site-packages/batchglm/train/numpy/base_glm/estimator.py", line 478, in _b_step_loop
    ) for j in idx_update]
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "[USERNAME]/anaconda3/envs/scMachineLearning1/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I have noted that there are similar issues involving multiprocessing but I haven't had the time to check them out in detail.

(de.__version__ = 'v0.7.4', sc.__version__ = '1.4.6')

JZL commented 4 years ago

I had the same issue, and I think I narrowed it down to a Python version issue. I was running an older version of Python (3.6); newer versions of Python fix this, specifically in this commit.
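
For context, the limit quoted in the traceback can be reproduced without diffxpy. The following is a minimal sketch using only the standard library: the `"!i"` format seen in `connection.py` is a signed 32-bit integer, so a message length above 2147483647 bytes (roughly 2 GiB) cannot be packed into the header. The helper name `fits_in_int32_header` is hypothetical and exists only for this illustration.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the error


def fits_in_int32_header(n_bytes):
    """Return True if a message length can be packed with the "!i" format."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        # Same exception type as in the traceback above.
        return False


print(fits_in_int32_header(INT32_MAX))      # True
print(fits_in_int32_header(INT32_MAX + 1))  # False
```

This suggests the error depends on the size of the pickled task sent to a worker process, not on the statistical model itself.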

davidsebfischer commented 4 years ago

Yes, I heard this should be addressed in Python 3.8, but we are also working on mitigating it internally. I will keep you posted!
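
Until then, a rough way to check whether a given object is likely to trigger the error is to compare its pickled size against the 32-bit limit. This is only a sketch under stated assumptions: it assumes, as suggested in this thread, that interpreters older than 3.8 are affected, and it uses plain `pickle.dumps`, which may differ slightly in size from what `multiprocessing` actually sends. The function names `pickled_size` and `may_hit_pipe_limit` are hypothetical and not part of diffxpy or batchglm.

```python
import pickle
import sys

INT32_MAX = 2**31 - 1  # largest length the "!i" header can hold


def pickled_size(obj):
    """Approximate number of bytes needed to send obj to a worker."""
    return len(pickle.dumps(obj))


def may_hit_pipe_limit(obj, version_info=sys.version_info, limit=INT32_MAX):
    """Heuristic: True if obj is too large for a 32-bit length header.

    Assumption (from this thread): only Python < 3.8 is affected.
    """
    return tuple(version_info[:2]) < (3, 8) and pickled_size(obj) > limit


# Small payloads are never a problem, regardless of Python version.
print(may_hit_pipe_limit(b"small payload"))
```

Note that pickling a multi-gigabyte object for this check temporarily needs about as much additional memory again, so it is best tried on a subset of the data first.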

QianjiangHu commented 3 years ago

Hi, I have the same issue. Are there any solutions already?