owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License
573 stars 60 forks source link

Question (Continuous covariates)/ error with refit_cooks "ValueError: setting an array element with a sequence." #258

Closed Marwansha closed 5 months ago

Marwansha commented 5 months ago

Hey, sorry incase my questions are naive 1)I am trying to use PyDESeq2 using continuous covariate (age) taking into account, sex,cmv serotype etc. in order to get differentially expressed genes that change with AGE ?

2)setting refit_cooks as True give an error (ValueError: setting an array element with a sequence.) , while changing it to false solve this.

should the deseqDataset be designed like this? and how the refit_cooks=False affect the results as as i see the output pvalues and lfc with no any signiificant results( coult refit_cooks be affecting this) - volcanoplot below

inference = DefaultInference(n_cpus=8)
dds = DeseqDataSet(
    adata=mono,
    design_factors=["sex","cmv","AGE"],  
    continuous_factors=['AGE'],
    refit_cooks=False,
    inference=inference,
)
dds.deseq2()
stat_res = DeseqStats(dds,inference=inference,)
stat_res.summary()

2)  inference = DefaultInference(n_cpus=8)
dds = DeseqDataSet(
    adata=mono,
    design_factors="AGE",  # Corrected here
    continuous_factors=['AGE'],
    refit_cooks=True,
    inference=inference,
)
dds.deseq2()
Fitting size factors...
... done in 0.10 seconds.

Replacing 9 outlier genes.

Fitting dispersions...
... done in 0.02 seconds.

Fitting MAP dispersions...
... done in 0.02 seconds.

Fitting LFCs...
... done in 0.05 seconds.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1429, in Block.setitem(self, indexer, value, using_cow)
   1428 try:
-> 1429     values[indexer] = casted
   1430 except (TypeError, ValueError) as err:

ValueError: shape mismatch: value array of shape (9,46) could not be broadcast to indexing result of shape (9,2)

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In[109], line 1
----> 1 dds.deseq2()

File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:453, in DeseqDataSet.deseq2(self)
    448 self.calculate_cooks()
    450 if self.refit_cooks:
    451     # Replace outlier counts, and refit dispersions and LFCs
    452     # for genes that had outliers replaced
--> 453     self.refit()

File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:871, in DeseqDataSet.refit(self)
    864     print(
    865         f"Replacing {sum(self.varm['replaced']) } outlier genes.\n",
    866         file=sys.stderr,
    867     )
    869 if sum(self.varm["replaced"]) > 0:
    870     # Refit dispersions and LFCs for genes that had outliers replaced
--> 871     self._refit_without_outliers()
    872 else:
    873     # Store the fact that no sample was refitted
    874     self.varm["refitted"] = np.full(
    875         self.n_vars,
    876         False,
    877     )

File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:1079, in DeseqDataSet._refit_without_outliers(self)
   1077 # Replace values in main object
   1078 self.varm["_normed_means"][self.varm["refitted"]] = sub_dds.varm["_normed_means"]
-> 1079 self.varm["LFC"][self.varm["refitted"]] = sub_dds.varm["LFC"]
   1080 self.varm["dispersions"][self.varm["refitted"]] = sub_dds.varm["dispersions"]
   1082 replace_cooks = pd.DataFrame(self.layers["cooks"].copy())

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/frame.py:4287, in DataFrame.__setitem__(self, key, value)
   4285     self._setitem_frame(key, value)
   4286 elif isinstance(key, (Series, np.ndarray, list, Index)):
-> 4287     self._setitem_array(key, value)
   4288 elif isinstance(value, DataFrame):
   4289     self._set_item_frame_value(key, value)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/frame.py:4322, in DataFrame._setitem_array(self, key, value)
   4319     if isinstance(value, DataFrame):
   4320         # GH#39931 reindex since iloc does not align
   4321         value = value.reindex(self.index.take(indexer))
-> 4322     self.iloc[indexer] = value
   4324 else:
   4325     # Note: unlike self.iloc[:, indexer] = value, this will
   4326     #  never try to overwrite values inplace
   4328     if isinstance(value, DataFrame):

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:911, in _LocationIndexer.__setitem__(self, key, value)
    908 self._has_valid_setitem_indexer(key)
    910 iloc = self if self.name == "iloc" else self.obj.iloc
--> 911 iloc._setitem_with_indexer(indexer, value, self.name)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:1944, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
   1942     self._setitem_with_indexer_split_path(indexer, value, name)
   1943 else:
-> 1944     self._setitem_single_block(indexer, value, name)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:2218, in _iLocIndexer._setitem_single_block(self, indexer, value, name)
   2215 self.obj._check_is_chained_assignment_possible()
   2217 # actually do the set
-> 2218 self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
   2219 self.obj._maybe_update_cacher(clear=True, inplace=True)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/managers.py:415, in BaseBlockManager.setitem(self, indexer, value, warn)
    411     # No need to split if we either set all columns or on a single block
    412     # manager
    413     self = self.copy()
--> 415 return self.apply("setitem", indexer=indexer, value=value)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    361         applied = b.apply(f, **kwargs)
    362     else:
--> 363         applied = getattr(b, f)(**kwargs)
    364     result_blocks = extend_blocks(applied, result_blocks)
    366 out = type(self).from_blocks(result_blocks, self.axes)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1432, in Block.setitem(self, indexer, value, using_cow)
   1430     except (TypeError, ValueError) as err:
   1431         if is_list_like(casted):
-> 1432             raise ValueError(
   1433                 "setting an array element with a sequence."
   1434             ) from err
   1435         raise
   1436 return self

ValueError: setting an array element with a sequence.

image

BorisMuzellec commented 5 months ago

Hi @Marwansha, it looks like there is a typo in your design factors argument: I think it should be design_factors=["sex", "cmv", "AGE"] (commas) instead of design_factors=["sex"+"cmv"+"AGE"].

Apart from this, your code looks good. If you have the version 0.4.7 you can even simplify it to

dds = DeseqDataSet(
    adata=mono,
    design_factors=["sex", "cmv", "AGE"],  
    continuous_factors=['AGE'],
    refit_cooks=False,
    n_cpus=8,
)

dds.deseq2()
stat_res = DeseqStats(dds)
stat_res.summary()
Marwansha commented 5 months ago

Thanks alot for your fast response.

indeed i mistakenly did this typo now while describing it here but the code was alright

i still get same error of "setting an array element with a sequence" ,this error is not there when i refit_cooks=False and i want to know to which extent refit_cooks can affect the results. this error is only when am using continuous factor in the design ,refit_cooks = True work normall ywith categorical. i adjusted the main text from typo errors Thanks again


dds = DeseqDataSet(
    adata=mono,
    design_factors="AGE",  # Corrected here
    continuous_factors=['AGE'],
    refit_cooks=True,
    inference=inference,
)
dds.deseq2()

Fitting dispersions...
... done in 0.03 seconds.

Fitting MAP dispersions...
... done in 0.02 seconds.

Fitting LFCs...
... done in 0.04 seconds.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1429, in Block.setitem(self, indexer, value, using_cow)
   1428 try:
-> 1429     values[indexer] = casted
   1430 except (TypeError, ValueError) as err:

ValueError: shape mismatch: value array of shape (9,46) could not be broadcast to indexing result of shape (9,2)

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In[111], line 9
      1 inference = DefaultInference(n_cpus=8)
      2 dds = DeseqDataSet(
      3     adata=mono,
      4     design_factors="AGE",  # Corrected here
   (...)
      7     inference=inference,
      8 )
----> 9 dds.deseq2()

File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:453, in DeseqDataSet.deseq2(self)
    448 self.calculate_cooks()
    450 if self.refit_cooks:
    451     # Replace outlier counts, and refit dispersions and LFCs
    452     # for genes that had outliers replaced
--> 453     self.refit()

File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:871, in DeseqDataSet.refit(self)
    864     print(
    865         f"Replacing {sum(self.varm['replaced']) } outlier genes.\n",
    866         file=sys.stderr,
    867     )
    869 if sum(self.varm["replaced"]) > 0:
    870     # Refit dispersions and LFCs for genes that had outliers replaced
--> 871     self._refit_without_outliers()
    872 else:
    873     # Store the fact that no sample was refitted
    874     self.varm["refitted"] = np.full(
    875         self.n_vars,
    876         False,
    877     )

File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:1079, in DeseqDataSet._refit_without_outliers(self)
   1077 # Replace values in main object
   1078 self.varm["_normed_means"][self.varm["refitted"]] = sub_dds.varm["_normed_means"]
-> 1079 self.varm["LFC"][self.varm["refitted"]] = sub_dds.varm["LFC"]
   1080 self.varm["dispersions"][self.varm["refitted"]] = sub_dds.varm["dispersions"]
   1082 replace_cooks = pd.DataFrame(self.layers["cooks"].copy())

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/frame.py:4287, in DataFrame.__setitem__(self, key, value)
   4285     self._setitem_frame(key, value)
   4286 elif isinstance(key, (Series, np.ndarray, list, Index)):
-> 4287     self._setitem_array(key, value)
   4288 elif isinstance(value, DataFrame):
   4289     self._set_item_frame_value(key, value)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/frame.py:4322, in DataFrame._setitem_array(self, key, value)
   4319     if isinstance(value, DataFrame):
   4320         # GH#39931 reindex since iloc does not align
   4321         value = value.reindex(self.index.take(indexer))
-> 4322     self.iloc[indexer] = value
   4324 else:
   4325     # Note: unlike self.iloc[:, indexer] = value, this will
   4326     #  never try to overwrite values inplace
   4328     if isinstance(value, DataFrame):

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:911, in _LocationIndexer.__setitem__(self, key, value)
    908 self._has_valid_setitem_indexer(key)
    910 iloc = self if self.name == "iloc" else self.obj.iloc
--> 911 iloc._setitem_with_indexer(indexer, value, self.name)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:1944, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
   1942     self._setitem_with_indexer_split_path(indexer, value, name)
   1943 else:
-> 1944     self._setitem_single_block(indexer, value, name)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:2218, in _iLocIndexer._setitem_single_block(self, indexer, value, name)
   2215 self.obj._check_is_chained_assignment_possible()
   2217 # actually do the set
-> 2218 self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
   2219 self.obj._maybe_update_cacher(clear=True, inplace=True)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/managers.py:415, in BaseBlockManager.setitem(self, indexer, value, warn)
    411     # No need to split if we either set all columns or on a single block
    412     # manager
    413     self = self.copy()
--> 415 return self.apply("setitem", indexer=indexer, value=value)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    361         applied = b.apply(f, **kwargs)
    362     else:
--> 363         applied = getattr(b, f)(**kwargs)
    364     result_blocks = extend_blocks(applied, result_blocks)
    366 out = type(self).from_blocks(result_blocks, self.axes)

File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1432, in Block.setitem(self, indexer, value, using_cow)
   1430     except (TypeError, ValueError) as err:
   1431         if is_list_like(casted):
-> 1432             raise ValueError(
   1433                 "setting an array element with a sequence."
   1434             ) from err
   1435         raise
   1436 return self

ValueError: setting an array element with a sequence.
Marwansha commented 5 months ago

Sorry for bothering again. just wanted to ask if its also possible to add a interaction term of 2 factors in the design_factors , ?

design_factors=["sex", "cmv", "AGE","AGE:sex"],

BorisMuzellec commented 5 months ago

Sorry for bothering again. just wanted to ask if its also possible to add a interaction term of 2 factors in the design_factors , ?

design_factors=["sex", "cmv", "AGE","AGE:sex"],

It's not supported yet, but @jeandut is working on it (#181)

Marwansha commented 5 months ago

Thanks for the help and the good work @BorisMuzellec will follow up and wait for the release

let me know if you need further info regarding the "ValueError: setting an array element with a sequence." when refit_cooks is set as True as continuous variable is used in the design factor

Have an amazing day