Closed Marwansha closed 5 months ago
Hi @Marwansha, it looks like there is a typo in your design factors argument: I think it should be
design_factors=["sex", "cmv", "AGE"]
(commas) instead of design_factors=["sex"+"cmv"+"AGE"]
.
Apart from this, your code looks good. If you have the version 0.4.7 you can even simplify it to
dds = DeseqDataSet(
adata=mono,
design_factors=["sex", "cmv", "AGE"],
continuous_factors=['AGE'],
refit_cooks=False,
n_cpus=8,
)
dds.deseq2()
stat_res = DeseqStats(dds)
stat_res.summary()
Thanks alot for your fast response.
indeed i mistakenly did this typo now while describing it here but the code was alright
i still get same error of "setting an array element with a sequence" ,this error is not there when i refit_cooks=False and i want to know to which extent refit_cooks can affect the results. this error is only when am using continuous factor in the design ,refit_cooks = True work normall ywith categorical. i adjusted the main text from typo errors Thanks again
dds = DeseqDataSet(
adata=mono,
design_factors="AGE", # Corrected here
continuous_factors=['AGE'],
refit_cooks=True,
inference=inference,
)
dds.deseq2()
Fitting dispersions...
... done in 0.03 seconds.
Fitting MAP dispersions...
... done in 0.02 seconds.
Fitting LFCs...
... done in 0.04 seconds.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1429, in Block.setitem(self, indexer, value, using_cow)
1428 try:
-> 1429 values[indexer] = casted
1430 except (TypeError, ValueError) as err:
ValueError: shape mismatch: value array of shape (9,46) could not be broadcast to indexing result of shape (9,2)
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
Cell In[111], line 9
1 inference = DefaultInference(n_cpus=8)
2 dds = DeseqDataSet(
3 adata=mono,
4 design_factors="AGE", # Corrected here
(...)
7 inference=inference,
8 )
----> 9 dds.deseq2()
File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:453, in DeseqDataSet.deseq2(self)
448 self.calculate_cooks()
450 if self.refit_cooks:
451 # Replace outlier counts, and refit dispersions and LFCs
452 # for genes that had outliers replaced
--> 453 self.refit()
File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:871, in DeseqDataSet.refit(self)
864 print(
865 f"Replacing {sum(self.varm['replaced']) } outlier genes.\n",
866 file=sys.stderr,
867 )
869 if sum(self.varm["replaced"]) > 0:
870 # Refit dispersions and LFCs for genes that had outliers replaced
--> 871 self._refit_without_outliers()
872 else:
873 # Store the fact that no sample was refitted
874 self.varm["refitted"] = np.full(
875 self.n_vars,
876 False,
877 )
File ~/venvs/CODA/lib/python3.9/site-packages/pydeseq2/dds.py:1079, in DeseqDataSet._refit_without_outliers(self)
1077 # Replace values in main object
1078 self.varm["_normed_means"][self.varm["refitted"]] = sub_dds.varm["_normed_means"]
-> 1079 self.varm["LFC"][self.varm["refitted"]] = sub_dds.varm["LFC"]
1080 self.varm["dispersions"][self.varm["refitted"]] = sub_dds.varm["dispersions"]
1082 replace_cooks = pd.DataFrame(self.layers["cooks"].copy())
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/frame.py:4287, in DataFrame.__setitem__(self, key, value)
4285 self._setitem_frame(key, value)
4286 elif isinstance(key, (Series, np.ndarray, list, Index)):
-> 4287 self._setitem_array(key, value)
4288 elif isinstance(value, DataFrame):
4289 self._set_item_frame_value(key, value)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/frame.py:4322, in DataFrame._setitem_array(self, key, value)
4319 if isinstance(value, DataFrame):
4320 # GH#39931 reindex since iloc does not align
4321 value = value.reindex(self.index.take(indexer))
-> 4322 self.iloc[indexer] = value
4324 else:
4325 # Note: unlike self.iloc[:, indexer] = value, this will
4326 # never try to overwrite values inplace
4328 if isinstance(value, DataFrame):
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:911, in _LocationIndexer.__setitem__(self, key, value)
908 self._has_valid_setitem_indexer(key)
910 iloc = self if self.name == "iloc" else self.obj.iloc
--> 911 iloc._setitem_with_indexer(indexer, value, self.name)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:1944, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
1942 self._setitem_with_indexer_split_path(indexer, value, name)
1943 else:
-> 1944 self._setitem_single_block(indexer, value, name)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/indexing.py:2218, in _iLocIndexer._setitem_single_block(self, indexer, value, name)
2215 self.obj._check_is_chained_assignment_possible()
2217 # actually do the set
-> 2218 self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
2219 self.obj._maybe_update_cacher(clear=True, inplace=True)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/managers.py:415, in BaseBlockManager.setitem(self, indexer, value, warn)
411 # No need to split if we either set all columns or on a single block
412 # manager
413 self = self.copy()
--> 415 return self.apply("setitem", indexer=indexer, value=value)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
361 applied = b.apply(f, **kwargs)
362 else:
--> 363 applied = getattr(b, f)(**kwargs)
364 result_blocks = extend_blocks(applied, result_blocks)
366 out = type(self).from_blocks(result_blocks, self.axes)
File ~/venvs/CODA/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1432, in Block.setitem(self, indexer, value, using_cow)
1430 except (TypeError, ValueError) as err:
1431 if is_list_like(casted):
-> 1432 raise ValueError(
1433 "setting an array element with a sequence."
1434 ) from err
1435 raise
1436 return self
ValueError: setting an array element with a sequence.
Sorry for bothering again. just wanted to ask if its also possible to add a interaction term of 2 factors in the design_factors , ?
design_factors=["sex", "cmv", "AGE","AGE:sex"],
Sorry for bothering again. just wanted to ask if its also possible to add a interaction term of 2 factors in the design_factors , ?
design_factors=["sex", "cmv", "AGE","AGE:sex"],
It's not supported yet, but @jeandut is working on it (#181)
Thanks for the help and the good work @BorisMuzellec will follow up and wait for the release
let me know if you need further info regarding the "ValueError: setting an array element with a sequence." when refit_cooks is set as True as continuous variable is used in the design factor
Have an amazing day
Hey, sorry incase my questions are naive 1)I am trying to use PyDESeq2 using continuous covariate (age) taking into account, sex,cmv serotype etc. in order to get differentially expressed genes that change with AGE ?
2)setting refit_cooks as True give an error (ValueError: setting an array element with a sequence.) , while changing it to false solve this.
should the deseqDataset be designed like this? and how the refit_cooks=False affect the results as as i see the output pvalues and lfc with no any signiificant results( coult refit_cooks be affecting this) - volcanoplot below