PattF closed this issue 3 years ago.
Hi @PattF! Thanks for checking out the package.
This is because the original data was saved using pandas >= 1.1.0. I will change the package requirements.
To update, if you are using conda:
conda install pandas==1.1.0
or, if you are using pip:
pip install pandas==1.1.0
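A quick way to confirm that the installed pandas meets the >= 1.1.0 requirement before loading the regulons is sketched below. The helper `meets_minimum` is hypothetical (it is not part of dorothea) and only compares the leading numeric part of the version string, so pre-release suffixes such as "rc0" are ignored.

```python
import re

import pandas as pd


def meets_minimum(version: str, minimum: str) -> bool:
    """Hypothetical helper: True if `version` is at least `minimum`.

    Only the leading numeric components (e.g. "1.1.0" from "1.1.0rc0"
    or "2.0.3+dev") are compared, as tuples of integers.
    """
    def numeric_parts(v: str) -> tuple:
        match = re.match(r"\d+(?:\.\d+)*", v)
        if match is None:
            raise ValueError(f"Unrecognised version string: {v!r}")
        return tuple(int(p) for p in match.group(0).split("."))

    have, need = numeric_parts(version), numeric_parts(minimum)
    # Pad the shorter tuple with zeros so that "1.1" compares equal to "1.1.0".
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if not meets_minimum(pd.__version__, "1.1.0"):
    print(f"pandas {pd.__version__} is older than 1.1.0; please upgrade")
```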
Once you have updated it, it should work:
import dorothea
import pandas as pd
print(pd.__version__)
print(dorothea.load_regulons( ['A','B','C'], organism='Human'))
1.1.0
tf AHR AR ARID2 ARID3A ARNT ARNTL ASCL1 ATF1 ATF2 ATF3 ATF4 ATF6 ATF7 ... ZEB1 ZEB2 ZFX ZKSCAN1 ZNF143 ZNF217 ZNF24 ZNF263 ZNF274 ZNF384 ZNF592 ZNF639 ZNF740
target ...
A2M 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
AAK1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
AARS1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
AATK 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -1.0 0.0 0.0 0.0 0.0 0.0
ABAT 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
ZSCAN31 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ZSCAN9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ZXDC 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ZZEF1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ZZZ3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -1.0 0.0
[5321 rows x 271 columns]
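The printed object is a wide matrix with one row per target gene and one column per TF. Judging from the output, most entries are 0.0 and a non-zero value marks an interaction (the -1.0 entries presumably indicating repression; that reading is an assumption). The sketch below uses a small made-up matrix in the same layout, not the real regulons, to show one way of turning it into a long (tf, target, weight) table with pandas and counting targets per TF.

```python
import pandas as pd

# Toy matrix in the same layout as the printed regulons: rows are targets,
# columns are TFs, and 0.0 means "no interaction". The values are made up.
regulons = pd.DataFrame(
    {"TF1": [1.0, 0.0, -1.0], "TF2": [0.0, 0.0, 1.0]},
    index=pd.Index(["GENE1", "GENE2", "GENE3"], name="target"),
)

# Reshape to long format (one row per TF/target pair), keeping non-zero entries.
long = regulons.reset_index().melt(id_vars="target", var_name="tf", value_name="weight")
long = long.loc[long["weight"] != 0, ["tf", "target", "weight"]].reset_index(drop=True)

# Number of targets per TF, i.e. non-zero entries per column.
n_targets = (regulons != 0).sum(axis=0)

print(long)
print(n_targets)
```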
Hi @PauBadiaM, thanks for the quick reply.
The update to pandas==1.1.0 worked, and I managed to run the initial line.
I've run into another error, though, when running the next segment on TF activity estimation.
I ran:
dorothea.run(adata, regulon, center=True, num_perm=100, norm=True, scale=True, use_raw=True, min_size=5)
And ran into the following error:
5171 targets found
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [1:02:49<00:00, 37.70s/it]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1631
-> 1632 mgr = BlockManager(blocks, axes)
1633 mgr._consolidate_inplace()
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in __init__(self, blocks, axes, do_integrity_check)
138 if do_integrity_check:
--> 139 self._verify_integrity()
140
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in _verify_integrity(self)
315 if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 316 raise construction_error(tot_items, block.shape[1:], self.axes)
317 if len(self.items) != tot_items:
ValueError: Shape of passed values is (255901, 271), indices imply (18524, 271)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-35-d0c9c67fda00> in <module>
6 scale=True, # Scale values per feature so that values can be compared across cells
7 use_raw=True, # Use raw adata, where we have the lognorm gene expression
----> 8 min_size=5, # TF with less than 5 targets will be ignored
9 )
~\anaconda3\lib\site-packages\dorothea\dorothea.py in run(data, regnet, center, num_perm, norm, scale, scale_axis, inplace, use_raw, use_hvg, obsm_key, min_size)
272
273 # Store in df
--> 274 result = pd.DataFrame(tf_act, columns=r_tfs, index=x_samples)
275
276 if isinstance(data, AnnData) and inplace:
~\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
494 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
495 else:
--> 496 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
497
498 # For data is list-like, or Iterable (will consume into list)
~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
232 block_values = [values]
233
--> 234 return create_block_manager_from_blocks(block_values, [columns, index])
235
236
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1637 blocks = [getattr(b, "values", b) for b in blocks]
1638 tot_items = sum(b.shape[0] for b in blocks)
-> 1639 raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
1640
1641
ValueError: Shape of passed values is (255901, 271), indices imply (18524, 271)
I'm not sure I understand the issue here. Is there a mismatch in matrix size between the raw and the processed adata? I'd appreciate any help, thanks!
Hi @PattF! I ran the PBMC tutorial in examples/ with the same Python environment and it worked, so it must be something about your data.
Could it be that the observation names (adata.obs.index and adata.raw.obs_names) are not unique? What are the shapes of your adata and adata.raw objects?
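The checks suggested above can be run directly, since `adata.obs_names` and `adata.raw.obs_names` are pandas `Index` objects. Below is a minimal sketch with a hypothetical helper, `compare_obs`, that takes those two indexes; small toy indexes stand in for a real AnnData object here.

```python
import pandas as pd


def compare_obs(obs_names: pd.Index, raw_obs_names: pd.Index) -> dict:
    """Hypothetical helper: summarise how two sets of cell names relate."""
    return {
        "obs_unique": obs_names.is_unique,
        "raw_unique": raw_obs_names.is_unique,
        "n_obs": len(obs_names),
        "n_raw": len(raw_obs_names),
        # Cells listed in raw but not in the (possibly subsetted) object.
        "only_in_raw": len(raw_obs_names.difference(obs_names)),
    }


# Toy example: the raw layer lists 5 cells, the subset only 2.
subset = pd.Index(["cell1", "cell2"])
raw = pd.Index(["cell1", "cell2", "cell3", "cell4", "cell5"])
print(compare_obs(subset, raw))

# With a real object one would call, for example:
#   compare_obs(adata.obs_names, adata.raw.obs_names)
#   print(adata.shape, adata.raw.shape)
```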
Thanks for the suggestions! It's a dataset from a publication. I think the issue was that the raw data still covered the whole dataset (255901 cells, hence the (255901, 271) shape), while I was only trying to run a subset (18524 cells). I think I've managed to get it to work.
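The error is raised by the `pd.DataFrame(tf_act, columns=r_tfs, index=x_samples)` call in `dorothea.run` shown in the traceback: judging from the shapes, the activity matrix had one row per cell of the full raw data, while the index only listed the cells of the subset. The toy sketch below, with made-up sizes and illustrative variable names, reproduces the same kind of `ValueError` and shows that keeping only the rows of the subsetted cells makes the shapes agree.

```python
import numpy as np
import pandas as pd

# Made-up sizes: 5 cells in the raw layer, 2 cells kept in the subset, 3 TFs.
raw_cells = pd.Index([f"cell{i}" for i in range(5)])
subset_cells = raw_cells[:2]
tfs = ["TF1", "TF2", "TF3"]
activities = np.zeros((len(raw_cells), len(tfs)))  # one row per *raw* cell

try:
    # Same pattern as in dorothea.run: the number of rows must match the index.
    pd.DataFrame(activities, columns=tfs, index=subset_cells)
except ValueError as err:
    print("ValueError:", err)

# Keeping only the rows that belong to the subsetted cells fixes the mismatch.
rows = raw_cells.get_indexer(subset_cells)
result = pd.DataFrame(activities[rows], columns=tfs, index=subset_cells)
print(result.shape)
```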
However, sometimes when I try to run a specific TF of interest, I get: KeyError: "Could not find keys '['SIX1']' in columns of adata.obs or in adata.var_names."
Is this due to the filtering?
Another quick, somewhat unrelated question: do you know how to rename .obs columns? I'm trying to plot the heatmap that shows the top activated TF per cell type, but this dataset has the .obs column labelled "cell types", and it throws an invalid syntax error.
Would you know how to rename adata.obs['cell types'] to adata.obs['cell_types']?
Thanks!
Hi @PattF!
Since the activities are stored inside the .obsm of an AnnData object, you cannot access them directly by doing adata['SIX1']. Instead, you should access them using the function extract:
dorothea.extract(adata)['SIX1']
Or directly from .obsm:
adata.obsm['dorothea']['SIX1']
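As the traceback above shows, `dorothea.run` builds the activities as a pandas DataFrame (cells as rows, TFs as columns), and the reply indicates it is kept under `adata.obsm['dorothea']`, so selecting a TF is ordinary column selection. The sketch below uses a made-up DataFrame as a stand-in for `adata.obsm['dorothea']`; the helper `get_activity` is hypothetical and checks for the column first, so a missing TF is reported instead of raising a `KeyError`.

```python
import pandas as pd

# Stand-in for adata.obsm['dorothea']: cells x TFs, made-up values.
acts = pd.DataFrame(
    {"SIX1": [0.5, -1.2], "AHR": [0.1, 0.3]},
    index=["cell1", "cell2"],
)


def get_activity(acts: pd.DataFrame, tf: str):
    """Hypothetical helper: return the activity column for `tf`, or None."""
    if tf not in acts.columns:
        print(f"{tf} has no activity estimate (not in the regulons or filtered out)")
        return None
    return acts[tf]


six1 = get_activity(acts, "SIX1")       # pandas Series indexed by cell
missing = get_activity(acts, "TF_X")    # prints a message and returns None
```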
It could also be that you are only loading levels A, B and C; SIX1 belongs to confidence level E, so you need to load that level too:
regulons = dorothea.load_regulons(['A', 'B', 'C', 'D', 'E'], organism='Human')
regulons['SIX1']
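To tell whether a TF such as SIX1 is missing because its confidence level was not loaded or because of the `min_size=5` filter, one can look at the regulon matrix (targets as rows, TFs as columns) and count its non-zero targets that are also among the measured genes. The sketch below does this with a hypothetical helper on a toy matrix. It assumes that `min_size` is applied to targets present in the data, which the "5171 targets found" message hints at; that is an assumption about `dorothea.run`, not something confirmed in this thread.

```python
from typing import Iterable, Optional

import pandas as pd


def tf_target_count(regulons: pd.DataFrame, tf: str, genes: Iterable[str]) -> Optional[int]:
    """Hypothetical helper: count non-zero targets of `tf` that are in `genes`.

    Returns None if `tf` is not a column of the regulon matrix at all,
    e.g. because its confidence level was not loaded.
    """
    if tf not in regulons.columns:
        return None
    targets = regulons.index[(regulons[tf] != 0).to_numpy()]
    return int(targets.isin(list(genes)).sum())


# Toy regulon matrix (targets x TFs) and measured genes; values are made up.
regulons = pd.DataFrame({"SIX1": [1.0, 0.0, -1.0, 1.0]}, index=["G1", "G2", "G3", "G4"])
measured = ["G1", "G3"]  # with real data, e.g. adata.raw.var_names when use_raw=True
min_size = 5             # same value as in the dorothea.run call above

n = tf_target_count(regulons, "SIX1", measured)
if n is None:
    print("SIX1 is not in the loaded regulons")
elif n < min_size:
    print(f"SIX1 has only {n} targets in the data, below min_size={min_size}")
```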
For the second question, you don't need to rename the column; simply pass its name, space included, as a string:
import scanpy as sc
import dorothea

tfs = dict()
for cell_type in adata.obs['cell type'].cat.categories:
    # Rank TFs for this cell type and keep the top-ranked one
    df = dorothea.rank_tfs_groups(adata, groupby='cell type', group=cell_type)
    tf = df.head(1).index.values
    tfs[cell_type] = tf
sc.pl.matrixplot(dorothea.extract(adata), tfs, 'cell type', dendrogram=True, cmap='coolwarm', vmin=-2, vmax=2)
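If you do prefer a column name without spaces, `adata.obs` is a pandas DataFrame, so its columns can be renamed with `DataFrame.rename`. A minimal sketch on a stand-in DataFrame with made-up values; on a real object, `adata.obs.rename(columns={'cell types': 'cell_types'}, inplace=True)` should do the same.

```python
import pandas as pd

# Stand-in for adata.obs (made-up cells and labels).
obs = pd.DataFrame(
    {"cell types": pd.Categorical(["B cell", "T cell", "B cell"])},
    index=["cell1", "cell2", "cell3"],
)

# Rename the column; the categorical values themselves are unchanged.
obs = obs.rename(columns={"cell types": "cell_types"})
print(obs.columns.tolist())
```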
Hi @PauBadiaM, really sorry, I realized I never replied to your response. All the suggestions you provided worked great, thanks for the help! / Patrick
Hi, I'm having issues when trying to run the following code from your notebook:
regulons = dorothea.load_regulons( ['A','B','C'], organism='Human')
When I run that segment, I get the following error:
I'm using the following package versions: scanpy==1.7.2 anndata==0.7.4 numpy==1.18.1 scipy==1.5.2 pandas==1.0.1 scikit-learn==0.23.2 dorothea==1.0.5
Appreciate any help!