saezlab / dorothea-py

Dorothea package in Python
MIT License

AttributeError when trying to load dorothea network #5

Closed PattF closed 3 years ago

PattF commented 3 years ago

Hi, I'm having issues running the following code from your notebook:

regulons = dorothea.load_regulons(['A','B','C'], organism='Human')

Running it gives the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-941bb5f65e7d> in <module>
      1 regulon = dorothea.load_regulons(
      2     ['A', 'B', 'C'],   # Which levels of confidence to use (A most confident, E least confident)
----> 3     organism='Human' # If working with mouse, set to Mouse
      4 )

~\anaconda3\lib\site-packages\dorothea\dorothea.py in load_regulons(levels, organism, commercial)
     58 
     59     #Filter by levels of confidence
---> 60     df = df[df['confidence'].isin(levels)]
     61 
     62     # Transform to binary dataframe

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2771         if is_hashable(key):
   2772             # shortcut if the key is in columns
-> 2773             if self.columns.is_unique and key in self.columns:
   2774                 if self.columns.nlevels > 1:
   2775                     return self._getitem_multilevel(key)

~\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5268             or name in self._accessors
   5269         ):
-> 5270             return object.__getattribute__(self, name)
   5271         else:
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):

pandas\_libs\properties.pyx in pandas._libs.properties.AxisProperty.__get__()

~\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5268             or name in self._accessors
   5269         ):
-> 5270             return object.__getattribute__(self, name)
   5271         else:
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):

AttributeError: 'DataFrame' object has no attribute '_data'

I'm running the following package versions: scanpy==1.7.2 anndata==0.7.4 numpy==1.18.1 scipy==1.5.2 pandas==1.0.1 scikit-learn==0.23.2 dorothea==1.0.5

Appreciate any help!

PauBadiaM commented 3 years ago

Hi @PattF! Thanks for checking out the package. This happens because the regulon data shipped with the package was saved with pandas >= 1.1.0, and older pandas versions such as 1.0.1 cannot read it. I will update the package requirements. To upgrade pandas with conda:

conda install pandas==1.1.0

or if you are using pip:

pip install pandas==1.1.0

After updating, it should work:

import dorothea
import pandas as pd

print(pd.__version__)
print(dorothea.load_regulons( ['A','B','C'], organism='Human'))
1.1.0
tf       AHR   AR  ARID2  ARID3A  ARNT  ARNTL  ASCL1  ATF1  ATF2  ATF3  ATF4  ATF6  ATF7  ...  ZEB1  ZEB2  ZFX  ZKSCAN1  ZNF143  ZNF217  ZNF24  ZNF263  ZNF274  ZNF384  ZNF592  ZNF639  ZNF740
target                                                                                    ...                                                                                                 
A2M      0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
AAK1     0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
AARS1    0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
AATK     0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0    -1.0     0.0     0.0     0.0     0.0     0.0
ABAT     0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
...      ...  ...    ...     ...   ...    ...    ...   ...   ...   ...   ...   ...   ...  ...   ...   ...  ...      ...     ...     ...    ...     ...     ...     ...     ...     ...     ...
ZSCAN31  0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
ZSCAN9   0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
ZXDC     0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
ZZEF1    0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
ZZZ3     0.0  0.0    0.0     0.0   0.0    0.0    0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0  0.0      0.0     0.0     0.0    0.0     0.0     0.0     0.0     0.0    -1.0     0.0

[5321 rows x 271 columns]
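To fail early with a clearer message than the AttributeError above, one could check the installed pandas version before loading the regulons. This is a minimal sketch: `version_tuple` and `check_min_version` are hypothetical helpers, not part of dorothea, and the 1.1.0 threshold is taken from the reply above.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn '1.0.1' or '1.1.0rc0' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each piece, e.g. '0rc0' -> 0.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_min_version(installed: str, required: str = "1.1.0") -> None:
    """Raise RuntimeError if `installed` is older than `required`."""
    if version_tuple(installed) < version_tuple(required):
        raise RuntimeError(
            f"pandas {installed} is too old to read the DoRothEA regulon data; "
            f"install pandas>={required}"
        )


# Usage (assumes pandas is installed):
#   import pandas as pd
#   check_min_version(pd.__version__)  # call before dorothea.load_regulons(...)
```

Comparing integer tuples avoids the string-comparison pitfall where "1.10.0" would sort before "1.9.0".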
PattF commented 3 years ago

Hi @PauBadiaM, thanks for the quick reply. Updating to pandas==1.1.0 worked, and the first line now runs. However, I've hit another error in the next step, TF activity estimation. I ran:

dorothea.run(adata, regulon, center=True, num_perm=100, norm=True, scale=True, use_raw=True, min_size=5)

and got the following error:

5171 targets found
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [1:02:49<00:00, 37.70s/it]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1631 
-> 1632         mgr = BlockManager(blocks, axes)
   1633         mgr._consolidate_inplace()

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in __init__(self, blocks, axes, do_integrity_check)
    138         if do_integrity_check:
--> 139             self._verify_integrity()
    140 

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in _verify_integrity(self)
    315             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 316                 raise construction_error(tot_items, block.shape[1:], self.axes)
    317         if len(self.items) != tot_items:

ValueError: Shape of passed values is (255901, 271), indices imply (18524, 271)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-35-d0c9c67fda00> in <module>
      6              scale=True,   # Scale values per feature so that values can be compared across cells
      7              use_raw=True, # Use raw adata, where we have the lognorm gene expression
----> 8              min_size=5,   # TF with less than 5 targets will be ignored
      9             )

~\anaconda3\lib\site-packages\dorothea\dorothea.py in run(data, regnet, center, num_perm, norm, scale, scale_axis, inplace, use_raw, use_hvg, obsm_key, min_size)
    272 
    273     # Store in df
--> 274     result = pd.DataFrame(tf_act, columns=r_tfs, index=x_samples)
    275 
    276     if isinstance(data, AnnData) and inplace:

~\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    494                 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
    495             else:
--> 496                 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    497 
    498         # For data is list-like, or Iterable (will consume into list)

~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
    232         block_values = [values]
    233 
--> 234     return create_block_manager_from_blocks(block_values, [columns, index])
    235 
    236 

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1637         blocks = [getattr(b, "values", b) for b in blocks]
   1638         tot_items = sum(b.shape[0] for b in blocks)
-> 1639         raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1640 
   1641 

ValueError: Shape of passed values is (255901, 271), indices imply (18524, 271)

I'm not sure I understand the issue. Is there a mismatch in matrix size between the raw and processed adata objects? Appreciate any help, thanks!

PauBadiaM commented 3 years ago

Hi @PattF! I ran the PBMC tutorial in examples/ with the same Python environment and it worked, so the problem is probably specific to your data. Could it be that the observation names (adata.obs.index and adata.raw.obs_names) are not unique? What are the shapes of your adata and adata.raw objects?
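The checks suggested above can be run before calling dorothea.run. This is a minimal sketch, assuming an AnnData-like object that exposes obs_names and raw (with raw.obs_names); `check_raw_matches` is a hypothetical helper, not a dorothea or anndata function.

```python
def check_raw_matches(adata) -> list:
    """Return problems that could break dorothea.run(..., use_raw=True); empty if none."""
    problems = []
    obs_names = list(adata.obs_names)
    # Duplicate cell names make the index of the result table ambiguous.
    if len(set(obs_names)) != len(obs_names):
        problems.append("adata.obs_names are not unique")
    raw = getattr(adata, "raw", None)
    if raw is None:
        problems.append("adata.raw is not set; use_raw=True cannot be used")
        return problems
    raw_names = list(raw.obs_names)
    # A raw matrix with a different number of cells than adata does not line up
    # with adata's cell names, as in the (255901, 271) vs (18524, 271) error above.
    if len(raw_names) != len(obs_names):
        problems.append(
            f"adata has {len(obs_names)} cells but adata.raw has {len(raw_names)}"
        )
    elif raw_names != obs_names:
        problems.append("adata.raw.obs_names are not in the same order as adata.obs_names")
    return problems
```

An empty list means the cell names line up; any message points at the first thing to fix before rerunning.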

PattF commented 3 years ago

Thanks for the suggestions! It's a dataset from a publication. I think the issue was that the raw data covered the whole dataset (255901 cells), while I was only running a subset (18524 cells). I think I've managed to get it working. However, when I try to look at a specific TF of interest, I sometimes get: KeyError: "Could not find keys '['SIX1']' in columns of adata.obs or in adata.var_names." Is this due to the filtering?

Another quick, somewhat unrelated question: do you know how to rename an .obs column? I'm trying to plot the heatmap of the top activated TF per cell type, but this dataset labels the .obs column "cell types", and that throws an invalid syntax error. How would I rename adata.obs['cell types'] to adata.obs['cell_types']? Thanks!

PauBadiaM commented 3 years ago

Hi @PattF! Since the activities are stored in the .obsm of the AnnData object, you cannot access them directly with adata['SIX1']. Instead, access them with the extract function:

dorothea.extract(adata)['SIX1']

Or directly from .obsm:

adata.obsm['dorothea']['SIX1']
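The activity scores form a cells × TFs table (the run() traceback above builds it with pd.DataFrame(tf_act, columns=r_tfs, index=x_samples)), so a TF that was never scored shows up as a missing column. This is a minimal sketch of a hypothetical `tf_activity` helper that turns that into a more descriptive error; the toy table below stands in for adata.obsm['dorothea'] and its values are made up.

```python
import pandas as pd


def tf_activity(acts: pd.DataFrame, tf: str) -> pd.Series:
    """Return one TF's activity per cell from a cells x TFs table."""
    if tf not in acts.columns:
        # TFs can be absent because their confidence level was not loaded,
        # or because they had fewer targets than min_size.
        raise KeyError(
            f"{tf!r} has no activity scores; it may have been excluded by the "
            "confidence levels passed to load_regulons or by the min_size filter"
        )
    return acts[tf]


# Toy stand-in for adata.obsm['dorothea'] (hypothetical values).
acts = pd.DataFrame({"TF1": [0.5, -1.2]}, index=["cell1", "cell2"])
```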

It could also be that you are only loading confidence levels A, B and C. SIX1 belongs to confidence level E, so you need to include it:

regulons = dorothea.load_regulons(['A', 'B', 'C', 'D', 'E'], organism='Human')
regulons['SIX1']
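Judging from the traceback in the first comment, load_regulons filters edges by confidence level and then turns them into a target × TF matrix like the one printed above. The toy sketch below mimics that idea with pandas to show why a level-E TF such as SIX1 disappears when only A to C are requested. The table, its columns (tf, target, confidence, mor) and the `to_regulon_matrix` helper are hypothetical stand-ins, not the package's actual data or code.

```python
import pandas as pd

# Hypothetical long-format regulon table: one row per TF -> target edge.
long_df = pd.DataFrame({
    "tf": ["TF1", "TF1", "SIX1", "SIX1"],
    "target": ["GENE1", "GENE2", "GENE2", "GENE3"],
    "confidence": ["A", "B", "E", "E"],
    "mor": [1, -1, 1, -1],  # sign of the regulation
})


def to_regulon_matrix(df: pd.DataFrame, levels: list) -> pd.DataFrame:
    """Keep edges whose confidence is in `levels`, then pivot to a target x TF matrix."""
    kept = df[df["confidence"].isin(levels)]
    # Each (target, tf) pair appears once in this toy table; absent edges become 0.
    return kept.pivot_table(index="target", columns="tf", values="mor",
                            aggfunc="first", fill_value=0)


abc = to_regulon_matrix(long_df, ["A", "B", "C"])            # no SIX1 column
all_levels = to_regulon_matrix(long_df, ["A", "B", "C", "D", "E"])  # SIX1 present
```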

For the second question, you don't need to rename the column; just pass its name as a quoted string:

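import scanpy as sc
import dorothea

# Collect the top-ranked TF for each cell type; the column name contains a space,
# so it is passed as a quoted string.
tfs = dict()
for cell_type in adata.obs['cell type'].cat.categories:
    df = dorothea.rank_tfs_groups(adata, groupby='cell type', group=cell_type)
    tfs[cell_type] = list(df.head(1).index)  # TF in the first row of the ranking

# Plot the selected TFs' activities (extracted from .obsm) per cell type
sc.pl.matrixplot(dorothea.extract(adata), tfs, 'cell type', dendrogram=True, cmap='coolwarm', vmin=-2, vmax=2)
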
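If you would still rather rename the column, adata.obs is a pandas DataFrame, so DataFrame.rename works. This is a minimal sketch on a stand-in table; the cell names and labels are made up.

```python
import pandas as pd

# Stand-in for adata.obs with a column name containing a space.
obs = pd.DataFrame(
    {"cell types": pd.Categorical(["B cell", "T cell", "B cell"])},
    index=["cell1", "cell2", "cell3"],
)

# rename() returns a new DataFrame; columns not in the mapping are unchanged.
obs = obs.rename(columns={"cell types": "cell_types"})

# With a real AnnData object, the equivalent would be:
#   adata.obs = adata.obs.rename(columns={"cell types": "cell_types"})
```

Because rename only changes the column label, the categorical dtype and its categories are preserved.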
PattF commented 3 years ago

Hi @PauBadiaM, really sorry, I realized I never replied to your response. All the suggestions you provided worked great; thanks for the help! / Patrick