saezlab / dorothea-py

Dorothea package in Python

Can't use all ['A','B','C', 'D', 'E'] levels of confidence #4

Closed oligomyeggo closed 3 years ago

oligomyeggo commented 3 years ago

Hello! First, thank you for developing this version of DoRothEA for the scanpy ecosystem! It's great, and I have enjoyed using it so far. I am running into an issue where I can't call dorothea.load_regulons on the full array of confidence levels. I have no issues when using ['A'] through ['A', 'B', 'C', 'D'], but when I try to run the following:

regulons = dorothea.load_regulons(
    ['A', 'B', 'C', 'D', 'E'], 
    organism = 'Mouse' 
)

I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-65dd9e0ed1f5> in <module>
----> 1 regulons = dorothea.load_regulons(
      2     ['A','B','C', 'D', 'E'],   # Which levels of confidence to use (A most confident, E least confident)
      3     organism='Mouse' # If working with mouse, set to Mouse
      4 )

~/.local/share/virtualenvs/tf_test-o-8qhwxg/lib/python3.9/site-packages/dorothea/dorothea.py in load_regulons(levels, organism, commercial)
     57 
     58     # Transform to binary dataframe
---> 59     dorothea_df = df.pivot(index='target', columns='tf', values='mor')
     60 
     61     # Set nans to 0

~/.local/share/virtualenvs/tf_test-o-8qhwxg/lib/python3.9/site-packages/pandas/core/frame.py in pivot(self, index, columns, values)
   6877         from pandas.core.reshape.pivot import pivot
   6878 
-> 6879         return pivot(self, index=index, columns=columns, values=values)
   6880 
   6881     _shared_docs[

~/.local/share/virtualenvs/tf_test-o-8qhwxg/lib/python3.9/site-packages/pandas/core/reshape/pivot.py in pivot(data, index, columns, values)
    459         else:
    460             indexed = data._constructor_sliced(data[values]._values, index=index)
--> 461     return indexed.unstack(columns)
    462 
    463 

~/.local/share/virtualenvs/tf_test-o-8qhwxg/lib/python3.9/site-packages/pandas/core/series.py in unstack(self, level, fill_value)
   3827         from pandas.core.reshape.reshape import unstack
   3828 
-> 3829         return unstack(self, level, fill_value)
   3830 
   3831     # ----------------------------------------------------------------------

~/.local/share/virtualenvs/tf_test-o-8qhwxg/lib/python3.9/site-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value)
    428         if is_extension_array_dtype(obj.dtype):
    429             return _unstack_extension_series(obj, level, fill_value)
--> 430         unstacker = _Unstacker(
    431             obj.index, level=level, constructor=obj._constructor_expanddim
    432         )

~/.local/share/virtualenvs/tf_test-o-8qhwxg/lib/python3.9/site-packages/pandas/core/reshape/reshape.py in __init__(self, index, level, constructor)
    116             raise ValueError("Unstacked DataFrame is too big, causing int32 overflow")
    117 
--> 118         self._make_selectors()
    119 
    120     @cache_readonly

~/.local/share/virtualenvs/tf_test-o-8qhwxg/lib/python3.9/site-packages/pandas/core/reshape/reshape.py in _make_selectors(self)
    165 
    166         if mask.sum() < len(self.index):
--> 167             raise ValueError("Index contains duplicate entries, cannot reshape")
    168 
    169         self.group_index = comp_index

ValueError: Index contains duplicate entries, cannot reshape

Any insight as to what might be going on?

Thank you!

Versions (on Mac OS):

anndata==0.7.6
appnope==0.1.2
argon2-cffi==20.1.0
async-generator==1.10
attrs==21.2.0
backcall==0.2.0
bleach==3.3.0
cffi==1.14.5
cycler==0.10.0
decorator==4.4.2
defusedxml==0.7.1
dorothea-py==1.0.3
entrypoints==0.3
get-version==2.2
h5py==3.2.1
ipykernel==5.5.5
ipython==7.23.1
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==3.0.1
joblib==1.0.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
legacy-api-wrap==1.2
llvmlite==0.36.0
MarkupSafe==2.0.1
matplotlib==3.4.2
matplotlib-inline==0.1.2
mistune==0.8.4
natsort==7.1.1
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==2.5.1
notebook==6.4.0
numba==0.53.1
numexpr==2.7.3
numpy==1.20.3
packaging==20.9
pandas==1.2.4
pandocfilters==1.4.3
parso==0.8.2
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.2.0
progeny-py==1.0.3
prometheus-client==0.10.1
prompt-toolkit==3.0.18
ptyprocess==0.7.0
pycparser==2.20
Pygments==2.9.0
pynndescent==0.5.2
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
pyzmq==22.0.3
qtconsole==5.1.0
QtPy==1.9.0
scanpy==1.7.2
scikit-learn==0.24.2
scipy==1.6.3
seaborn==0.11.1
Send2Trash==1.5.0
sinfo==0.3.4
six==1.16.0
statsmodels==0.12.2
stdlib-list==0.8.0
tables==3.6.1
terminado==0.10.0
testpath==0.5.0
threadpoolctl==2.1.0
tornado==6.1
tqdm==4.60.0
traitlets==5.0.5
umap-learn==0.5.1
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
xlrd==1.2.0

PauBadiaM commented 3 years ago

Hi @oligomyeggo, thanks for trying it out! Yes, we are aware of this issue: some entries were repeated in level E. I plan to update the repository with the corrected networks tomorrow. In the meantime, you can use all confidence levels except E:

regulons = dorothea.load_regulons(
    ['A', 'B', 'C', 'D'], 
    organism = 'Mouse' 
)

This should work.
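
For anyone curious why repeated entries produce this particular error, here is a minimal sketch using a toy table rather than the real DoRothEA networks. The column names (`target`, `tf`, `mor`) come from the traceback above; the gene names and values are made up for illustration. The `drop_duplicates` step only shows the general idea and is not necessarily how the networks were corrected. Since the pivot happens inside `load_regulons`, it cannot be applied from user code without editing the package.

```python
import pandas as pd

# Toy edge list with the same columns the traceback shows (target, tf, mor).
# Gene names and values are invented for illustration only.
df = pd.DataFrame(
    {
        "tf": ["Tf1", "Tf1", "Tf2", "Tf2"],
        "target": ["GeneA", "GeneB", "GeneA", "GeneA"],  # (Tf2, GeneA) is repeated
        "mor": [1, -1, 1, 1],
    }
)

# Rows whose (target, tf) pair occurs more than once
dupes = df[df.duplicated(subset=["target", "tf"], keep=False)]

# pivot() needs each (index, column) pair to be unique, so this raises
try:
    df.pivot(index="target", columns="tf", values="mor")
    pivot_failed = False
    message = ""
except ValueError as err:
    pivot_failed = True
    message = str(err)

# Dropping the repeated pairs (keeping the first) lets the pivot succeed
deduped = df.drop_duplicates(subset=["target", "tf"], keep="first")
wide = deduped.pivot(index="target", columns="tf", values="mor").fillna(0)

print(dupes)
print(message)
print(wide)
```

Inspecting `dupes` is a quick way to see which TF-target pairs are repeated in a table of this shape.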

oligomyeggo commented 3 years ago

Great, thank you! And thanks again for the fantastic tool - very excited to use it in my current and upcoming projects!

PauBadiaM commented 3 years ago

Hi @oligomyeggo! I updated the regulons and now it should work:

pip install --upgrade dorothea-py

Let me know if you run into any other problem.
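
To confirm the upgrade took effect in the active environment, a small check like the one below may help. It uses only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper name, not part of dorothea-py. The report above listed `dorothea-py==1.0.3`, so the printed version would presumably be newer than that after upgrading, though the exact fixed version is not stated in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if dorothea-py is not installed in this environment
print("dorothea-py:", installed_version("dorothea-py"))
```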