kmayerb / tcrdist3

Flexible CDR-based distance metrics
MIT License

Cannot import hcluster_diff, member_summ, or plot_hclust_props #69

Closed: psimps21 closed this issue 2 years ago

psimps21 commented 2 years ago

I tried reproducing the sample code for hierarchical clustering using the dash dataset. I am able to import tcrdist3 and create the TCRrep fine, but when I try to import hcluster_diff, member_summ, or plot_hclust_props using the code in the example, I get the following error:


ValueError                                Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 from tcrdist.rep_diff import hcluster_diff, member_summ

File ~/CMU/LuLab/tcrdistenv/lib/python3.8/site-packages/tcrdist/rep_diff.py:4, in <module>
      1 import pandas as pd
      2 import numpy as np
----> 4 import hierdiff as hd
      6 __all__ = ['neighborhood_diff',
      7            'hcluster_diff',
      8            'member_summ']
     10 def neighborhood_diff(clone_df, pwmat, x_cols, count_col='count', knn_neighbors=None, knn_radius=None, subset_ind=None, cluster_ind=None, test_method='fishers'):

File ~/CMU/LuLab/tcrdistenv/lib/python3.8/site-packages/hierdiff/__init__.py:3, in <module>
      1 from .hier_plot import plot_hclust, plot_hclust_props
      2 from .tally import hcluster_tally, neighborhood_tally, running_neighborhood_tally
----> 3 from .association_testing import cluster_association_test
      5 __all__ = ['hcluster_tally',
      6            'neighborhood_tally',
      7            'running_neighborhood_tally',
      8            'cluster_association_test',
      9            'plot_hclust',
     10            'plot_hclust_props']

File ~/CMU/LuLab/tcrdistenv/lib/python3.8/site-packages/hierdiff/association_testing.py:15, in <module>
     12 from scipy.stats.contingency import expected_freq
     13 from scipy import stats
---> 15 from fishersapi import fishers_vec, fishers_frame, adjustnonnan
     17 from .tally import _dict_to_nby2
     19 __all__ = ['cluster_association_test']

File ~/CMU/LuLab/tcrdistenv/lib/python3.8/site-packages/fishersapi/__init__.py:3, in <module>
      1 from __future__ import absolute_import, division, print_function
      2 from .version import version
----> 3 from .fishersapi import *
      4 from .fishersapi import _scipy_fishers_vec
      5 from .catcorr import catcorr

File ~/CMU/LuLab/tcrdistenv/lib/python3.8/site-packages/fishersapi/fishersapi.py:66, in <module>
     64 try:
     65     """Attempt to use the fisher library (cython) if available (>1000x speedup)"""
---> 66     import fisher
     67     # print_function("Using Cython-powered Fisher's exact test")
     69     @_add_docstring(fishers_vec_doc)
     70     def fishers_vec(a, b, c, d, alternative='two-sided', min_n=0):

File ~/CMU/LuLab/tcrdistenv/lib/python3.8/site-packages/fisher/__init__.py:3, in <module>
      1 from __future__ import absolute_import
----> 3 from .cfisher import *
      5 from ._version import get_versions
      6 __version__ = get_versions()['version']

File src/cfisher.pyx:1, in init fisher.cfisher()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
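For reference, this ValueError comes from a size check that Cython-compiled extensions perform when they load: the "Expected" number is the size of the ndarray struct declared in the NumPy headers that the extension (here fisher's cfisher module) was compiled against, and the "got" number is the size of numpy.ndarray in the NumPy that is actually imported at run time. A minimal sketch, assuming CPython, reads that run-time number through the type's __basicsize__ attribute; runtime_basicsize is a hypothetical helper, not part of tcrdist3, fisher, or NumPy.

```python
import importlib


def runtime_basicsize(module_name: str, type_name: str) -> int:
    """Return the instance size (tp_basicsize) that the running interpreter
    reports for module_name.type_name, i.e. the "got N from PyObject" value."""
    module = importlib.import_module(module_name)
    return getattr(module, type_name).__basicsize__


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("NumPy is not importable in this environment")
    else:
        size = runtime_basicsize("numpy", "ndarray")
        # Compare with the "Expected N from C header" value in the error; a
        # mismatch suggests the extension was built against other NumPy headers.
        print(f"NumPy {numpy.__version__} loaded from {numpy.__file__}")
        print(f"numpy.ndarray basicsize: {size} bytes")
```

Running it from the same virtual environment or notebook kernel that produced the traceback shows which NumPy is actually being loaded there.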

I confirmed that all the packages met the version requirements in setup.py. Most Stack Overflow answers for this error suggest upgrading NumPy. I tried that, upgrading from 1.21 to 1.22.2, but it did not solve the problem. Here is the pip list output for the virtual environment:

Package              Version
-------------------- -------
appnope              0.1.2
argon2-cffi          21.3.0
argon2-cffi-bindings 21.2.0
asttokens            2.0.5
attrs                21.4.0
backcall             0.2.0
bleach               4.1.0
cffi                 1.15.0
cycler               0.11.0
debugpy              1.5.1
decorator            5.1.1
defusedxml           0.7.1
dill                 0.3.4
entrypoints          0.4
executing            0.8.3
feather-format       0.4.1
fisher               0.1.10
fishersapi           0.3
fonttools            4.29.1
hierdiff             0.8
importlib-resources  5.4.0
ipykernel            6.9.1
ipython              8.1.1
ipython-genutils     0.2.0
ipywidgets           7.6.5
jedi                 0.18.1
Jinja2               3.0.3
joblib               1.1.0
jsonschema           4.4.0
jupyter              1.0.0
jupyter-client       7.1.2
jupyter-console      6.4.0
jupyter-core         4.9.2
jupyterlab-pygments  0.1.2
jupyterlab-widgets   1.0.2
kiwisolver           1.3.2
llvmlite             0.38.0
MarkupSafe           2.1.0
matplotlib           3.5.1
matplotlib-inline    0.1.3
mistune              0.8.4
nbclient             0.5.11
nbconvert            6.4.2
nbformat             5.1.3
nest-asyncio         1.5.4
notebook             6.4.8
numba                0.55.1
numpy                1.22.2
olga                 1.2.4
packaging            21.3
palmotif             0.4
pandas               1.4.1
pandocfilters        1.5.0
parasail             1.1.17
parmap               1.5.3
parso                0.8.3
patsy                0.5.2
pexpect              4.8.0
pickleshare          0.7.5
Pillow               9.0.1
pip                  22.0.3
progress             1.6
prometheus-client    0.13.1
prompt-toolkit       3.0.28
ptyprocess           0.7.0
pure-eval            0.2.2
pwseqdist            0.6
pyarrow              7.0.0
pycparser            2.21
Pygments             2.11.2
pyparsing            3.0.7
pyrsistent           0.18.1
python-dateutil      2.8.2
pytz                 2021.3
pyzmq                22.3.0
qtconsole            5.2.2
QtPy                 2.0.1
scipy                1.8.0
Send2Trash           1.8.0
setuptools           60.9.3
six                  1.16.0
stack-data           0.2.0
statsmodels          0.13.2
svgwrite             1.4.1
tcrdist3             0.2.2
tcrsampler           0.1.9
terminado            0.13.2
testpath             0.6.0
tornado              6.1
traitlets            5.1.1
wcwidth              0.2.5
webencodings         0.5.1
wheel                0.37.1
widgetsnbextension   3.5.2
zipdist              0.1.5
zipp                 3.7.0

kmayerb commented 2 years ago

Thanks for letting us know. A couple of issues have come up around the line from .cfisher import *, and they may relate to the C compiler used by your environment. For instance, a previous user (see issue #66) had success installing tcrdist3 in a new environment: https://github.com/kmayerb/tcrdist3/issues/66

kmayerb commented 2 years ago

Could you provide additional information about the OS, C compiler, and Python environment you are using?
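One way to gather these details in a single step is a short standard-library script. This is a sketch: the package list is an assumption based on the modules named in the traceback, and environment_report is a hypothetical helper. platform.python_compiler() returns the compiler string that also appears in the interpreter banner (for example "Clang 6.0 (clang-600.0.57)"), while sysconfig's CC value is typically the compiler command setuptools uses by default when it builds C extensions such as fisher from source, unless the CC environment variable overrides it.

```python
import platform
import sys
import sysconfig
from importlib import metadata  # Python 3.8+
from typing import Dict, Optional, Sequence

# Distributions involved in the failing import chain (assumed list).
PACKAGES = ["numpy", "numba", "fisher", "fishersapi", "hierdiff", "tcrdist3"]


def environment_report(packages: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Collect OS, compiler, interpreter, and package-version details."""
    report: Dict[str, Optional[str]] = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "python_compiler": platform.python_compiler(),
        "cc": sysconfig.get_config_var("CC"),  # may be None on Windows
        "prefix": sys.prefix,  # identifies which venv/conda env is active
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:16} {value}")
```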

psimps21 commented 2 years ago

I am on macOS Mojave 10.14.6. I made a virtual environment with Python 3.8.5 using virtualenv, and my C compiler is Clang 6.0 (clang-600.0.57).

kmayerb commented 2 years ago

OK, trying to reproduce your error message. I also have a machine with macOS Mojave 10.14.6, Python 3.8.12, and Clang 10.0.0.

My installation:

Python 3.8.12 (default, Oct 12 2021, 06:23:56)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import hierdiff as hd

That worked fine. However, when I tried the following, I was able to recreate the problem:

conda create -n t3test python=3.8.12
conda activate t3test
pip install tcrdist3==0.2.2
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tcrdist
>>> import hierdiff
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kmayerbl/anaconda3/envs/t3test/lib/python3.8/site-packages/hierdiff/__init__.py", line 3, in <module>
    from .association_testing import cluster_association_test
  File "/Users/kmayerbl/anaconda3/envs/t3test/lib/python3.8/site-packages/hierdiff/association_testing.py", line 15, in <module>
    from fishersapi import fishers_vec, fishers_frame, adjustnonnan
  File "/Users/kmayerbl/anaconda3/envs/t3test/lib/python3.8/site-packages/fishersapi/__init__.py", line 3, in <module>
    from .fishersapi import *
  File "/Users/kmayerbl/anaconda3/envs/t3test/lib/python3.8/site-packages/fishersapi/fishersapi.py", line 66, in <module>
    import fisher
  File "/Users/kmayerbl/anaconda3/envs/t3test/lib/python3.8/site-packages/fisher/__init__.py", line 3, in <module>
    from .cfisher import *
  File "src/cfisher.pyx", line 1, in init fisher.cfisher
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
>>>
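The traceback above shows that the failure is not in tcrdist3 itself but at the bottom of a chain of imports: hierdiff imports fishersapi, which imports the compiled fisher extension. A minimal sketch, using only the standard library, probes that chain in order and reports the first module that fails; the module order is taken from the traceback, and first_failing_import is a hypothetical helper.

```python
import importlib
from typing import Optional, Sequence, Tuple

# Order taken from the traceback: lowest-level module first.
CHAIN = ["numpy", "fisher", "fishersapi", "hierdiff", "tcrdist.rep_diff"]


def first_failing_import(modules: Sequence[str] = CHAIN) -> Optional[Tuple[str, str]]:
    """Import each module in order; return (module, error) for the first
    failure, or None if every import succeeds."""
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, but also ValueError from ABI checks
            return name, f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    failure = first_failing_import()
    if failure is None:
        print("All imports succeeded")
    else:
        print("First failing module: %s (%s)" % failure)
```

In an environment like the one above, the probe would be expected to stop at fisher, which points at the compiled extension rather than at tcrdist3 or hierdiff.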

Reaching out to some others on the team. We'll get back to you.

kmayerb commented 2 years ago

tcrdist3 uses Numba for speed, which currently requires NumPy < 1.22 (see https://numba.readthedocs.io/en/stable/user/installing.html). However, older NumPy releases such as the 1.20.3 used below do not support Python 3.10, so be sure not to use a Python version newer than 3.9.10.
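The two constraints in this comment (NumPy below 1.22 for the Numba release in use, and Python 3.9 rather than 3.10 for the older NumPy) can be checked before installing anything else. This is a sketch: the thresholds are copied from this comment rather than read from Numba's metadata, and parse_version and check_versions are hypothetical helpers.

```python
import re
import sys
from typing import List, Tuple

# Thresholds copied from the comment above (assumptions, not read from Numba).
MAX_NUMPY_EXCLUSIVE = (1, 22)   # Numba 0.55.x needs numpy < 1.22
MAX_PYTHON_EXCLUSIVE = (3, 10)  # the older NumPy pinned here lacks Python 3.10 support


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the leading numeric part of a version string, e.g. '1.22.0rc1' -> (1, 22, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_versions(python: Tuple[int, ...], numpy_version: str) -> List[str]:
    """Return human-readable problems; an empty list means the pair looks OK."""
    problems = []
    if tuple(python[:2]) >= MAX_PYTHON_EXCLUSIVE:
        problems.append(f"Python {'.'.join(map(str, python[:2]))} is too new; use 3.9.x")
    if parse_version(numpy_version) >= MAX_NUMPY_EXCLUSIVE:
        problems.append(f"numpy {numpy_version} is too new for this Numba; use < 1.22")
    return problems


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed")
    else:
        for problem in check_versions(tuple(sys.version_info[:3]), numpy.__version__) or ["OK"]:
            print(problem)
```

Tuple comparison does the work here: (1, 22, 2) >= (1, 22) is true, while (1, 20, 3) >= (1, 22) is false.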

OK, on a Mac I installed tcrdist3 fresh as follows:

  1. Download Python 3.9.10 (released Jan. 14, 2022)
  2. Install Python
  3. Create a venv and install the pinned packages:
     python3 -m venv ./tcr39
     source tcr39/bin/activate
     python3 -m pip install --upgrade pip
     pip3 install numpy==1.20.3
     pip3 install fisher==0.1.10
     pip3 install numba
     pip3 install tcrdist3==0.2.2
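The same steps can also be driven from Python. In this sketch, install_commands only builds the command lines, and nothing is created or installed until run_steps is called explicitly. The environment name tcr39 and the pins are copied from the steps above; the helper names are hypothetical, and the code assumes a POSIX layout (bin/python inside the environment; Windows would use Scripts\python.exe instead).

```python
import subprocess
import venv
from pathlib import Path
from typing import List

# Pins copied from the installation steps in this thread.
PINNED = ["numpy==1.20.3", "fisher==0.1.10", "numba", "tcrdist3==0.2.2"]


def install_commands(env_dir: Path) -> List[List[str]]:
    """Return the pip commands to run, in order, inside the new environment."""
    python = str(env_dir / "bin" / "python")  # POSIX layout assumed
    commands = [[python, "-m", "pip", "install", "--upgrade", "pip"]]
    # One package per command, mirroring the order of the manual steps.
    commands += [[python, "-m", "pip", "install", spec] for spec in PINNED]
    return commands


def run_steps(env_dir: Path) -> None:
    """Create the virtual environment, then run each install command."""
    venv.EnvBuilder(with_pip=True).create(env_dir)
    for cmd in install_commands(env_dir):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling run_steps(Path("tcr39")) from a Python 3.9 interpreter would reproduce the manual sequence; printing install_commands(Path("tcr39")) first is a cheap way to review what will run.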

This now works on macOS Mojave 10.14.6:

Python 3.9.10 (v3.9.10:f2f3f53782, Jan 13 2022, 17:02:14)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import fisher
>>> import numba
>>> import tcrdist
>>> import pandas as pd
>>> from tcrdist.repertoire import TCRrep
>>>
>>> df = pd.read_csv("dash.csv")
>>> tr = TCRrep(cell_df = df,
...             organism = 'mouse',
...             chains = ['alpha','beta'],
...             db_file = 'alphabeta_gammadelta_db.tsv')

>>>
>>> import hierdiff
(tcr39) Aueiku-MBP:~ kmayerbl$ pip3 freeze
cycler==0.11.0
dill==0.3.4
feather-format==0.4.1
fisher==0.1.10
fishersapi==0.3
fonttools==4.29.1
hierdiff==0.8
Jinja2==3.0.3
joblib==1.1.0
kiwisolver==1.3.2
llvmlite==0.38.0
MarkupSafe==2.1.0
matplotlib==3.5.1
numba==0.55.1
numpy==1.20.3
olga==1.2.4
packaging==21.3
palmotif==0.4
pandas==1.4.1
parasail==1.2.4
parmap==1.5.3
patsy==0.5.2
Pillow==9.0.1
progress==1.6
pwseqdist==0.6
pyarrow==7.0.0
pyparsing==3.0.7
python-dateutil==2.8.2
pytz==2021.3
scipy==1.8.0
six==1.16.0
statsmodels==0.13.2
svgwrite==1.4.1
tcrdist3==0.2.2
tcrsampler==0.1.9
zipdist==0.1.5
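Comparing this freeze with the pip list from the original report shows what differs between the working and failing environments. A small sketch that parses both formats (pip list's "name version" columns and pip freeze's "name==version" lines) and reports version differences; the sample data is a subset copied from this thread, and parse_requirements and version_diff are hypothetical helpers.

```python
from typing import Dict, Iterable, Tuple


def parse_requirements(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'name==version' (pip freeze) or 'name version' (pip list) lines."""
    versions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("Package", "---")):
            continue  # skip blanks and the pip list header
        if "==" in line:
            name, version = line.split("==", 1)
        else:
            name, version = line.split()[:2]
        versions[name.lower()] = version.strip()
    return versions


def version_diff(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {package: (version_in_a, version_in_b)} for shared packages whose versions differ."""
    return {name: (a[name], b[name]) for name in sorted(a.keys() & b.keys()) if a[name] != b[name]}


# Subsets copied from the failing pip list and the working pip freeze above.
FAILING = """numba 0.55.1
numpy 1.22.2
fisher 0.1.10
parasail 1.1.17
tcrdist3 0.2.2""".splitlines()

WORKING = """numba==0.55.1
numpy==1.20.3
fisher==0.1.10
parasail==1.2.4
tcrdist3==0.2.2""".splitlines()

if __name__ == "__main__":
    diff = version_diff(parse_requirements(FAILING), parse_requirements(WORKING))
    for name, (old, new) in diff.items():
        print(f"{name}: {old} -> {new}")
```

With the sample data it prints two differences: numpy (1.22.2 -> 1.20.3) and parasail (1.1.17 -> 1.2.4).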

psimps21 commented 2 years ago

With these installation instructions I am able to run the hierarchical neighborhoods code and produce the output file. Thank you!