informagi / REL

REL: Radboud Entity Linker
https://rel.readthedocs.io/
MIT License

Sklearn and Numpy Dependencies when installing REL from source (Option 3) #159

Open leventidis opened 1 year ago

leventidis commented 1 year ago

I am trying to run a simple local server after installing REL from source (Option 3), following the tutorial at: https://rel.readthedocs.io/en/latest/tutorials/e2e_entity_linking/

I have also downloaded the generic and wiki_2014 corpora that are linked on GitHub.

However, I am running into package versioning issues. For instance, just instantiating EntityDisambiguation() produces the following:

model = EntityDisambiguation(base_url, wiki_version, config)
File "/home/aristotelis/Documents/REL/env/lib/python3.8/site-packages/REL/entity_disambiguation.py", line 67, in __init__
    self.model_lr = pkl.load(f)
ModuleNotFoundError: No module named 'sklearn.linear_model.logistic'
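This error follows from how pickle works: a pickled object records the dotted module path and class name of its class, and unpickling imports that path. scikit-learn 0.22 moved its implementation modules to private names (for example sklearn.linear_model.logistic became sklearn.linear_model._logistic) and later releases removed the old import paths, so a model pickled with an older release cannot be resolved by a newer one. The stdlib-only sketch below illustrates that mechanism and one common workaround, a pickle.Unpickler subclass whose find_class remaps old module paths. The names RenamingUnpickler, make_module, Point, legacy_models and _legacy_models are hypothetical stand-ins. Applying the same remap to REL's pickled model (e.g. {"sklearn.linear_model.logistic": "sklearn.linear_model._logistic"}) is an untested assumption: even if the class resolves, its internal attributes may not match a much newer scikit-learn, so pinning the version the maintainers use is the safer route.

```python
import io
import pickle
import sys
import types


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites old module paths before resolving classes."""

    def __init__(self, file, renames):
        super().__init__(file)
        self._renames = dict(renames)

    def find_class(self, module, name):
        # Swap a renamed module path for its new location, then defer to pickle.
        return super().find_class(self._renames.get(module, module), name)


def make_module(name):
    """Register a throwaway module holding a tiny Point class (hypothetical)."""
    mod = types.ModuleType(name)
    mod.Point = type("Point", (), {"__module__": name})
    sys.modules[name] = mod
    return mod


# 1. Pickle an object while its class lives in "legacy_models".
old_mod = make_module("legacy_models")
point = old_mod.Point()
point.x, point.y = 1, 2
payload = pickle.dumps(point)

# 2. Simulate an upgrade that moves the class to "_legacy_models".
del sys.modules["legacy_models"]
new_mod = make_module("_legacy_models")

# 3. A plain load fails with the same kind of error as the traceback above.
plain_error = None
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    plain_error = exc
print("plain load failed with:", plain_error)

# 4. The remapping unpickler resolves the class at its new path.
restored = RenamingUnpickler(
    io.BytesIO(payload), {"legacy_models": "_legacy_models"}
).load()
print(type(restored).__module__, restored.x, restored.y)
```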

I tried downgrading scikit-learn (e.g., to 0.22.0), but then I get NumPy float deprecation errors. The requirements.txt doesn't pin versions for these packages.

Which versions of scikit-learn and NumPy should be used to load the provided models correctly? Are specific versions of any other packages needed to successfully run the example in the tutorial (https://rel.readthedocs.io/en/latest/tutorials/e2e_entity_linking/)?

KDercksen commented 1 year ago

Hi there,

Our API server has the following versions installed:

scikit-learn==0.22.2
numpy==1.19.1
torch==1.7.0
flair==0.11.3

That should be everything. Let us know if you run into further problems!
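To confirm that an environment actually matches these versions before loading the models, one option is to compare them against installed distribution metadata. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper name find_mismatches is hypothetical, and the pins are copied from the list above.

```python
from importlib import metadata

# Versions reported for the REL API server in the reply above.
PINS = {
    "scikit-learn": "0.22.2",
    "numpy": "1.19.1",
    "torch": "1.7.0",
    "flair": "0.11.3",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {distribution: (expected, installed or None)} for every pin that differs."""
    mismatches = {}
    for dist, expected in pins.items():
        try:
            installed = get_version(dist)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed in this environment
        if installed != expected:
            mismatches[dist] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    for dist, (expected, installed) in sorted(problems.items()):
        print(f"{dist}: expected {expected}, found {installed}")
    print("all pins satisfied" if not problems else f"{len(problems)} mismatch(es)")
```

The comparison is an exact string match, so a local build tag such as +cu110 on a torch wheel would be reported as a mismatch even when the base version is right.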

leventidis commented 1 year ago

Thank you for the help! Unfortunately, I am still not able to run the server locally.

I deleted my environment and started a fresh one. I first installed REL using pip install git+https://github.com/informagi/REL

I noticed that this installs different versions from the ones you specified (specifically scikit-learn-1.2.2, numpy-1.24.3, torch-2.0.1, and flair-0.12.2).

So I manually uninstalled those four packages and re-ran pip install for each of them with the specified versions.
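Instead of uninstalling and reinstalling packages after the fact, pip can apply pins during the first install through a constraints file (pip install -c FILE), which limits the versions pip may pick without adding new requirements. The sketch below writes such a file from the maintainer's pins and builds, without running, the matching command. The file name rel-constraints.txt and the helper names are arbitrary choices, and whether pip can resolve every other REL dependency under these pins is an assumption that would need checking.

```python
import shlex
import sys
from pathlib import Path

# Pins from the maintainer's reply; adjust if the recommended versions change.
PINS = {
    "scikit-learn": "0.22.2",
    "numpy": "1.19.1",
    "torch": "1.7.0",
    "flair": "0.11.3",
}


def write_constraints(pins, path):
    """Write one 'name==version' line per pin to `path` and return the text written."""
    text = "".join(f"{name}=={version}\n" for name, version in sorted(pins.items()))
    Path(path).write_text(text)
    return text


def pip_install_command(constraints_path, target="git+https://github.com/informagi/REL"):
    """Build (without running) a pip command that installs REL under the constraints.

    Using `sys.executable -m pip` targets the same interpreter that runs this script.
    """
    return [sys.executable, "-m", "pip", "install", "-c", str(constraints_path), target]


constraints_text = write_constraints(PINS, "rel-constraints.txt")
print(constraints_text, end="")
print(shlex.join(pip_install_command("rel-constraints.txt")))
```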

Running pip freeze in my environment, I currently have the following packages:

accelerate==0.19.0
aiohttp==3.8.4
aiosignal==1.3.1
anyascii==0.3.2
anyio==3.6.2
async-timeout==4.0.2
attrs==23.1.0
beautifulsoup4==4.12.2
blis==0.7.9
boto3==1.26.137
botocore==1.29.137
bpemb==0.3.4
catalogue==2.0.8
certifi==2023.5.7
charset-normalizer==3.1.0
click==8.1.3
cloudpickle==2.2.1
cmake==3.26.3
colorama==0.4.6
confection==0.0.4
conllu==4.5.2
contourpy==1.0.7
cycler==0.11.0
cymem==2.0.7
dataclasses==0.6
datasets==2.12.0
Deprecated==1.2.13
dill==0.3.6
fastapi==0.95.2
filelock==3.12.0
flair==0.11.3
fonttools==4.39.4
frozenlist==1.3.3
fsspec==2023.5.0
ftfy==6.1.1
future==0.18.3
gdown==4.4.0
gensim==4.3.1
h11==0.14.0
huggingface-hub==0.14.1
hyperopt==0.2.7
idna==3.4
importlib-metadata==3.10.1
importlib-resources==5.12.0
Janome==0.4.2
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
kiwisolver==1.4.4
konoha==4.6.5
langcodes==3.3.0
langdetect==1.0.9
lit==16.0.5
lxml==4.9.2
MarkupSafe==2.1.2
matplotlib==3.7.1
more-itertools==9.1.0
mpld3==0.3
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.14
murmurhash==1.0.9
networkx==3.1
nltk==3.8.1
numpy==1.19.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
overrides==3.1.0
packaging==23.1
pandas==2.0.1
pathy==0.10.1
Pillow==9.5.0
pptree==3.1
preshed==3.0.8
protobuf==3.20.2
psutil==5.9.5
py4j==0.10.9.7
pyarrow==12.0.0
pydantic==1.10.7
pyparsing==3.0.9
PySocks==1.7.1
python-dateutil==2.8.2
pytorch_revgrad==0.2.0
pytz==2023.3
PyYAML==6.0
radboud-el @ git+https://github.com/informagi/REL@a61bfc02d7aa713b470f0ed3f83af1c6c72eef8c
regex==2023.5.5
requests==2.30.0
responses==0.18.0
s3transfer==0.6.1
scikit-learn==0.22.2
scipy==1.10.1
segtok==1.5.11
sentencepiece==0.1.95
six==1.16.0
smart-open==6.3.0
sniffio==1.3.0
soupsieve==2.4.1
spacy==3.5.3
spacy-legacy==3.0.12
spacy-loggers==1.0.4
sqlitedict==2.1.0
srsly==2.4.6
starlette==0.27.0
sympy==1.12
tabulate==0.9.0
thinc==8.1.10
threadpoolctl==3.1.0
tokenizers==0.13.3
torch==1.7.0
tqdm==4.65.0
transformer-smaller-training-vocab==0.2.3
transformers==4.29.2
triton==2.0.0
typer==0.7.0
typing_extensions==4.5.0
tzdata==2023.3
urllib3==1.26.15
uvicorn==0.22.0
wasabi==1.1.1
wcwidth==0.2.6
Wikipedia-API==0.5.8
wrapt==1.15.0
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0

When I try to run the server, I now get the following stack trace:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    from REL.entity_disambiguation import EntityDisambiguation
  File "/home/aristotelis/Documents/REL/env/lib/python3.8/site-packages/REL/entity_disambiguation.py", line 16, in <module>
    from sklearn.linear_model import LogisticRegression
  File "/home/aristotelis/Documents/REL/env/lib/python3.8/site-packages/sklearn/linear_model/__init__.py", line 12, in <module>
    from ._least_angle import (Lars, LassoLars, lars_path, lars_path_gram, LarsCV,
  File "/home/aristotelis/Documents/REL/env/lib/python3.8/site-packages/sklearn/linear_model/_least_angle.py", line 30, in <module>
    method='lar', copy_X=True, eps=np.finfo(np.float).eps,
  File "/home/aristotelis/Documents/REL/env/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
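One detail in this traceback may be worth checking: the "has no attribute 'float'" message is what NumPy 1.24 and later raise after the np.float alias was removed, whereas NumPy 1.19.1 still provides np.float. That suggests the interpreter running test.py may be importing a different NumPy than the numpy==1.19.1 shown by pip freeze (for example, from another environment). A minimal sketch that reports what the current interpreter actually loads for each module; the helper name describe_imports is hypothetical.

```python
import importlib
import sys


def describe_imports(names):
    """Describe what this interpreter actually imports for each module name."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not importable"
        except Exception as exc:  # e.g. sklearn 0.22 failing on a NumPy without np.float
            report[name] = f"import failed: {exc!r}"
        else:
            version = getattr(module, "__version__", "unknown version")
            location = getattr(module, "__file__", None) or "built-in"
            report[name] = f"{version} from {location}"
    return report


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, info in describe_imports(["numpy", "sklearn", "torch", "flair"]).items():
        print(f"{name}: {info}")
```

If the reported NumPy path does not point into the same env directory as the interpreter, the environment that pip freeze described is probably not the one being used.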

On another note, I noticed that server.py has no make_handler() function, even though the tutorial imports it with the statement from REL.server import make_handler. I found a make_handler() function at https://github.com/informagi/REL/blob/a61bfc02d7aa713b470f0ed3f83af1c6c72eef8c/scripts/comparison_BLINK/run_server.py#L86 and used it in server.py, but I am not sure whether that is correct or whether the server setup tutorial at https://rel.readthedocs.io/en/latest/tutorials/e2e_entity_linking/ is outdated.
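Because REL was installed with pip from the Git repository, the REL.server module that Python imports lives in site-packages and may differ from both a local checkout and the tutorial. A quick way to see which file is imported and whether it defines make_handler is sketched below. It only inspects the installed module and says nothing about whether the tutorial or the BLINK script's make_handler is the intended one; the helper name locate_attribute is hypothetical.

```python
import importlib


def locate_attribute(module_name, attribute):
    """Return (module file, whether `attribute` is defined) for an importable module."""
    module = importlib.import_module(module_name)
    return getattr(module, "__file__", None), hasattr(module, attribute)


if __name__ == "__main__":
    try:
        path, found = locate_attribute("REL.server", "make_handler")
    except Exception as exc:  # ImportError, or errors raised while REL imports its dependencies
        print("REL.server could not be imported:", repr(exc))
    else:
        print(f"REL.server loaded from {path}; make_handler defined: {found}")
```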

Thanks!