rom1504 / clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them
https://rom1504.github.io/clip-retrieval/
MIT License

Cannot import HTTPAdapter from requests.adapters when importing clip_retrieval on WSL2 Ubuntu 20.04 #204

Closed Twenkid closed 8 months ago

Twenkid commented 1 year ago

Any tips for resolving this error? As far as I can tell, all dependencies are installed and within the required ranges (see below). I also tried uninstalling and reinstalling the suspect libraries (flask, requests). I have not used a venv so far, which could be a problem for some of the libraries (flask in particular, which breaks during import), and I don't know whether this particular Python version could be a problem.

On Windows, where I first tried, this error did not happen, and importing flask and HTTPAdapter manually works, i.e. from flask import Flask and from requests.adapters import HTTPAdapter both return fine. In WSL, >>> import requests returns, but from requests.adapters import HTTPAdapter fails with the same error as in the clip_retrieval traceback.
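
One way to see which copy of each package Python would pick up, without importing it (so a broken flask cannot abort the check), is to ask the import system for the module's origin. This is a minimal sketch using only the standard library; the helper name module_origin and the list of package names are illustrative assumptions, not part of clip-retrieval.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Package names taken from the traceback in this report; adjust as needed.
    for name in ("flask", "requests", "clip_retrieval", "psutil"):
        print(f"{name:15} -> {module_origin(name)}")
```

If the printed paths mix /usr/lib/python3/dist-packages, /usr/local/lib/python3.8/dist-packages and ~/.local/lib/python3.8/site-packages, as the traceback below does, several installations may be shadowing each other.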

(On Windows the final obstacle, as far as I could tell, was missing or improperly installed/linked/built BLAS and OpenMP binaries (from . import _swigfaiss ...). I tried to fix it for a while, even manually copying .dlls (libblas.dll, libiomp5md.dll, liblapack.dll), but the error didn't change, so I gave up and tried WSL.)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/clip_retrieval/__init__.py", line 3, in <module>
    from .clip_back import clip_back
  File "/usr/local/lib/python3.8/dist-packages/clip_retrieval/clip_back.py", line 5, in <module>
    from flask import Flask, request, make_response
ImportError: cannot import name 'Flask' from 'flask' (/home/tosh/.local/lib/python3.8/site-packages/flask/__init__.py)
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 72, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 32, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 12, in <module>
    import os, glob, subprocess, os.path, time, pwd, sys, requests_unixsocket
  File "/usr/lib/python3/dist-packages/requests_unixsocket/__init__.py", line 4, in <module>
    from .adapters import UnixAdapter
  File "/usr/lib/python3/dist-packages/requests_unixsocket/adapters.py", line 3, in <module>
    from requests.adapters import HTTPAdapter
ImportError: cannot import name 'HTTPAdapter' from 'requests.adapters' (/home/tosh/.local/lib/python3.8/site-packages/requests/adapters.py)

Original exception was:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/clip_retrieval/__init__.py", line 3, in <module>
    from .clip_back import clip_back
  File "/usr/local/lib/python3.8/dist-packages/clip_retrieval/clip_back.py", line 5, in <module>
    from flask import Flask, request, make_response
ImportError: cannot import name 'Flask' from 'flask' (/home/tosh/.local/lib/python3.8/site-packages/flask/__init__.py)
>>>
 pip3 list
Package                  Version
------------------------ --------------------
aiohttp                  3.8.3
aiosignal                1.2.0
albumentations           1.3.0
aniso8601                9.0.1
async-timeout            4.0.2
attrs                    19.3.0
autofaiss                2.15.3
Automat                  0.8.0
blinker                  1.4
braceexpand              0.1.7
certifi                  2019.11.28
chardet                  3.0.4
charset-normalizer       2.1.1
click                    8.1.3
clip-anytorch            2.5.0
clip-retrieval           2.35.1
cloud-init               22.3.4
colorama                 0.4.3
command-not-found        0.3
configobj                5.0.6
constantly               15.1.0
cryptography             2.8
cupshelpers              1.0
cycler                   0.11.0
dataclasses              0.6
dbus-python              1.2.16
defer                    1.0.6
Distance                 0.1.3
distro                   1.4.0
distro-info              0.23ubuntu1
docker-pycreds           0.4.0
embedding-reader         1.5.0
entrypoints              0.3
ExifRead-nocycle         3.0.1
faiss-cpu                1.7.2
filelock                 3.8.0
fire                     0.4.0
Flask                    2.2.2
Flask-Cors               3.0.10
Flask-RESTful            0.3.9
fonttools                4.28.5
frozenlist               1.3.1
fsspec                   2022.1.0
ftfy                     6.1.1
gitdb                    4.0.9
GitPython                3.1.29
h5py                     3.7.0
httplib2                 0.14.0
huggingface-hub          0.10.1
hyperlink                19.0.0
idna                     2.8
imageio                  2.22.4
img2dataset              1.33.0
importlib-metadata       5.0.0
incremental              16.10.1
itsdangerous             2.1.2
Jinja2                   3.1.2
joblib                   1.2.0
jsonpatch                1.22
jsonpointer              2.0
jsonschema               3.2.0
keyring                  18.0.1
kiwisolver               1.3.2
language-selector        0.1
launchpadlib             1.10.13
lazr.restfulclient       0.14.2
lazr.uri                 1.0.3
macaroonbakery           1.3.1
MarkupSafe               2.1.1
matplotlib               3.5.1
more-itertools           4.2.0
multidict                6.0.2
multilingual-clip        1.0.10
netifaces                0.10.4
networkx                 2.8.8
nltk                     3.7
numpy                    1.22.0
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
oauthlib                 3.1.0
onnx                     1.11.0
open-clip-torch          2.3.1
opencv-python            4.6.0.66
opencv-python-headless   4.6.0.66
packaging                21.3
pandas                   1.5.1
pathtools                0.1.2
pexpect                  4.6.0
Pillow                   9.0.0
pip                      20.0.2
prometheus-client        0.15.0
promise                  2.3
protobuf                 3.19.4
psutil                   5.5.1
pyarrow                  7.0.0
pyasn1                   0.4.2
pyasn1-modules           0.2.1
pybind11                 2.9.1
pycairo                  1.16.2
pycups                   1.9.73
PyGObject                3.36.0
PyHamcrest               1.9.0
PyJWT                    1.7.1
pymacaroons              0.13.0
PyNaCl                   1.3.0
pyOpenSSL                19.0.0
pyparsing                3.0.6
pyRFC3339                1.1
pyrsistent               0.15.5
pyserial                 3.4
python-apt               2.0.0+ubuntu0.20.4.8
python-dateutil          2.8.2
python-debian            0.1.36ubuntu1
pytz                     2022.6
PyWavelets               1.4.1
PyYAML                   5.3.1
qudida                   0.0.4
regex                    2022.10.31
requests                 2.28.1
requests-unixsocket      0.2.0
scikit-image             0.19.3
scikit-learn             1.1.3
scipy                    1.9.3
SecretStorage            2.3.1
sentence-transformers    2.2.2
sentencepiece            0.1.97
sentry-sdk               1.10.1
service-identity         18.1.0
setproctitle             1.3.2
setuptools               60.9.3
shortuuid                1.0.9
simplejson               3.16.0
six                      1.14.0
smmap                    5.0.0
sos                      4.4
ssh-import-id            5.10
systemd-python           234
termcolor                2.1.0
threadpoolctl            3.1.0
tifffile                 2022.10.10
tokenizers               0.13.2
torch                    1.13.0
torchvision              0.14.0
tqdm                     4.64.1
transformers             4.24.0
Twisted                  18.9.0
typing-extensions        4.1.1
ubuntu-advantage-tools   27.11.3
ufw                      0.36
unattended-upgrades      0.1
urllib3                  1.26.11
wadllib                  1.3.3
wandb                    0.12.21
wcwidth                  0.2.5
webdataset               0.1.103
Werkzeug                 2.2.2
wheel                    0.34.2
yarl                     1.8.1
zipp                     1.0.0
zope.interface           4.7.1

https://github.com/rom1504/clip-retrieval/blob/main/requirements.txt

img2dataset>=1.25.5,<2
clip-anytorch>=2.5.0,<3
tqdm>=4.62.3,<5
fire>=0.4.0,<0.5.0
torch>=1.7.1,<2
torchvision>=0.10.1,<2
numpy>=1.19.5,<2
faiss-cpu>=1.7.2,<2
flask>=2.0.3,<3
flask_restful>=0.3.9,<1
flask_cors>=3.0.10,<4
pandas>=1.1.5,<2
pyarrow>=6.0.1,<8
autofaiss>=2.9.6,<3
webdataset>=0.1.103,<0.2
h5py>=3.1.0,<4
prometheus-client>=0.13.1,<1
fsspec==2022.1.0
sentence-transformers>=2.2.0,<3
wandb>=0.12.10,<0.13
open-clip-torch>=2.0.0,<3.0.0
requests>=2.27.1,<3
aiohttp>=3.8.1,<4
multilingual-clip>=1.0.10,<2
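
The "within the ranges" check can be scripted rather than done by eye. The sketch below is a simplified comparison, assuming plain numeric versions and only the operators used in the list above; it is not a full PEP 440 implementation, and the helper names are made up for illustration. Note that importlib.metadata may not match names written with underscores (such as flask_restful) on older Python versions.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '2.28.1' into (2, 28, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, spec):
    """Check a version string against a spec such as '>=2.0.3,<3'."""
    have = parse_version(installed)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|<|>)\s*([\w.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, want = match.group(1), parse_version(match.group(2))
        # Pad both tuples so that '3' compares like '3.0.0'.
        width = max(len(have), len(want))
        a = have + (0,) * (width - len(have))
        b = want + (0,) * (width - len(want))
        checks = {">=": a >= b, "<=": a <= b, "==": a == b, "<": a < b, ">": a > b}
        if not checks[op]:
            return False
    return True


def check_requirement(line):
    """Return (name, installed_version_or_None, ok) for one requirement line."""
    match = re.fullmatch(r"\s*([A-Za-z0-9_.\-]+)\s*(.*)", line)
    name, spec = match.group(1), match.group(2).strip()
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return name, None, False
    return name, installed, (satisfies(installed, spec) if spec else True)


if __name__ == "__main__":
    for line in ("flask>=2.0.3,<3", "requests>=2.27.1,<3", "fsspec==2022.1.0"):
        print(check_requirement(line))
```

Passing this check does not rule out the import error above: every version can be in range while two copies of a package in different directories shadow each other.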

Twenkid commented 1 year ago

I managed to fix it and run the whole clip-retrieval notebook example locally. The import now completes.

>>> import clip_retrieval
>>> dir(clip_retrieval)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'clip_back', 'clip_end2end', 'clip_filter', 'clip_front', 'clip_index', 'clip_inference', 'ivf_metadata_ordering', 'load_clip']

The solution: I moved to Python 3.10 (maybe not strictly needed, IDK) and upgraded pip. Then another error appeared when running the clip-retrieval command: a "partially initialized module" error while importing _psutil_linux. It turned out to be a conflict in the installation, maybe from mixing sudo apt-get install with pip3 install / python3.10 -m pip install, because the error trace included references to two different directories.

  File "/usr/lib/python3/dist-packages/psutil/__init__.py", line 95, in <module>
    from . import _pslinux as _psplatform
  File "/usr/lib/python3/dist-packages/psutil/_pslinux.py", line 26, in <module>
    from . import _psutil_linux as cext
ImportError: cannot import name '_psutil_linux' from partially initialized module 'psutil' (most likely due to a circular import) (/usr/lib/python3/dist-packages/psutil/__init__.py)

Removing and reinstalling the conflicting libraries was problematic, but manually deleting the psutil module folders with sudo rm -r ... and then reinstalling psutil with pip finally resolved it. (There were two folders, which added to the confusion, one of them with .egg-info: sudo rm -r /usr/lib/python3/dist-packages/psutil-5.5.1 and sudo rm -r /usr/lib/python3/dist-packages/psutil-5.5.1.egg-info)
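
Before deleting anything by hand, it can help to list every copy of a package that sits on the import path. The following is a rough sketch, assuming that a copy shows up either as a directory named exactly like the package or as a metadata entry starting with "name-" (for example psutil-5.5.1.egg-info); find_copies is a hypothetical helper, not an existing tool.

```python
import sys
from pathlib import Path


def find_copies(package, search_path=None):
    """List entries on the search path that look like `package` or its metadata."""
    hits = []
    for entry in (sys.path if search_path is None else search_path):
        root = Path(entry or ".")
        if not root.is_dir():
            continue  # skip zip files and missing directories
        try:
            children = sorted(root.iterdir())
        except OSError:
            continue  # unreadable directory
        for child in children:
            stem = child.name.lower()
            if stem == package or stem.startswith(package + "-"):
                hits.append(child)
    return hits


if __name__ == "__main__":
    for path in find_copies("psutil"):
        print(path)
```

More than one package directory in the output (for instance one under /usr/lib/python3/dist-packages and one under ~/.local) would match the two-directory traceback shown above.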

img2dataset --url_list=myimglist.txt --output_folder=image_folder --thread_count=4 --image_size=256
Starting the downloading of this file
Sharding file number 1 of 1 called /mnt/z/myimglist.txt
0it [00:00, ?it/s]File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
0it [00:00, ?it/s]

myimglist.txt

https://placekitten.com/200/305 https://placekitten.com/200/304 https://placekitten.com/200/303 https://placekitten.com/200/302 https://placekitten.com/150/235 https://placekitten.com/120/120

When clip-retrieval is called, it first downloads a 354M file (see the progress bar below):

 clip-retrieval inference  --input_dataset image_folder --output_folder embedding_folder
The number of samples has been estimated to be 6
Starting the worker
dataset is 12
Starting work on task 0
100%|███████████████████████████████████████| 354M/354M [01:35<00:00, 3.72MiB/s]
/home/xxxx/.local/lib/python3.10/site-packages/clip/clip.py:160: FutureWarning: 'torch.onnx._patch_torch._node_getitem' is deprecated in version 1.13 and will be removed in version 1.14. Please Internally use '_node_get' in symbolic_helper instead..
  if "value" in node.attributeNames() and str(node["value"]).startswith("cuda"):
/home/xxxx/.local/lib/python3.10/site-packages/clip/clip.py:186: FutureWarning: 'torch.onnx._patch_torch._node_getitem' is deprecated in version 1.13 and will be removed in version 1.14. Please Internally use '_node_get' in symbolic_helper instead..
  if inputs[i].node()["value"] == 5:
warming up with batch size 256 on cpu
done warming up in 56.720762968063354s
/home/tosh/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
 sample_per_sec 1 ; sample_count 6

...

Windows

In case this is useful for anyone: I realized I was trying to import clip_retrieval in the Python CLI (on WSL I was doing the same initially), while the sample notebook calls it as a command. img2dataset succeeds:

<frozen importlib._bootstrap>:228: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
Starting the downloading of this file
Sharding file number 1 of 1 called z:/list.txt
0it [00:00, ?it/s]File sharded in 1 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
<frozen importlib._bootstrap>:228: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
<frozen importlib._bootstrap>:228: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
1it [00:07,  7.66s/it]
worker  - success: 1.000 - failed to download: 0.000 - failed to resize: 0.000 - images per sec: 8 - count: 18
total   - success: 1.000 - failed to download: 0.000 - failed to resize: 0.000 - images per sec: 8 - count: 18
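
The command-style usage mentioned above can also be driven from a script instead of importing the package. This is a small sketch that only wraps the exact command line quoted earlier in this comment; the function names are illustrative, and it assumes the clip-retrieval executable is on PATH.

```python
import shutil
import subprocess


def build_inference_command(input_dataset, output_folder):
    """Build the argv list for the inference command used in this thread."""
    return [
        "clip-retrieval", "inference",
        "--input_dataset", input_dataset,
        "--output_folder", output_folder,
    ]


def run_inference(input_dataset, output_folder):
    """Run the command, failing early if the executable cannot be found."""
    command = build_inference_command(input_dataset, output_folder)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not on PATH")
    return subprocess.run(command, check=True)
```

Calling run_inference("image_folder", "embedding_folder") would then be equivalent to typing the command in a shell.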

Calling clip-retrieval properly now returned another error: a JAX requirement, and it seems it has to be built from source. So the problem may be just JAX rather than OpenMP and BLAS, IDK yet.

clip-retrieval
<frozen importlib._bootstrap>:228: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
Traceback (most recent call last):
  File "C:\Users\toshb\AppData\Local\Programs\Python\Python39\lib\site-packages\jax\_src\lib\__init__.py", line 38, in <module>
    import jaxlib as jaxlib
ModuleNotFoundError: No module named 'jaxlib'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\toshb\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\toshb\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\toshb\AppData\Local\Programs\Python\Python39\Scripts\clip-retrieval.exe\__main__.py", line 4, in <module>
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\clip_retrieval\__init__.py", line 6, in <module>
    from .clip_inference.main import main as clip_inference
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\clip_retrieval\clip_inference\__init__.py", line 3, in <module>
    from .main import main as clip_inference
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\clip_retrieval\clip_inference\main.py", line 10, in <module>
    from clip_retrieval.clip_inference.distributor import PysparkDistributor, SequentialDistributor
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\clip_retrieval\clip_inference\distributor.py", line 5, in <module>
    from .worker import worker
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\clip_retrieval\clip_inference\worker.py", line 15, in <module>
    from clip_retrieval.clip_inference.mapper import ClipMapper
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\clip_retrieval\clip_inference\mapper.py", line 5, in <module>
    from sentence_transformers import SentenceTransformer
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\sentence_transformers\__init__.py", line 3, in <module>
    from .datasets import SentencesDataset, ParallelSentencesDataset
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\sentence_transformers\datasets\__init__.py", line 3, in <module>
    from .ParallelSentencesDataset import ParallelSentencesDataset
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\sentence_transformers\datasets\ParallelSentencesDataset.py", line 4, in <module>
    from .. import SentenceTransformer
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\sentence_transformers\SentenceTransformer.py", line 11, in <module>
    import transformers
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\transformers\__init__.py", line 30, in <module>
    from . import dependency_versions_check
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\transformers\dependency_versions_check.py", line 17, in <module>
    from .utils.versions import require_version, require_version_core
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\transformers\utils\__init__.py", line 34, in <module>
    from .generic import (
  File "C:\Users\toshb\AppData\Roaming\Python\Python39\site-packages\transformers\utils\generic.py", line 36, in <module>
    import jax.numpy as jnp
  File "C:\Users\toshb\AppData\Local\Programs\Python\Python39\lib\site-packages\jax\__init__.py", line 35, in <module>
    from jax import config as _config_module
  File "C:\Users\toshb\AppData\Local\Programs\Python\Python39\lib\site-packages\jax\config.py", line 17, in <module>
    from jax._src.config import config
  File "C:\Users\toshb\AppData\Local\Programs\Python\Python39\lib\site-packages\jax\_src\config.py", line 29, in <module>
    from jax._src import lib
  File "C:\Users\toshb\AppData\Local\Programs\Python\Python39\lib\site-packages\jax\_src\lib\__init__.py", line 40, in <module>
    raise ModuleNotFoundError(
ModuleNotFoundError: jax requires jaxlib to be installed. See https://github.com/google/jax#installation for installation instructions.
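
One reading of this traceback is that jax is present in the system-wide site-packages without jaxlib, and the import chain picks it up simply because it is installed. That is an interpretation, not something confirmed here. A quick way to test it without importing either package is sketched below; the helper names are made up for illustration.

```python
from importlib.util import find_spec


def installed(names=("jax", "jaxlib")):
    """Map each top-level module name to whether Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


def jax_is_half_installed(status):
    """True when jax can be found but jaxlib cannot."""
    return bool(status.get("jax", False)) and not status.get("jaxlib", False)


if __name__ == "__main__":
    status = installed()
    print(status, "half installed:", jax_is_half_installed(status))
```

If this reports a half-installed jax, either removing jax or running inside an isolated environment that does not contain it would be the things to try first.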

rom1504 commented 1 year ago

I suggest you use a python virtual environment

You are mentioning several libs that are not involved in clip retrieval (for example Jax), so it seems like your system libraries are conflicting with clip retrieval

Python should almost never be used outside of a virtual environment
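
As a quick sanity check on this advice, the interpreter itself can report whether it runs inside a virtual environment and where user-level packages would come from. This is a minimal standard-library sketch; note that the sys.prefix comparison detects venv/virtualenv environments and is not expected to detect conda environments.

```python
import site
import sys


def in_virtualenv():
    """True when running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv())
    print("interpreter prefix: ", sys.prefix)
    # User site-packages (for example ~/.local/lib/pythonX.Y/site-packages)
    # are a common source of shadowing when no environment is active.
    print("user site enabled:  ", site.ENABLE_USER_SITE)
    print("user site path:     ", site.getusersitepackages())
```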

Twenkid commented 1 year ago

Thanks for the advice! I have now tried on Windows with Anaconda: it imports from the CLI and runs, but it hits another problem at the inference step. Calling it from the same working directory with the same command line works in WSL, while on Windows it starts and reports the initial state, creates the folder structure, but then crashes in dataloader/pickle with "runs out of data" (see the trace below). Both are Python 3.10: WSL is 3.10.8, Anaconda is 3.10.4.
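
A possible explanation, offered only as a guess that nothing in this thread confirms: on Windows, worker processes are started with the "spawn" method, so objects handed to them have to be pickled, and the log further down prints the dataset as get_image_dataset.<locals>.ImageDataset, i.e. a class defined inside a function. Instances of such local classes cannot be pickled by the standard pickle module. The sketch below demonstrates only that general Python behaviour with made-up class names; it does not use clip-retrieval or PyTorch.

```python
import pickle


class TopLevelDataset:
    """Defined at module level, so pickle can refer to it by name."""

    def __init__(self, items):
        self.items = items


def make_local_dataset(items):
    """Return an instance of a class defined inside this function."""

    class LocalDataset:
        def __init__(self, items):
            self.items = items

    return LocalDataset(items)


def can_pickle(obj):
    """Report whether the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print("top-level class:", can_pickle(TopLevelDataset([1, 2, 3])))
    print("local class:    ", can_pickle(make_local_dataset([1, 2, 3])))
```

If this is what happens here, it would also fit the observation that the same command works under WSL, where Linux starts workers with "fork" by default and does not need to pickle the dataset.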

I suspected errors in setting the folder paths, or a missing/extra = or quotes, etc. I also considered a "hidden" problem with running on RAM disks (where I worked): e.g. Rust cannot compile correctly on the RAM disks I've tried in Windows (Imdisk, OSFMount); they seem not to support some required POSIX OS standards or something.

However, I doubt it needs these functions here, and from WSL it runs with the same RAM drive; I also tried on the system drive and reproduced the same error.

I looked in the source files: class ImageDataset(Dataset): __init__ ... receives the right path from the command line.

Also, if I call the script with an intentionally wrong image_folder, it returns an appropriate error:

  File "C:\Users\toshb\.conda\envs\clip\lib\site-packages\clip_retrieval\clip_inference\reader.py", line 40, in folder_to_keys
    keys = list(sorted(keys))
TypeError: 'NoneType' object is not iterable

I tried to run it step by step from a Python CLI, but one obstacle I face (for the amount of effort put in so far) is that part of the initial code seems to be embedded in an exe: C:\Users\xxxx\.conda\envs\clip\Scripts\clip-retrieval.exe\__main__.py

I tried to run cli.py, but the best I get is the help info: C:\Users\toshb\.conda\envs\clip\Lib\site-packages\clip_retrieval\cli.py

     COMMAND

COMMANDS
    COMMAND is one of the following:

     back
       main entry point of clip back, start the endpoints

     index
       indexes clip embeddings using autofaiss
       ...

I see Fire is a library to turn modules into CLI, I can't dig in deep right now.

The best I got so far is to peek into the invocation (no IDE right now):

    # Debug prints added inside main.py (raw string keeps the backslashes literal)
    if input_format == "files":
        print(r"clip_retrieval\clip_inference\main.py: def calculate_partition_count calculate_partition_count(...)... if input_format==files:...")
        print(f"{input_dataset}\n{str(input_dataset)}\n{dir(input_dataset)}\n")
        ...

So far it seems the file paths are passed through at least as far as get_image_dataset in clip_inference\reader.py:

(clip) Z:\clipwin>clip-retrieval inference --input_dataset image_folder --output_folder embeddings --batch_size 6 --num_prepro_workers=4
clip_retrieval\clip_inference\main.py: def calculate_partition_count calculate_partition_count(...)... if input_format==files:...
image_folder
image_folder
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

keys=[]
text_files={}
image_files={'000000000': WindowsPath('image_folder/00000/000000000.jpg'), '000000001': WindowsPath('image_folder/00000/000000001.jpg'), '000000002': WindowsPath('image_folder/00000/000000002.jpg'), '000000003': WindowsPath('image_folder/00000/000000003.jpg')}, metadata_files=None
The number of samples has been estimated to be 4
Starting the worker
input_dataset=image_folder
worker.py:worker:dataset is 12
Starting work on task 0
C:\Users\xxxx\.conda\envs\clip\lib\site-packages\clip\clip.py:160: FutureWarning: 'torch.onnx._patch_torch._node_getitem' is deprecated in version 1.13 and will be removed in version 1.14. Please Internally use '_node_get' in symbolic_helper instead..
  if "value" in node.attributeNames() and str(node["value"]).startswith("cuda"):
C:\Users\xxxx\.conda\envs\clip\lib\site-packages\clip\clip.py:186: FutureWarning: 'torch.onnx._patch_torch._node_getitem' is deprecated in version 1.13 and will be removed in version 1.14. Please Internally use '_node_get' in symbolic_helper instead..
  if inputs[i].node()["value"] == 5:
warming up with batch size 6 on cpu
done warming up in 2.655007839202881s
clip_inference.reader.py.get_image_dataset().ImageDataset.__init__: folder=image_folder
dataset = get_image_dataset()(preprocess, input_dataset, enable_text, enable_image, enable_metadata, sampler)
dataset_to_dataloader(dataset, batch_size, num_prepro_workers, "files" = <clip_retrieval.clip_inference.reader.get_image_dataset.<locals>.ImageDataset object at 0x0000024BB1D274C0>, 6, 4
Traceback (most recent call last):
  File "C:\Users\xxxx\.conda\envs\clip\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\xxxx\.conda\envs\clip\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\xxxx\.conda\envs\clip\Scripts\clip-retrieval.exe\__main__.py", line 7, in <module>
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\clip_retrieval\cli.py", line 18, in main
    fire.Fire(
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\clip_retrieval\clip_inference\main.py", line 157, in main
    distributor()
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\clip_retrieval\clip_inference\distributor.py", line 17, in __call__
    worker(
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\clip_retrieval\clip_inference\worker.py", line 123, in worker
    runner(task)
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\clip_retrieval\clip_inference\runner.py", line 39, in __call__
    batch = iterator.__next__()
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\clip_retrieval\clip_inference\reader.py", line 211, in __iter__
    for batch in self.dataloader:
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __iter__
    return self._get_iterator()
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\torch\utils\data\dataloader.py", line 381, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\xxxx\.conda\envs\clip\lib\site-packages\torch\utils\data\dataloader.py", line 1034, in __init__
    w.start()
  File "C:\Users\xxxx\.conda\envs\clip\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\xxxx\.conda\envs\clip\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\xxxx\.conda\envs\clip\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\xxxx\.conda\envs\clip\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\xxxx\.conda\envs\clip\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_image_dataset.<locals>.ImageDataset'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\xxxx\.conda\envs\clip\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\xxxx\.conda\envs\clip\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Regarding paths: on Windows I tried several variants, starting with absolute paths, and also the case where a directory separator is needed at the end of the name (e.g. folder\). I also tried different --batch_size values (even 1) and different num_prepro_workers values, but the error stays the same.
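
For what it's worth, the last frames of the traceback (popen_spawn_win32.py, then "Can't pickle local object 'get_image_dataset.<locals>.ImageDataset'") suggest the problem is not the paths. On Windows, multiprocessing starts workers with "spawn", which pickles the object handed to the child process, and a class defined inside a function cannot be pickled. Below is a minimal standard-library sketch of that behaviour. The names ModuleLevelDataset, get_local_dataset_class, LocalDataset and can_pickle are made up for illustration and are not part of clip-retrieval; this only reproduces the pickling rule, it is not a patch.

```python
import pickle


class ModuleLevelDataset:
    """Hypothetical dataset defined at module scope; pickle can find it by name."""

    def __init__(self, folder):
        self.folder = folder


def get_local_dataset_class():
    """Hypothetical factory mimicking the pattern in the traceback:
    the class is created inside a function, so its qualified name
    contains '<locals>' and pickle cannot look it up again."""

    class LocalDataset:
        def __init__(self, folder):
            self.folder = folder

    return LocalDataset


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling fails."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Instance of the module-level class versus instance of the function-local class.
module_level_ok = can_pickle(ModuleLevelDataset("image_folder"))
local_ok = can_pickle(get_local_dataset_class()("image_folder"))
print("module-level picklable:", module_level_ok)
print("function-local picklable:", local_ok)
```

If that reading is right, the second EOFError is just the child process failing to read the half-written pickle. Assuming --num_prepro_workers is passed straight through as the DataLoader's num_workers (the debug print above hints at this, but I have not verified it), a value of 0 should keep loading in the main process and avoid the spawn step altogether.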

rom1504 commented 8 months ago

Please try again, and reopen if needed.