monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License
603 stars 76 forks

gpt4all issue with running docker image #298

Closed: Vishal-Joshi closed this issue 10 months ago

Vishal-Joshi commented 10 months ago

Hi, I am trying to build a Docker image with ontogpt installed so that it can run independently of the host platform. As things stand, it cannot run on our HPC nodes or on other Linux distributions because of glibc constraints.

I have simplified my Dockerfile to reproduce the issue. It looks like this:

FROM ubuntu:22.04

RUN apt update
RUN apt install python3 python3-pip -y
RUN pip3 install ontogpt

RUN python3 --version

#COPY run_ontogpt.sh /opt/scripts/bin/
RUN pip3 list

#RUN mkdir /volume
#VOLUME /volume

CMD ontogpt --help

I am using the Ubuntu 22.04 base image. ontogpt v0.3.5 gets installed, as the docker build output shows:

#8 [5/6] RUN python3 --version
#8 0.339 Python 3.10.12
#8 DONE 0.3s

#9 [6/6] RUN pip3 list
#9 0.772 Package                    Version
#9 0.772 -------------------------- ------------
#9 0.772 adeft                      0.11.2
#9 0.772 aiohttp                    3.9.1
#9 0.773 aiosignal                  1.3.1
#9 0.773 airium                     0.2.6
#9 0.773 altair                     5.2.0
#9 0.774 aniso8601                  9.0.1
#9 0.774 annotated-types            0.6.0
#9 0.774 antlr4-python3-runtime     4.9.3
#9 0.778 anyio                      4.2.0
#9 0.778 appdirs                    1.4.4
#9 0.778 arrow                      1.3.0
#9 0.778 async-timeout              4.0.3
#9 0.778 attrs                      23.1.0
#9 0.778 Babel                      2.14.0
#9 0.778 bcp47                      0.0.4
#9 0.778 beautifulsoup4             4.12.2
#9 0.778 bioc                       2.1
#9 0.778 blinker                    1.7.0
#9 0.778 boto3                      1.34.4
#9 0.778 botocore                   1.34.4
#9 0.778 cachetools                 5.3.2
#9 0.778 cachier                    2.2.2
#9 0.778 cattrs                     23.2.3
#9 0.778 certifi                    2023.11.17
#9 0.778 CFGraph                    0.2.1
#9 0.778 chardet                    5.2.0
#9 0.778 charset-normalizer         3.3.2
#9 0.778 class-resolver             0.4.2
#9 0.778 click                      8.1.7
#9 0.778 click-default-group        1.2.4
#9 0.778 click-default-group-wheel  1.2.3
#9 0.778 colorama                   0.4.6
#9 0.778 curies                     0.7.4
#9 0.778 Deprecated                 1.2.14
#9 0.778 deprecation                2.1.0
#9 0.778 distlib                    0.3.8
#9 0.778 docopt                     0.6.2
#9 0.778 dominate                   2.9.0
#9 0.778 EditorConfig               0.12.3
#9 0.778 et-xmlfile                 1.1.0
#9 0.778 eutils                     0.6.0
#9 0.778 exceptiongroup             1.2.0
#9 0.778 fastobo                    0.12.3
#9 0.778 filelock                   3.13.1
#9 0.778 Flask                      2.1.3
#9 0.778 Flask-Bootstrap            3.3.7.1
#9 0.778 flask-restx                1.3.0
#9 0.778 Flask-WTF                  1.2.1
#9 0.778 fqdn                       1.5.1
#9 0.778 frozenlist                 1.4.1
#9 0.778 funowl                     0.2.3
#9 0.778 ghp-import                 2.1.0
#9 0.778 gilda                      1.0.0
#9 0.778 gitdb                      4.0.11
#9 0.778 GitPython                  3.1.40
#9 0.778 gpt4                       0.0.1
#9 0.778 gpt4all                    0.1.7
#9 0.778 graphviz                   0.20.1
#9 0.778 greenlet                   3.0.2
#9 0.778 h11                        0.14.0
#9 0.778 hbreader                   0.9.1
#9 0.778 httpcore                   1.0.2
#9 0.778 httpx                      0.25.2
#9 0.778 idna                       3.6
#9 0.779 ijson                      3.2.3
#9 0.779 importlib-metadata         6.11.0
#9 0.779 importlib-resources        6.1.1
#9 0.779 inflect                    7.0.0
#9 0.780 inflection                 0.5.1
#9 0.780 iniconfig                  2.0.0
#9 0.780 intervaltree               3.1.0
#9 0.780 isodate                    0.6.1
#9 0.781 isoduration                20.11.0
#9 0.781 itsdangerous               2.1.2
#9 0.781 Jinja2                     3.1.2
#9 0.782 jmespath                   1.0.1
#9 0.782 joblib                     1.3.2
#9 0.782 jsbeautifier               1.14.11
#9 0.782 json-flattener             0.1.9
#9 0.782 jsonasobj                  1.3.1
#9 0.783 jsonasobj2                 1.0.4
#9 0.783 jsonlines                  4.0.0
#9 0.783 jsonpatch                  1.33
#9 0.783 jsonpath-ng                1.6.0
#9 0.783 jsonpointer                2.4
#9 0.784 jsonschema                 4.20.0
#9 0.784 jsonschema-specifications  2023.11.2
#9 0.784 kgcl-rdflib                0.5.0
#9 0.784 kgcl_schema                0.6.2
#9 0.784 lark                       1.1.8
#9 0.785 linkml                     1.6.6
#9 0.785 linkml-dataops             0.1.0
#9 0.785 linkml-owl                 0.3.0
#9 0.785 linkml-renderer            0.3.0
#9 0.785 linkml-runtime             1.6.3
#9 0.786 llm                        0.8.1
#9 0.786 llm-gpt4all                0.1.1
#9 0.786 lxml                       4.9.4
#9 0.786 Markdown                   3.5.1
#9 0.787 markdown-it-py             3.0.0
#9 0.787 MarkupSafe                 2.1.3
#9 0.787 mdurl                      0.1.2
#9 0.787 mergedeep                  1.3.4
#9 0.787 mkdocs                     1.5.3
#9 0.788 mkdocs-material            9.5.2
#9 0.788 mkdocs-material-extensions 1.3.1
#9 0.788 mkdocs-mermaid2-plugin     0.6.0
#9 0.788 more-click                 0.1.2
#9 0.788 multidict                  6.0.4
#9 0.789 ndex2                      3.6.0
#9 0.789 networkx                   3.2.1
#9 0.789 nlpcloud                   1.1.45
#9 0.789 nltk                       3.8.1
#9 0.790 numpy                      1.26.2
#9 0.790 oaklib                     0.5.24
#9 0.790 ols-client                 0.1.4
#9 0.790 ontogpt                    0.3.5
#9 0.790 ontoportal-client          0.0.4
#9 0.791 openai                     0.28.1
#9 0.791 openpyxl                   3.1.2
#9 0.791 packaging                  23.2
#9 0.792 paginate                   0.5.6
#9 0.792 pandas                     2.1.4
#9 0.793 pansql                     0.0.1
#9 0.793 parse                      1.20.0
#9 0.793 pathspec                   0.12.1
#9 0.794 Pillow                     10.1.0
#9 0.794 pip                        22.0.2
#9 0.794 platformdirs               4.1.0
#9 0.794 pluggy                     1.3.0
#9 0.795 ply                        3.11
#9 0.795 portalocker                2.8.2
#9 0.795 prefixcommons              0.1.12
#9 0.795 prefixmaps                 0.2.1
#9 0.795 pronto                     2.5.5
#9 0.796 protobuf                   4.25.1
#9 0.796 pyarrow                    14.0.2
#9 0.796 pydantic                   2.5.2
#9 0.796 pydantic_core              2.14.5
#9 0.796 pydeck                     0.8.1b0
#9 0.797 Pygments                   2.17.2
#9 0.797 PyJSG                      0.11.10
#9 0.797 pymdown-extensions         10.5
#9 0.797 pyparsing                  3.1.1
#9 0.797 pyproject-api              1.6.1
#9 0.797 PyShEx                     0.8.1
#9 0.798 PyShExC                    0.9.1
#9 0.798 pysolr                     3.9.0
#9 0.799 pystow                     0.5.2
#9 0.799 pytest                     7.3.1
#9 0.799 pytest-logging             2015.11.4
#9 0.799 python-dateutil            2.8.2
#9 0.799 python-multipart           0.0.5
#9 0.800 python-ulid                2.2.0
#9 0.800 PyTrie                     0.4.0
#9 0.800 pytz                       2023.3.post1
#9 0.800 PyYAML                     6.0.1
#9 0.800 pyyaml_env_tag             0.1
#9 0.800 ratelimit                  2.2.1
#9 0.800 rdflib                     7.0.0
#9 0.800 rdflib-jsonld              0.6.1
#9 0.801 rdflib-shim                1.0.3
#9 0.801 referencing                0.32.0
#9 0.801 regex                      2023.10.3
#9 0.801 requests                   2.31.0
#9 0.801 requests-cache             1.1.1
#9 0.802 requests-toolbelt          1.0.0
#9 0.802 rfc3339-validator          0.1.4
#9 0.802 rfc3987                    1.3.8
#9 0.802 rich                       13.7.0
#9 0.802 rpds-py                    0.15.2
#9 0.802 ruamel.yaml                0.18.5
#9 0.802 ruamel.yaml.clib           0.2.8
#9 0.802 s3transfer                 0.9.0
#9 0.803 scikit-learn               1.3.2
#9 0.803 scipy                      1.11.4
#9 0.803 semsimian                  0.2.11
#9 0.804 semsql                     0.3.3
#9 0.804 setuptools                 69.0.2
#9 0.804 ShExJSG                    0.8.2
#9 0.804 six                        1.16.0
#9 0.805 smmap                      5.0.1
#9 0.805 sniffio                    1.3.0
#9 0.805 sortedcontainers           2.4.0
#9 0.806 soupsieve                  2.5
#9 0.806 sparqlslurper              0.5.1
#9 0.806 SPARQLWrapper              2.0.0
#9 0.806 SQLAlchemy                 2.0.23
#9 0.807 SQLAlchemy-Utils           0.38.3
#9 0.807 sqlite-fts4                1.0.3
#9 0.807 sqlite-utils               3.36
#9 0.807 sssom                      0.4.2
#9 0.808 sssom-schema               0.15.0
#9 0.808 streamlit                  1.29.0
#9 0.808 tabulate                   0.9.0
#9 0.808 tenacity                   8.2.3
#9 0.808 threadpoolctl              3.2.0
#9 0.809 tiktoken                   0.5.2
#9 0.809 toml                       0.10.2
#9 0.809 tomli                      2.0.1
#9 0.810 toolz                      0.12.0
#9 0.810 tornado                    6.4
#9 0.810 tox                        4.11.4
#9 0.810 tqdm                       4.66.1
#9 0.810 types-python-dateutil      2.8.19.14
#9 0.810 typing_extensions          4.9.0
#9 0.811 tzdata                     2023.3
#9 0.811 tzlocal                    5.2
#9 0.811 Unidecode                  1.3.7
#9 0.811 uri-template               1.3.0
#9 0.811 url-normalize              1.4.3
#9 0.812 urllib3                    2.0.7
#9 0.812 validators                 0.22.0
#9 0.812 virtualenv                 20.25.0
#9 0.812 visitor                    0.1.3
#9 0.813 watchdog                   3.0.0
#9 0.813 webcolors                  1.13
#9 0.813 Werkzeug                   2.1.2
#9 0.813 wheel                      0.37.1
#9 0.813 wikipedia                  1.4.0
#9 0.814 Wikipedia-API              0.6.0
#9 0.814 wrapt                      1.16.0
#9 0.814 WTForms                    3.1.1
#9 0.814 yarl                       1.9.4
#9 0.814 zipp                       3.17.0
#9 DONE 0.9s
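Rather than scanning the full `pip3 list` output, the versions that matter for this failure can be queried directly. This is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list is simply the four distributions that appear in the import chain below.

```python
from importlib import metadata

# Distributions involved in the failing import chain (names as listed by pip3 list).
PACKAGES = ["ontogpt", "gpt4all", "llm", "llm-gpt4all"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12} {version or 'not installed'}")
```

Run inside the image (for example from an extra `RUN` step), it should report the same versions that `pip3 list` shows above, such as gpt4all 0.1.7 and llm-gpt4all 0.1.1.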

However, when I run a container from this image, it fails with the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/ontogpt", line 5, in <module>
    from ontogpt.cli import main
  File "/usr/local/lib/python3.10/dist-packages/ontogpt/cli.py", line 34, in <module>
    from ontogpt.engines.gpt4all_engine import GPT4AllEngine  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/ontogpt/engines/gpt4all_engine.py", line 34, in <module>
    from ontogpt.utils.gpt4all_runner import chain_gpt4all_model, set_up_gpt4all_model
  File "/usr/local/lib/python3.10/dist-packages/ontogpt/utils/gpt4all_runner.py", line 5, in <module>
    import llm
  File "/usr/local/lib/python3.10/dist-packages/llm/__init__.py", line 15, in <module>
    from .plugins import pm
  File "/usr/local/lib/python3.10/dist-packages/llm/plugins.py", line 13, in <module>
    pm.load_setuptools_entrypoints("llm")
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 398, in load_setuptools_entrypoints
    plugin = ep.load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.10/dist-packages/llm_gpt4all.py", line 1, in <module>
    from gpt4all import GPT4All as _GPT4All
  File "/usr/local/lib/python3.10/dist-packages/gpt4all/__init__.py", line 1, in <module>
    from . import gpt4all # noqa
  File "/usr/local/lib/python3.10/dist-packages/gpt4all/gpt4all.py", line 6, in <module>
    from . import pyllmodel
  File "/usr/local/lib/python3.10/dist-packages/gpt4all/pyllmodel.py", line 39, in <module>
    llmodel, llama = load_llmodel_library()
  File "/usr/local/lib/python3.10/dist-packages/gpt4all/pyllmodel.py", line 32, in load_llmodel_library
    llama_lib = ctypes.CDLL(llama_dir, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllama.so: cannot open shared object file: No such file or directory
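The traceback shows that `import llm` calls `pm.load_setuptools_entrypoints("llm")`, which imports every installed plugin registered under the `llm` entry-point group. That is how `llm_gpt4all`, and therefore `gpt4all` and its `ctypes.CDLL` call, end up loading even though ontogpt only imported `llm`. A minimal sketch for listing those registrations without importing them, assuming only the standard library (the helper name `plugin_entry_points` is hypothetical):

```python
from importlib.metadata import entry_points


def plugin_entry_points(group="llm"):
    """Return sorted (name, "module:attr") pairs registered under an entry-point group.

    Nothing is imported here: ep.value is just the string stored in package metadata.
    """
    eps = entry_points()
    # Python 3.10+ objects provide .select(); Python 3.8/3.9 return a plain dict.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, target in plugin_entry_points():
        print(f"{name} -> {target}")
```

In the image above, the result should include an entry whose value points at the `llm_gpt4all` module seen in the traceback.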

I tried to preemptively install gpt4all v1.0.8, the version listed in your poetry.lock file, but pip inside the Docker build cannot find it:

 > [5/7] RUN pip3 install gpt4all==1.0.8:
0.522 ERROR: Could not find a version that satisfies the requirement gpt4all==1.0.8 (from versions: 0.1.5, 0.1.6, 0.1.7)
0.522 ERROR: No matching distribution found for gpt4all==1.0.8
------
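The "from versions: 0.1.5, 0.1.6, 0.1.7" list contains only releases for which pip found a distribution it could install. If a release ships only binary wheels, pip skips it when none of those wheels matches the interpreter, the CPU architecture, or (for manylinux wheels) the glibc version of the build container. The thread does not establish which of these applied to gpt4all 1.0.8. As a diagnostic, a minimal sketch that reports those facts from inside the container (the helper name `wheel_relevant_platform_info` is hypothetical):

```python
import platform
import sys


def wheel_relevant_platform_info():
    """Report interpreter/platform facts that influence which wheels pip accepts."""
    libc_name, libc_version = platform.libc_ver()  # ('', '') where undetectable
    return {
        # CPython-style tag, e.g. cp310 for the Python 3.10.12 in this image
        "python": f"cp{sys.version_info.major}{sys.version_info.minor}",
        # CPU architecture, e.g. x86_64 or aarch64
        "machine": platform.machine(),
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
    }


if __name__ == "__main__":
    for key, value in wheel_relevant_platform_info().items():
        print(f"{key:8} {value}")
```

Comparing this output with the platform tags in the wheel filenames published for a given release shows whether any of them could match.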

Any help would be much appreciated, thanks!

caufieldjh commented 10 months ago

Hi @Vishal-Joshi - that gpt4all package is really becoming a problem. Looks like it's time to make it an optional dependency. That package makes a lot of assumptions about what its run environment will look like and that creates problems in cloud instances, Docker containers, etc.

Will update with a solution soon.

caufieldjh commented 10 months ago

OK, please try the new release (0.3.6). It makes the gpt4all dependencies optional, so everything else should install without issue. Please let me know if it works!
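The thread does not show how 0.3.6 implements the optional dependency. One common pattern is to guard the import so that the CLI still starts when the backend is missing. Note that the failure here was an `OSError` raised by `ctypes.CDLL`, not an `ImportError`, so a guard that catches only `ImportError` would not have helped. A minimal sketch with a hypothetical helper `optional_import`, not ontogpt's actual code:

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

logger = logging.getLogger(__name__)


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import a module if possible; return None if it or its native library is unavailable."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The package (or one of its Python dependencies) is not installed.
        logger.info("Optional module %s not installed: %s", module_name, exc)
    except OSError as exc:
        # The package is installed but a shared library failed to load,
        # as with gpt4all's ctypes.CDLL call in the traceback above.
        logger.warning("Optional module %s failed to load: %s", module_name, exc)
    return None


if __name__ == "__main__":
    backend = optional_import("gpt4all")
    print("gpt4all backend available:", backend is not None)
```

Callers can then check for `None` and disable only the gpt4all-backed features instead of failing at startup.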

Vishal-Joshi commented 10 months ago

This issue is fixed for me, thanks @caufieldjh! I have another issue, which I will raise separately.

Feel free to close this issue.