Custom pipeline failing NLTK download #240

Closed: CommodoreEU closed this issue 2 months ago

CommodoreEU commented 3 months ago

I tried to install a custom pipeline with additional dependencies, as described in the README:

docker run -d -p 9099:9099 --add-host=host.docker.internal:host-gateway -e PIPELINES_URLS="https://github.com/open-webui/pipelines/blob/main/examples/pipelines/rag/llamaindex_ollama_pipeline.py " -v pipelines:/app/pipelines --name pipelines --restart always ghcr.io/open-webui/pipelines:main

However, this leads to the container failing at startup because it cannot load the NLTK wordnet dataset. Could this have something to do with this NLTK change? https://github.com/nltk/nltk/issues/3266#issuecomment-2284001819

2024-08-27 14:04:50 ERROR: Traceback (most recent call last):
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/corpus/util.py", line 84, in load
2024-08-27 14:04:50     root = nltk.data.find(f"{self.subdir}/{zip_name}")
2024-08-27 14:04:50            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/data.py", line 579, in find
2024-08-27 14:04:50     raise LookupError(resource_not_found)
2024-08-27 14:04:50 LookupError:
2024-08-27 14:04:50 **
2024-08-27 14:04:50 Resource wordnet not found.
2024-08-27 14:04:50 Please use the NLTK Downloader to obtain the resource:
2024-08-27 14:04:50
2024-08-27 14:04:50 >>> import nltk
2024-08-27 14:04:50 >>> nltk.download('wordnet')
2024-08-27 14:04:50
2024-08-27 14:04:50 For more information see: https://www.nltk.org/data.html
2024-08-27 14:04:50
2024-08-27 14:04:50 Attempted to load corpora/wordnet.zip/wordnet/
2024-08-27 14:04:50
2024-08-27 14:04:50 Searched in:
2024-08-27 14:04:50 - '/root/nltk_data'
2024-08-27 14:04:50 - '/usr/local/nltk_data'
2024-08-27 14:04:50 - '/usr/local/share/nltk_data'
2024-08-27 14:04:50 - '/usr/local/lib/nltk_data'
2024-08-27 14:04:50 - '/usr/share/nltk_data'
2024-08-27 14:04:50 - '/usr/local/share/nltk_data'
2024-08-27 14:04:50 - '/usr/lib/nltk_data'
2024-08-27 14:04:50 - '/usr/local/lib/nltk_data'
2024-08-27 14:04:50 **
2024-08-27 14:04:50
2024-08-27 14:04:50
2024-08-27 14:04:50 During handling of the above exception, another exception occurred:
2024-08-27 14:04:50
2024-08-27 14:04:50 Traceback (most recent call last):
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 732, in lifespan
2024-08-27 14:04:50     async with self.lifespan_context(app) as maybe_state:
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
2024-08-27 14:04:50     return await anext(self.gen)
2024-08-27 14:04:50            ^^^^^^^^^^^^^^^^^^^^^
2024-08-27 14:04:50   File "/app/main.py", line 245, in lifespan
2024-08-27 14:04:50     await on_startup()
2024-08-27 14:04:50   File "/app/main.py", line 224, in on_startup
2024-08-27 14:04:50     await module.on_startup()
2024-08-27 14:04:50   File "/app/./pipelines/llamaindex_ollama_pipeline.py", line 38, in on_startup
2024-08-27 14:04:50     from llama_index.embeddings.ollama import OllamaEmbedding
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/llama_index/embeddings/ollama/__init__.py", line 1, in <module>
2024-08-27 14:04:50     from llama_index.embeddings.ollama.base import OllamaEmbedding
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/llama_index/embeddings/ollama/base.py", line 4, in <module>
2024-08-27 14:04:50     from llama_index.core.base.embeddings.base import BaseEmbedding
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/llama_index/core/__init__.py", line 10, in <module>
2024-08-27 14:04:50     from llama_index.core.base.response.schema import Response
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/llama_index/core/base/response/schema.py", line 9, in <module>
2024-08-27 14:04:50     from llama_index.core.schema import NodeWithScore
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/llama_index/core/schema.py", line 18, in <module>
2024-08-27 14:04:50     from llama_index.core.utils import SAMPLE_TEXT, truncate_text
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/llama_index/core/utils.py", line 89, in <module>
2024-08-27 14:04:50     globals_helper = GlobalsHelper()
2024-08-27 14:04:50                      ^^^^^^^^^^^^^^^
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/llama_index/core/utils.py", line 45, in __init__
2024-08-27 14:04:50     import nltk
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/__init__.py", line 153, in <module>
2024-08-27 14:04:50     from nltk.translate import *
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/translate/__init__.py", line 24, in <module>
2024-08-27 14:04:50     from nltk.translate.meteor_score import meteor_score as meteor
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/translate/meteor_score.py", line 14, in <module>
2024-08-27 14:04:50     from nltk.stem.api import StemmerI
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/stem/__init__.py", line 34, in <module>
2024-08-27 14:04:50     from nltk.stem.wordnet import WordNetLemmatizer
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/stem/wordnet.py", line 13, in <module>
2024-08-27 14:04:50     class WordNetLemmatizer:
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/stem/wordnet.py", line 48, in WordNetLemmatizer
2024-08-27 14:04:50     morphy = wn.morphy
2024-08-27 14:04:50              ^^^^^^^^^
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/corpus/util.py", line 120, in __getattr__
2024-08-27 14:04:50     self.load()
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/corpus/util.py", line 86, in load
2024-08-27 14:04:50     raise e
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/corpus/util.py", line 81, in load
2024-08-27 14:04:50     root = nltk.data.find(f"{self.subdir}/{self.__name}")
2024-08-27 14:04:50            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-27 14:04:50   File "/usr/local/lib/python3.11/site-packages/nltk/data.py", line 579, in find
2024-08-27 14:04:50     raise LookupError(resource_not_found)
2024-08-27 14:04:50 LookupError:
2024-08-27 14:04:50 **
2024-08-27 14:04:50 Resource wordnet not found.
2024-08-27 14:04:50 Please use the NLTK Downloader to obtain the resource:
2024-08-27 14:04:50
2024-08-27 14:04:50 >>> import nltk
2024-08-27 14:04:50 >>> nltk.download('wordnet')
2024-08-27 14:04:50
2024-08-27 14:04:50 For more information see: https://www.nltk.org/data.html
2024-08-27 14:04:50 Installing requirement: llama-index
2024-08-27 14:04:50 Installing requirement: llama-index-llms-ollama
2024-08-27 14:04:50 Installing requirement: llama-index-embeddings-ollama
2024-08-27 14:04:50 Loaded module: llamaindex_ollama_pipeline
2024-08-27 14:04:50
2024-08-27 14:04:50 Attempted to load corpora/wordnet
2024-08-27 14:04:50
2024-08-27 14:04:50 Searched in:
2024-08-27 14:04:50 - '/root/nltk_data'
2024-08-27 14:04:50 - '/usr/local/nltk_data'
2024-08-27 14:04:50 - '/usr/local/share/nltk_data'
2024-08-27 14:04:50 - '/usr/local/lib/nltk_data'
2024-08-27 14:04:50 - '/usr/share/nltk_data'
2024-08-27 14:04:50 - '/usr/local/share/nltk_data'
2024-08-27 14:04:50 - '/usr/lib/nltk_data'
2024-08-27 14:04:50 - '/usr/local/lib/nltk_data'
2024-08-27 14:04:50 **
2024-08-27 14:04:50
2024-08-27 14:04:50
2024-08-27 14:04:50 ERROR: Application startup failed. Exiting.
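
Since the traceback shows `import nltk` itself raising, a check that does not import NLTK may help confirm whether the corpus is really absent inside the container. The following is only a sketch using the standard library: `locate_corpus` is a hypothetical helper name, the directory list is copied from the "Searched in:" section of the log, and it assumes a corpus counts as present when either `corpora/<name>` or `corpora/<name>.zip` exists (the two locations the log reports trying).

```python
from pathlib import Path
from typing import Iterable, Optional

# Directories copied from the "Searched in:" section of the log (duplicates removed).
SEARCH_DIRS = [
    "/root/nltk_data",
    "/usr/local/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/local/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/lib/nltk_data",
]


def locate_corpus(name: str, search_dirs: Iterable[str] = SEARCH_DIRS) -> Optional[Path]:
    """Return the first existing corpus path, or None if no directory has it."""
    for base in search_dirs:
        corpora = Path(base) / "corpora"
        # Check both the unpacked directory and the zipped form.
        for candidate in (corpora / name, corpora / f"{name}.zip"):
            if candidate.exists():
                return candidate
    return None


if __name__ == "__main__":
    found = locate_corpus("wordnet")
    print(found if found else "wordnet not found in any search directory")
```

If this prints a path but the pipeline still fails, the data is probably not the problem and the installed NLTK version is worth checking instead.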

qzcl-maintainer commented 2 months ago

Were you able to resolve this? I am getting the same error.

qzcl-maintainer commented 2 months ago

This is what I did to resolve the issue. My environment is Docker, so I first opened a shell in the running pipelines container, upgraded numpy and nltk, and then downloaded the wordnet data from a Python session.

docker container exec -it pipeline-01 bash
pip install --user -U numpy
pip install --user -U nltk

python
>>> import nltk
>>> nltk.download('wordnet')
>>> nltk.download('all-nltk')  # Optional: downloads all packages.
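
The same download step could also be run automatically, for example from a pipeline's `on_startup` hook, instead of by hand. This is a minimal sketch, not something taken from the pipelines code: `ensure_resource` is a hypothetical helper, and it assumes `import nltk` already succeeds in your environment (in the traceback above it does not, which is presumably why the upgrade comes first). The lookup and download functions are passed in as arguments so the logic can be tried without network access.

```python
from typing import Callable


def ensure_resource(
    resource_path: str,
    package: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> bool:
    """Download `package` only if `resource_path` cannot be found.

    Returns True if a download was triggered, False if the resource
    was already present.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package)
        # Look again; this raises LookupError if the download did not help.
        find(resource_path)
        return True


# Usage with NLTK (assumption: run inside the container, not executed here):
#   import nltk
#   ensure_resource("corpora/wordnet", "wordnet", nltk.data.find, nltk.download)
```

Note that the data lands on the container's own filesystem. In the `docker run` command from the original post only `/app/pipelines` is a volume, so the download would likely need to be repeated if the container is recreated.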

Let me know if it works and mark as resolved if it does!

CommodoreEU commented 2 months ago

Thank you. This has managed to solve my issue; everything works beautifully now!