infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0
17.57k stars 1.79k forks

[Bug]: Launch service from source, bash ./entrypoint.sh failed. #992

Closed: simpleyin closed this issue 3 months ago

simpleyin commented 3 months ago

Is there an existing issue for the same bug?

Branch name

main

Commit ID

daa479938581d91c44c0c93ea86789430f68fa9e

Other environment information

Current Repo: ragflow
Commit Id: daa4799
Operating system: centos 8 (Kernel version: 4.18.0-348.el8.x86_64)
CPU Type: x86_64
Memory: 125Gi
Docker Version: 26.1.3
Python Version: 3.11.0

Actual behavior

I followed the guide. After running entrypoint.sh, I got:

[WARNING] Load term.freq FAIL!
[WARNING] [2024-05-30 14:26:21,304] [synonym.__init__] [line:24]: Realtime synonym is disabled, since no redis connection.
[WARNING] [2024-05-30 14:26:21,590] [redis_conn.__open__] [line:44]: Redis can't be connected.
[WARNING] Load term.freq FAIL!
[WARNING] [2024-05-30 14:26:23,257] [synonym.__init__] [line:24]: Realtime synonym is disabled, since no redis connection.
Traceback (most recent call last):
  File "/root/rag/ragflow/api/ragflow_server.py", line 26, in <module>
    from api.apps import app
  File "/root/rag/ragflow/api/apps/__init__.py", line 92, in <module>
    client_urls_prefix = [
                         ^
  File "/root/rag/ragflow/api/apps/__init__.py", line 93, in <listcomp>
    register_page(path)
  File "/root/rag/ragflow/api/apps/__init__.py", line 78, in register_page
    spec.loader.exec_module(page)
  File "/root/rag/ragflow/api/apps/api_app.py", line 27, in <module>
    from api.db.services.dialog_service import DialogService, chat
  File "/root/rag/ragflow/api/db/services/dialog_service.py", line 23, in <module>
    from api.db.services.llm_service import LLMService, TenantLLMService, LLMBundle
  File "/root/rag/ragflow/api/db/services/llm_service.py", line 18, in <module>
    from rag.llm import EmbeddingModel, CvModel, ChatModel
  File "/root/rag/ragflow/rag/llm/__init__.py", line 17, in <module>
    from .chat_model import *
  File "/root/rag/ragflow/rag/llm/chat_model.py", line 22, in <module>
    from volcengine.maas.v2 import MaasService
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/volcengine/maas/__init__.py", line 1, in <module>
    from .MaasService import MaasService
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/volcengine/maas/MaasService.py", line 14, in <module>
    from .models.api.api_pb2 import ChatResp
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/volcengine/maas/models/api/api_pb2.py", line 16, in <module>
    from .. import base_pb2 as base__pb2
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/volcengine/maas/models/base_pb2.py", line 30, in <module>
    raw_body = _descriptor.FieldDescriptor(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/google/protobuf/descriptor.py", line 553, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
Traceback (most recent call last):
  File "/root/rag/ragflow/rag/svr/task_executor.py", line 48, in <module>
    from rag.app import laws, paper, presentation, manual, qa, table, book, resume, picture, naive, one
  File "/root/rag/ragflow/rag/app/resume.py", line 23, in <module>
    from deepdoc.parser.resume import step_one, step_two
  File "/root/rag/ragflow/deepdoc/parser/resume/step_two.py", line 5, in <module>
    from deepdoc.parser.resume.entities import degrees, schools, corporations
  File "/root/rag/ragflow/deepdoc/parser/resume/entities/corporations.py", line 52, in <module>
    GOOD_CORP = set([corpNorm(rmNoise(c), False) for c in GOOD_CORP])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/rag/ragflow/deepdoc/parser/resume/entities/corporations.py", line 52, in <listcomp>
    GOOD_CORP = set([corpNorm(rmNoise(c), False) for c in GOOD_CORP])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/rag/ragflow/deepdoc/parser/resume/entities/corporations.py", line 32, in corpNorm
    tks = rag_tokenizer.tokenize(nm).split(" ")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/rag/ragflow/rag/nlp/rag_tokenizer.py", line 249, in tokenize
    return " ".join([self.stemmer.stem(self.lemmatizer.lemmatize(t)) for t in word_tokenize(line)])
                                                                              ^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 129, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
                      ^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ragflow/lib/python3.11/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/root/nltk_data'
    - '/root/anaconda3/envs/ragflow/nltk_data'
    - '/root/anaconda3/envs/ragflow/share/nltk_data'
    - '/root/anaconda3/envs/ragflow/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Expected behavior

The RAGFlow API server starts successfully.

Steps to reproduce

bash ./entrypoint.sh

Additional information

No response
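
The second traceback in the report ends in an NLTK `LookupError` for the `punkt` resource, and the message itself suggests `nltk.download('punkt')`. A minimal sketch of that step follows, assuming `nltk` can be imported in the affected environment and the machine has network access. The helper names `missing_resources` and `ensure_nltk_data` are hypothetical and not part of RAGFlow or NLTK; the resource paths are the ones shown in the tracebacks in this thread.

```python
from typing import Callable, Dict, List

# Resource name -> lookup path, as they appear in the tracebacks of this thread.
REQUIRED: Dict[str, str] = {
    "punkt": "tokenizers/punkt",
    "wordnet": "corpora/wordnet",
}


def missing_resources(find: Callable[[str], object],
                      required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the names of resources that `find` cannot locate.

    `find` is expected to behave like nltk.data.find: it raises
    LookupError when the resource is absent.
    """
    missing = []
    for name, path in required.items():
        try:
            find(path)
        except LookupError:
            missing.append(name)
    return missing


def ensure_nltk_data() -> List[str]:
    """Download whatever is missing and return the names that were fetched."""
    import nltk  # imported lazily so this module loads without nltk installed

    missing = missing_resources(nltk.data.find)
    for name in missing:
        nltk.download(name)  # stores data in one of the searched nltk_data dirs
    return missing
```

Calling `ensure_nltk_data()` once in the same conda environment, before `bash ./entrypoint.sh`, should be enough for the `punkt` case. It will not help when `import nltk` itself fails, as in the `wordnet` tracebacks later in this thread.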

simpleyin commented 3 months ago

It seems my volcengine was at version 1.098. The problem was solved after updating to the latest version.
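
To see whether an environment is in the same state, it can help to print the installed versions of the two packages involved in the first traceback. The sketch below uses only the standard library. The function names are hypothetical, and the rule in `rejects_old_generated_code` is an assumption derived from the error text, which names "3.20.x or lower" as the safe range; it is not an official protobuf compatibility check.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def rejects_old_generated_code(protobuf_version: str) -> bool:
    """Assumption: protobuf releases after 3.20.x (major version 4 and up)
    raise 'Descriptors cannot be created directly' for outdated _pb2 files."""
    major = int(protobuf_version.split(".")[0])
    return major >= 4


def report() -> None:
    """Print the versions relevant to the first traceback in this issue."""
    for package in ("volcengine", "protobuf"):
        print(package, installed_version(package))
    pb = installed_version("protobuf")
    if pb is not None and rejects_old_generated_code(pb):
        print("protobuf is newer than 3.20.x; an old volcengine may fail to import")
```

Running `report()` inside the `ragflow` environment shows at a glance whether volcengine is outdated relative to the installed protobuf.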

Old-Lane commented 2 months ago

I upgraded volcengine, but the problem was not solved.
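
The comment does not say which of the two errors persisted. If it is the protobuf `TypeError`, the error message quoted in the issue lists two other workarounds: downgrading protobuf to 3.20.x or lower, or setting `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`. A minimal sketch of the second one follows, assuming the variable is set before anything imports `google.protobuf`. As the message warns, this switches to pure-Python parsing and will be much slower. The helper `protobuf_implementation` is hypothetical.

```python
import os

# Must run before the first import of google.protobuf (directly or through
# volcengine); setting it afterwards has no effect on already loaded modules.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"


def protobuf_implementation() -> str:
    """Return the implementation requested through the environment variable."""
    return os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "")
```

When launching through `entrypoint.sh`, exporting the same variable in the shell before running the script has the same effect for the Python processes it starts.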

cuntoushifu commented 1 day ago

I ran into this problem too.

cuntoushifu commented 1 day ago

(rag-flow) andrew@node01:~/ragflow$ sudo bash entrypoint.sh
Traceback (most recent call last):
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet.zip/wordnet/

  Searched in:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/andrew/ragflow/api/ragflow_server.py", line 26, in <module>
    from api.apps import app
  File "/home/andrew/ragflow/api/apps/__init__.py", line 26, in <module>
    from api.db.db_models import close_connection
  File "/home/andrew/ragflow/api/db/db_models.py", line 33, in <module>
    from api.settings import DATABASE, stat_logger, SECRET_KEY, DATABASE_TYPE
  File "/home/andrew/ragflow/api/settings.py", line 36, in <module>
    from rag.nlp import search
  File "/home/andrew/ragflow/rag/nlp/__init__.py", line 21, in <module>
    from . import rag_tokenizer
  File "/home/andrew/ragflow/rag/nlp/rag_tokenizer.py", line 26, in <module>
    from nltk import word_tokenize
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/__init__.py", line 153, in <module>
    from nltk.translate import *
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/translate/__init__.py", line 24, in <module>
    from nltk.translate.meteor_score import meteor_score as meteor
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/translate/meteor_score.py", line 14, in <module>
    from nltk.stem.api import StemmerI
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/stem/__init__.py", line 34, in <module>
    from nltk.stem.wordnet import WordNetLemmatizer
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/stem/wordnet.py", line 13, in <module>
    class WordNetLemmatizer:
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/stem/wordnet.py", line 48, in WordNetLemmatizer
    morphy = wn.morphy
             ^^^^^^^^^
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 120, in __getattr__
    self.__load()
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 86, in __load
    raise e
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 81, in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet

  Searched in:

Traceback (most recent call last):
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet.zip/wordnet/

  Searched in:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/andrew/ragflow/rag/svr/task_executor.py", line 29, in <module>
    from api.db.services.file2document_service import File2DocumentService
  File "/home/andrew/ragflow/api/db/services/__init__.py", line 18, in <module>
    from .user_service import UserService
  File "/home/andrew/ragflow/api/db/services/user_service.py", line 22, in <module>
    from api.db.db_models import DB, UserTenant
  File "/home/andrew/ragflow/api/db/db_models.py", line 33, in <module>
    from api.settings import DATABASE, stat_logger, SECRET_KEY, DATABASE_TYPE
  File "/home/andrew/ragflow/api/settings.py", line 36, in <module>
    from rag.nlp import search
  File "/home/andrew/ragflow/rag/nlp/__init__.py", line 21, in <module>
    from . import rag_tokenizer
  File "/home/andrew/ragflow/rag/nlp/rag_tokenizer.py", line 26, in <module>
    from nltk import word_tokenize

Rid7 commented 1 day ago

(rag-flow) andrew@node01:~/ragflow$ sudo bash entrypoint.sh
Traceback (most recent call last):
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet.zip/wordnet/

  Searched in:
    - '/root/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/share/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/andrew/ragflow/api/ragflow_server.py", line 26, in <module>
    from api.apps import app
  File "/home/andrew/ragflow/api/apps/__init__.py", line 26, in <module>
    from api.db.db_models import close_connection
  File "/home/andrew/ragflow/api/db/db_models.py", line 33, in <module>
    from api.settings import DATABASE, stat_logger, SECRET_KEY, DATABASE_TYPE
  File "/home/andrew/ragflow/api/settings.py", line 36, in <module>
    from rag.nlp import search
  File "/home/andrew/ragflow/rag/nlp/__init__.py", line 21, in <module>
    from . import rag_tokenizer
  File "/home/andrew/ragflow/rag/nlp/rag_tokenizer.py", line 26, in <module>
    from nltk import word_tokenize
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/__init__.py", line 153, in <module>
    from nltk.translate import *
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/translate/__init__.py", line 24, in <module>
    from nltk.translate.meteor_score import meteor_score as meteor
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/translate/meteor_score.py", line 14, in <module>
    from nltk.stem.api import StemmerI
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/stem/__init__.py", line 34, in <module>
    from nltk.stem.wordnet import WordNetLemmatizer
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/stem/wordnet.py", line 13, in <module>
    class WordNetLemmatizer:
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/stem/wordnet.py", line 48, in WordNetLemmatizer
    morphy = wn.morphy
             ^^^^^^^^^
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 120, in __getattr__
    self.__load()
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 86, in __load
    raise e
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 81, in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet

  Searched in:
    - '/root/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/share/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

Traceback (most recent call last):
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/anaconda3/envs/rag-flow/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet.zip/wordnet/

  Searched in:
    - '/root/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/share/nltk_data'
    - '/home/andrew/anaconda3/envs/rag-flow/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/andrew/ragflow/rag/svr/task_executor.py", line 29, in <module>
    from api.db.services.file2document_service import File2DocumentService
  File "/home/andrew/ragflow/api/db/services/__init__.py", line 18, in <module>
    from .user_service import UserService
  File "/home/andrew/ragflow/api/db/services/user_service.py", line 22, in <module>
    from api.db.db_models import DB, UserTenant
  File "/home/andrew/ragflow/api/db/db_models.py", line 33, in <module>
    from api.settings import DATABASE, stat_logger, SECRET_KEY, DATABASE_TYPE
  File "/home/andrew/ragflow/api/settings.py", line 36, in <module>
    from rag.nlp import search
  File "/home/andrew/ragflow/rag/nlp/__init__.py", line 21, in <module>
    from . import rag_tokenizer
  File "/home/andrew/ragflow/rag/nlp/rag_tokenizer.py", line 26, in <module>
    from nltk import word_tokenize

After running pip install nltk==3.8 in this project's environment, the error above should no longer occur.

qinguangxu commented 1 day ago

Installing the latest nltk makes the error go away.