infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Bug]: Version 0.8 document parsing error #1501

Open charliboy opened 3 months ago

charliboy commented 3 months ago

Is there an existing issue for the same bug?

Branch name

main

Commit ID

013db

Other environment information

OS: Ubuntu 22.04
GPU: RTX A5000
CUDA: 12.1

Actual behavior

1. Document parsing failed.
2. Logging in with the initial account (admin@ragflow.io:admin) reports a password error, but a newly registered account can log in normally.

Expected behavior

No response

Steps to reproduce

cd docker
sudo docker compose -f docker-compose-base.yml up -d
cd ..
sudo ./entrypoint.sh (The entrypoint.sh file has been modified with relevant configurations)
cd web
npm run dev

Additional information

Docker is running normally:

d93cb611d7ca redis:7.2.4 "docker-entrypoint.s…" 17 hours ago Up 17 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp ragflow-redis
dc45f60a1afa mysql:5.7.18 "docker-entrypoint.s…" 17 hours ago Up 17 hours (healthy) 0.0.0.0:5455->3306/tcp, :::5455->3306/tcp ragflow-mysql
5358c6058bf1 quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z "/usr/bin/docker-ent…" 17 hours ago Up 17 hours 0.0.0.0:9000-9001->9000-9001/tcp, :::9000-9001->9000-9001/tcp ragflow-minio
44a08c5965e8 docker.elastic.co/elasticsearch/elasticsearch:8.11.3 "/bin/tini -- /usr/l…" 17 hours ago Up 17 hours (healthy) 9300/tcp, 0.0.0.0:1200->9200/tcp, :::1200->9200/tcp ragflow-es-01

ES is working normally:

curl -u elastic:infini_rag_flow -XGET http://127.0.0.1:1200
{
  "name" : "es01",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "O8R09jtqQAeYddcdhvQ_zA",
  "version" : {
    "number" : "8.11.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "64cf052f3b56b1fd4449f5454cb88aca7e739d9a",
    "build_date" : "2023-12-08T11:33:53.634979452Z",
    "build_snapshot" : false,
    "lucene_version" : "9.8.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

minio.log:
Fail put eeb3b5f440c211efa0af4ed83c63d3f1/GBT 10184-2015电站锅炉性能试验规程 .pdf: S3 operation failed; code: NoSuchKey, message: Object does not exist, resource: /eeb3b5f440c211efa0af4ed83c63d3f1/GBT%2010184-2015%E7%94%B5%E7%AB%99%E9%94%85%E7%82%89%E6%80%A7%E8%83%BD%E8%AF%95%E9%AA%8C%E8%A7%84%E7%A8%8B%20.pdf, request_id: 17E1A862DC1DA965, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: eeb3b5f440c211efa0af4ed83c63d3f1, object_name: GBT 10184-2015电站锅炉性能试验规程 .pdf

es.log:
ES updateByQuery deleteByQuery: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_683eb2ba403d11ef88154ed83c63d3f1]', ragflow_683eb2ba403d11ef88154ed83c63d3f1, index_or_alias)【Q】:{'match': {'doc_id': 'd3303a9a40c811ef91cd4ed83c63d3f1'}}
ES updateByQuery deleteByQuery: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_683eb2ba403d11ef88154ed83c63d3f1]', ragflow_683eb2ba403d11ef88154ed83c63d3f1, index_or_alias)【Q】:{'match': {'doc_id': 'd3303a9a40c811ef91cd4ed83c63d3f1'}}
ES updateByQuery deleteByQuery: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_683eb2ba403d11ef88154ed83c63d3f1]', ragflow_683eb2ba403d11ef88154ed83c63d3f1, index_or_alias)【Q】:{'match': {'doc_id': 'd3303a9a40c811ef91cd4ed83c63d3f1'}}

cron_logger.log: An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the specified revision on the local disk. Please check your internet connection and try again.

ERROR.log:
Fail put eeb3b5f440c211efa0af4ed83c63d3f1/GBT 10184-2015电站锅炉性能试验规程 .pdf: S3 operation failed; code: NoSuchKey, message: Object does not exist, resource: /eeb3b5f440c211efa0af4ed83c63d3f1/GBT%2010184-2015%E7%94%B5%E7%AB%99%E9%94%85%E7%82%89%E6%80%A7%E8%83%BD%E8%AF%95%E9%AA%8C%E8%A7%84%E7%A8%8B%20.pdf, request_id: 17E1A862DC1DA965, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: eeb3b5f440c211efa0af4ed83c63d3f1, object_name: GBT 10184-2015电站锅炉性能试验规程 .pdf
ES updateByQuery deleteByQuery: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_683eb2ba403d11ef88154ed83c63d3f1]', ragflow_683eb2ba403d11ef88154ed83c63d3f1, index_or_alias)【Q】:{'match': {'doc_id': 'd3303a9a40c811ef91cd4ed83c63d3f1'}}
ES updateByQuery deleteByQuery: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_683eb2ba403d11ef88154ed83c63d3f1]', ragflow_683eb2ba403d11ef88154ed83c63d3f1, index_or_alias)【Q】:{'match': {'doc_id': 'd3303a9a40c811ef91cd4ed83c63d3f1'}}
ES updateByQuery deleteByQuery: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_683eb2ba403d11ef88154ed83c63d3f1]', ragflow_683eb2ba403d11ef88154ed83c63d3f1, index_or_alias)【Q】:{'match': {'doc_id': 'd3303a9a40c811ef91cd4ed83c63d3f1'}}
An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the specified revision on the local disk. Please check your internet connection and try again.

KevinHuSh commented 3 months ago

There is no default account such as admin@ragflow.io. I guess minio is not connected. You might need to check conf/service_conf.yaml.
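One quick way to test the MinIO guess is to probe MinIO's liveness endpoint from the machine that runs entrypoint.sh. The sketch below is only an illustration: `check_minio` is a hypothetical helper name, curl is assumed to be installed, and the default address is an assumption taken from the 0.0.0.0:9000 port mapping in the docker ps output above. The host and port listed for MinIO in conf/service_conf.yaml should answer the same way.

```shell
# Sketch: probe MinIO's liveness endpoint (/minio/health/live).
# check_minio is a hypothetical helper name; the default address is an
# assumption based on the 9000 port mapping shown in the docker ps output.
check_minio() {
    addr="${1:-127.0.0.1:9000}"
    # -f makes curl exit non-zero on HTTP errors; --max-time avoids hanging.
    if curl -sf --max-time 5 "http://${addr}/minio/health/live" >/dev/null; then
        echo "MinIO reachable at ${addr}"
    else
        echo "MinIO NOT reachable at ${addr}" >&2
        return 1
    fi
}

# Usage (pass the host:port that conf/service_conf.yaml points at):
#   check_minio 127.0.0.1:9000
```

If the probe succeeds on the host address but the address in conf/service_conf.yaml differs, the mismatch is a likely suspect.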

charliboy commented 2 months ago

1. If the database is completely cleared before startup, the system initializes a superuser account, and the running log shows: 【INFO】Super user initialized. email: admin@ragflow.io, password: admin. Changing the password after logining is strongly recomanded. This account exists in the database, but logging in with the given password fails. My guess is that the encryption used during initialization and the verification used at login are inconsistent. An account registered through the login page is not a superuser by default, and I don't know what the difference between the two is.

2. The logs of this project are not very user-friendly. After some tracing, I found that the parsing error was caused by not setting HF_ENDPOINT=https://hf-mirror.com in the environment variables.

@KevinHuSh Thank you for your enthusiastic reply

sathya-ml commented 2 months ago

Hello,

I'm having the exact same issue. @charliboy could you please provide more information on what exactly solved the problem? Where exactly did you add "HF_ENDPOINT"? I'm finding it in multiple files.

charliboy commented 2 months ago

Just add a line at the beginning of the entrypoint.sh file like this:

export HF_ENDPOINT=https://hf-mirror.com

This is only needed if you are located in mainland China; elsewhere it is not required. In addition, you also need to install nginx, and when nginx starts it must load the configuration file located in the docker/nginx directory. @sathya-ml
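A minimal sketch of how the top of entrypoint.sh could make the mirror optional, so the same script works inside and outside mainland China. `USE_HF_MIRROR` is a hypothetical toggle introduced here for illustration and is not part of RAGFlow; `HF_ENDPOINT` is the variable the Hugging Face Hub client reads, and it has to be exported before any Python process starts.

```shell
# Hypothetical excerpt for the top of entrypoint.sh.
# USE_HF_MIRROR is an illustrative toggle (not a RAGFlow setting):
# 1 = route Hugging Face downloads through the mirror, 0 = leave the default.
USE_HF_MIRROR="${USE_HF_MIRROR:-1}"

if [ "$USE_HF_MIRROR" = "1" ]; then
    # Must be exported before the Python services are launched.
    export HF_ENDPOINT="https://hf-mirror.com"
fi

# Print the effective value so it is visible in the startup output.
echo "HF_ENDPOINT=${HF_ENDPOINT:-<default>}"
```

Outside mainland China, running the script with USE_HF_MIRROR=0 leaves HF_ENDPOINT untouched.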