Open mandalrajiv opened 1 month ago
Any update on this issue? We have a customer engagement where we want to demo OPEA ChatQnA. Until this issue is fixed, we cannot show the demo to the customer.
Hi, could you share the error log here?
Do you need the docker logs? Or some other error logs as well?
The docker logs of the dataprep-redis-server service. I want to see what errors occurred.
Here are the docker logs of the dataprep-redis-server service. At the bottom of the log file, it says "Parsing document ./uploaded_files/optimizing-postgresql-on-ec2-using-ebs.pdf.". If the document upload is successful, we typically see something like an "upload successful" message.
Added as an attachment. retriever_log.txt
The last warnings in the log, about things no longer being supported, look quite suspicious.
Btw. @mandalrajiv, it's better to provide such (long) log files as attachments (instead of pasting them as inline comments), to keep the ticket readable.
Thank you. I have updated the comment to include the log as a file attachment.
There's an odd warning about an invalid HTTP request, and I'm not sure how to interpret your log, as there seem to be multiple logs, interrupted in the middle:
WARNING: Invalid HTTP request received.
Using CPU. Note: This module is much faster with a GPU.
Downloading detection model, please wait. This may take several minutes depending upon your network connection.
files:UploadFile(filename='optimizing-postgresql-on-ec2-using-ebs.pdf', size=628163, headers=Headers({'content-disposition': 'form-data; name="files"; filename="optimizing-postgresql-on-ec2-using-ebs.pdf"', 'content-type': 'application/pdf'}))
link_list:None
Parsing document ./uploaded_files/optimizing-postgresql-on-ec2-using-ebs.pdf.
Progress: |███████████████████████████████ Downloading recognition model, please wait. This may take several minutes depending upon your network connection.
Progress: |████████████root@ip-172-31-26-186:/home/ubuntu/GenAIExamples/ChatQnA/docker/xeon# clear
root@ip-172-31-26-186:/home/ubuntu/GenAIExamples/ChatQnA/docker/xeon# docker logs dataprep-redis-server
The PDF doc itself doesn't seem large, just 616 KiB / 37 pages.
I haven't tried the dataprep service myself (I'm not an OPEA dev), but is the service terminating abnormally during document upload, or is it stuck on the upload?
What about the recognition model, which one are you using?
On the web UI, after I select the file to upload, I do not see a document upload successful message. Other PDFs of one or a few pages that I have uploaded do show the upload successful message.
I am using all the defaults mentioned in the OPEA ChatQnA example for Xeon. Please see below.
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
I guess upload processing time is roughly linear in the amount of text, so a 37-page doc could take about 20x longer than a 2-page one. I.e. if a 2-page upload takes 2 mins, that doc could take about 40 mins.
How long have you waited? And do other logs show timeouts?
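The back-of-the-envelope estimate above can be written out as a small sketch. It assumes processing time scales linearly with page count, which is only a guess; the function name and the 2-page / 2-minute reference point are illustrative, not measurements.

```python
def estimate_minutes(pages, ref_pages, ref_minutes):
    """Scale a measured reference time linearly by page count (an assumption)."""
    return ref_minutes * pages / ref_pages


# Hypothetical reference point: a 2-page PDF that took 2 minutes.
print(estimate_minutes(37, ref_pages=2, ref_minutes=2))
```

With those illustrative numbers the estimate lands a little under the 40 minutes mentioned above.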
The entrypoint of the dataprep container is a Python program called "prepare_doc_redis.py". After I uploaded the document, I used the htop utility to watch the resources (CPU, memory) used by the processes started by prepare_doc_redis.py. It takes fairly long for the prepare_doc_redis.py instances to complete. I watched the CPU peak during ingestion and then drop to the minimum once no active process was running prepare_doc_redis.py. So, I am fairly certain that I have waited the appropriate amount of time.
What other logs do you need? I can dig them up if you tell me which ones are required.
Hi, I have looked at your logs and there are no errors. It seems the process is stuck at the "Parsing document" step.
I used your data, https://docs.aws.amazon.com/pdfs/whitepapers/latest/optimizing-postgresql-on-ec2-using-ebs/optimizing-postgresql-on-ec2-using-ebs.pdf?did=wp_card&trk=wp_card, to upload to Redis with dataprep-redis-server (prepare_doc_redis.py). I timed the "Parsing document" step: it is indeed a bit slow, but the data can be uploaded successfully. On my side, the "Parsing document" step takes ~5 mins. It relies on easyocr to parse the PDF file, which is time-consuming.
So can you measure the time or print some logs during the upload process by revising the source code at https://github.com/opea-project/GenAIComps/blob/main/comps/dataprep/utils.py#L91? Once you revise the code, you need to rebuild the docker image.
Thanks~
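One way to add the timing suggested above is a small logging context manager wrapped around the slow call. This is a minimal sketch, not the actual GenAIComps code: `log_duration` and `parse_document` are hypothetical names, and `parse_document` just sleeps in place of the real PDF parsing.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dataprep-timing")


@contextmanager
def log_duration(step_name):
    """Log when a step starts and how long it took, even if it raises."""
    start = time.perf_counter()
    logger.info("%s: started", step_name)
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s: finished after %.1f s", step_name, elapsed)


def parse_document(path):
    """Hypothetical stand-in for the real (slow) PDF parsing call."""
    time.sleep(0.1)
    return f"text of {path}"


with log_duration("Parsing document"):
    text = parse_document("./uploaded_files/example.pdf")
```

Because the "finished" line is logged in a `finally` block, the docker logs would show whether the step ended at all and how long it ran, which helps tell a slow parse from a hung one.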
Document upload and processing needs to be made a background task so that the user experience is improved. The UI should indicate that the document has been uploaded and that processing is taking place. The state of the document processing needs to be reflected in the UI for each uploaded document, along with an estimate of time to completion, etc. Currently, the user thinks something went wrong and tries a refresh or retries the document upload.
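The background-task idea above could look roughly like the following. It is a sketch using only the Python standard library, not how the dataprep service is implemented: `submit_upload`, `get_status`, and `slow_ingest` are hypothetical names, and `slow_ingest` sleeps in place of parsing and embedding.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job_id -> {"filename": ..., "status": ..., "error": ...}
jobs_lock = threading.Lock()


def slow_ingest(filename):
    """Hypothetical stand-in for parsing and embedding a document."""
    time.sleep(0.2)


def _run(job_id, filename):
    """Worker: do the slow work, then record the outcome for this job."""
    try:
        slow_ingest(filename)
        outcome = {"status": "done"}
    except Exception as exc:
        outcome = {"status": "failed", "error": str(exc)}
    with jobs_lock:
        jobs[job_id].update(outcome)


def submit_upload(filename):
    """Return a job id immediately; processing continues in the background."""
    job_id = uuid.uuid4().hex
    with jobs_lock:
        jobs[job_id] = {"filename": filename, "status": "processing", "error": None}
    executor.submit(_run, job_id, filename)
    return job_id


def get_status(job_id):
    """Snapshot of one job's state, e.g. for a UI to poll per document."""
    with jobs_lock:
        return dict(jobs[job_id])


job_id = submit_upload("optimizing-postgresql-on-ec2-using-ebs.pdf")
print(get_status(job_id)["status"])
```

The upload call returns right away with an id, and the UI can poll the status per document instead of blocking on one long request. A real service would also need persistence and cleanup of finished jobs, which this sketch leaves out.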
I am trying the ChatQnA GenAIExample on Docker on Xeon. I am uploading the document https://docs.aws.amazon.com/pdfs/whitepapers/latest/optimizing-postgresql-on-ec2-using-ebs/optimizing-postgresql-on-ec2-using-ebs.pdf?did=wp_card&trk=wp_card. This is a public whitepaper published by AWS. The embedding into the vector DB is failing. The docker logs for the dataprep-redis-server service show "Parsing document", but I do not see a document upload successful message.
Can you please check why this document upload is failing?