lfoppiano / document-qa

Scientific Document Insight Q/A
https://lfoppiano-document-qa.hf.space/
Apache License 2.0

Could not browse (read) the uploaded pdf file #8

Closed libragirl-dewiyana closed 11 months ago

libragirl-dewiyana commented 11 months ago

Environment: Safari Version 17.0 (19616.1.27.211.1)

Frequency: every time

Steps to reproduce the error:

  1. Enter the ChatGPT API key
  2. Browse for a PDF file from local storage
  3. The following error occurs when the upload is executed:
TypeError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
Traceback:
File "/home/appuser/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
    exec(code, module.__dict__)
File "/app/document-qa/streamlit_app.py", line 262, in <module>
    st.session_state['doc_id'] = hash = st.session_state['rqa'][model].create_memory_embeddings(tmp_file.name,
File "/app/document-qa/document_qa/document_qa_engine.py", line 204, in create_memory_embeddings
    texts, metadata, ids = self.get_text_from_document(pdf_path, chunk_size=chunk_size, perc_overlap=perc_overlap)
File "/app/document-qa/document_qa/document_qa_engine.py", line 169, in get_text_from_document

lfoppiano commented 11 months ago

Hi @libragirl-dewiyana, thank you for reporting the problem. It seems to be an issue with extracting text from the PDF document. Could you share the PDF documents you were using when you experienced this problem?

NOTE: If the documents are public, you can upload them here by replying and dragging and dropping them onto the text form. If they are not public, could you send them via email (FOPPIANO.Luca@nims.go.jp)?

libragirl-dewiyana commented 11 months ago

Thank you for your reply. I just sent it via email to your address above.

lfoppiano commented 11 months ago

Dear @libragirl-dewiyana, I tested the PDF you sent me but I could not find anything wrong with it.

image

Can you send me a screenshot the next time you have a problem?

On a Mac, you can press Command + Shift + 4 and then select the area to capture. You can then drag and drop the screenshot here or send it via email.

Thanks!

libragirl-dewiyana commented 11 months ago

image

Thank you for your reply. I just tried the PDF file above (SID-47482.pdf) again, and there was no problem browsing it. Since I can't reproduce the bug I experienced yesterday, I can't provide an error screenshot for that file. However, I can provide screenshots of the earlier errors with different PDF files, as follows.

First image: October 30th, 9:35 am, for the file SID-47456.pdf

image

Second image: October 30th, 9:38 am, for the file SID-47152.pdf

image

For the second image, I also included a comparison with an earlier successful attempt at browsing the same file (SID-47152) on October 27th at 9:24 am.

All attempts were made in the same environment: Safari Version 17.0 (19616.1.27.211.1). I'll send both files mentioned above via email. Thank you.

lfoppiano commented 11 months ago

@libragirl-dewiyana thanks for the screenshots. It seems that these errors were caused by bugs that should have been fixed in recent development. In fact, I have updated the application several times in the last few days.

It seems to work well now, so I think we can close this issue.

Feel free to open a new one if you have other problems.

libragirl-dewiyana commented 11 months ago

image

Just now (Mon, Nov 6, 15:00) I tried to browse the previous PDF file (SID-47482.pdf), but the same error message appeared as before. I wonder why the errors happen randomly even though the environment is the same. Thank you.

lfoppiano commented 11 months ago

Oh! I'll check the log again and let you know.

Meanwhile, could you try this URL from now on? It should be working better: https://lfoppiano-document-qa.hf.space/

libragirl-dewiyana commented 11 months ago

The above URL is working. Thank you very much!

lfoppiano commented 11 months ago

I've opened #11 for what I think is the problem. If you still have trouble, please write directly there.

Terima kasih (thank you)!