mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

Adding the page number in the sources #377

Closed samichaignonmejai closed 1 year ago

samichaignonmejai commented 1 year ago

Hey, I found that the output of doc.metadata was: {"source":"/Users/../doc.pdf","pdf_numpages":1945,"loc":{"lines":{"from":74528,"to":74551}}}. I wonder if it is possible to extract the page number directly, instead of the line range that refers to a particular chunk of text. I tried my best, but I can't figure out where "loc" is built in the code or how to modify it. If anyone has an idea of how to manage this, let me know!
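One possible approach, sketched here under assumptions rather than taken from the repository's code: if the text of each page is available separately at load time, the line range in "loc" can be mapped back to page numbers. The sketch assumes the full document text was built by joining the page texts with a single newline, and that "loc.lines" uses 1-based line numbers over that joined text. The names `LineRange`, `buildPageStartLines`, `pageForLine`, and `pagesForRange` are hypothetical helpers, not part of LangChain or this project.

```typescript
// Hypothetical shape, modelled on the "loc.lines" object in the metadata above.
interface LineRange {
  from: number;
  to: number;
}

// For each page, compute the 1-based line number on which that page starts,
// ASSUMING the document text is the page texts joined with "\n".
function buildPageStartLines(pageTexts: string[]): number[] {
  const starts: number[] = [];
  let nextLine = 1;
  for (const text of pageTexts) {
    starts.push(nextLine);
    // A page with k newline characters occupies k + 1 lines.
    nextLine += text.split("\n").length;
  }
  return starts;
}

// Binary search for the last page whose start line is <= the given line.
// Returns a 1-based page number.
function pageForLine(starts: number[], line: number): number {
  let lo = 0;
  let hi = starts.length - 1;
  let pageIndex = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (starts[mid] <= line) {
      pageIndex = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return pageIndex + 1;
}

// A chunk can span several pages, so report both ends of the range.
function pagesForRange(
  starts: number[],
  range: LineRange
): { fromPage: number; toPage: number } {
  return {
    fromPage: pageForLine(starts, range.from),
    toPage: pageForLine(starts, range.to),
  };
}

// Usage with made-up page texts (not real data from the issue).
const examplePages = ["a\nb", "c", "d\ne\nf"];
const exampleStarts = buildPageStartLines(examplePages);
console.log(pagesForRange(exampleStarts, { from: 2, to: 4 }));
```

The resulting page numbers could then be written into each chunk's metadata next to "loc". Whether the line numbers line up depends on how the loader concatenates pages, so the join assumption should be checked against the actual loader before relying on this.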

dosubot[bot] commented 1 year ago

Hi, @samichaignonmejai! I'm Dosu, and I'm here to help the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are having trouble finding where the "loc" is being built in the code and how to modify it in order to add page numbers to the sources output of doc.metadata. Unfortunately, there hasn't been any activity or comments on this issue yet, and it remains unresolved.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the project!