jbdamask / wkid-smaaht

Because every team needs a townie! Enjoy ChatGPT in your Slack workspace
Apache License 2.0
22 stars 0 forks

Retain PDF document structure for RAG #41

Open jbdamask opened 11 months ago

jbdamask commented 11 months ago

The standard PDF loader isn't aware of the document structure. This leads to a confusing user experience for queries like "summarize Section 3".

Check out https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_Structured_RAG.ipynb for ideas
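
As a rough illustration of what "structure-aware" could mean here, the sketch below splits already-extracted PDF text into sections keyed by their heading number, so a query like "summarize Section 3" can be routed to just that section's text. This is not the loader used in this repo or the approach from the linked notebook. The heading pattern, `split_into_sections`, and `find_section` are all hypothetical names and assumptions made for illustration, and real PDFs will likely need a more robust heading detector (font size, layout, table of contents).

```python
import re
from typing import Dict, List, Optional

# Assumed heading shapes: "1 Introduction", "2.1 Data", "3. Results",
# "Section 3: Results". Titles must start with a capital letter so that
# ordinary sentences beginning with a number are less likely to match.
HEADING_RE = re.compile(r"^(?:[Ss]ection\s+)?(\d+(?:\.\d+)*)[.:]?\s+([A-Z].{0,80})$")


def split_into_sections(text: str) -> List[Dict[str, str]]:
    """Group plain text into sections, one per detected heading line.

    Text that appears before the first heading (title page, abstract)
    is dropped in this sketch.
    """
    sections: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = HEADING_RE.match(line)
        if match:
            # Start a new section at every heading line.
            current = {"number": match.group(1), "title": match.group(2).strip(), "text": ""}
            sections.append(current)
        elif current is not None and line:
            # Append non-empty body lines to the section currently open.
            current["text"] += (" " if current["text"] else "") + line
    return sections


def find_section(sections: List[Dict[str, str]], number: str) -> Optional[Dict[str, str]]:
    """Return the first section whose heading number equals `number`, or None."""
    for section in sections:
        if section["number"] == number:
            return section
    return None
```

With something like this, each section could be embedded along with its number and title as metadata, and a "Section N" query could look up that section directly instead of relying on similarity search over fixed-size chunks.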

jbdamask commented 11 months ago

Trying PDFminer.