Ask OpenAI to summarize the contents of that table as text, and ingest that text into the corpus, in addition to the other content we already ingest.
With this PR this is an option that can be turned on, and if so required an addition of OpenAI key
Note that this process is relatively slow (at least on my Mac M2) and so may or may not be useful at very large scale.
This PR adds an optional step for processing tables in PDF documents before ingestion. This is a relatively common approach in LangChain (e.g. https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_Structured_RAG.ipynb) or LlamaIndex:
With this PR this is an option that can be turned on, and if so required an addition of OpenAI key Note that this process is relatively slow (at least on my Mac M2) and so may or may not be useful at very large scale.