localdocs: implement .docx support

nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

https://nomic.ai/gpt4all

MIT License


localdocs: implement .docx support #2986

Open cebtenzzre opened 4 days ago

cebtenzzre commented 4 days ago

Uses DuckX to parse .docx files, similar to the way we parse PDFs.

Leaving this as a draft until we can resolve the fact that we are chunking by paragraph and not by page. The best way to fix that is to stream the document instead of grabbing large discrete chunks of it, but this is blocked on the merge of #2969, which also touches chunkStream.
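To illustrate the streaming idea, here is a minimal sketch of a chunker that accepts text piece by piece (for example, one paragraph at a time) and emits fixed-size chunks that may span paragraph boundaries. `StreamingChunker`, `feed`, and `finish` are hypothetical names invented for this example; this is not GPT4All's `chunkStream`, and the word-based size limit is an assumption rather than the project's actual chunking rule.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only (not GPT4All's chunkStream).
// Words are accumulated across feed() calls, so a chunk is not tied to the
// paragraph (or page) boundaries of the input.
class StreamingChunker {
public:
    explicit StreamingChunker(std::size_t maxChars) : m_maxChars(maxChars)
    {
        assert(maxChars > 0);
    }

    // Feed one piece of text, e.g. a single paragraph.
    // Returns the chunks that were completed while consuming it.
    // A single word longer than maxChars becomes its own oversized chunk.
    std::vector<std::string> feed(const std::string &text)
    {
        std::vector<std::string> done;
        std::istringstream words(text);
        std::string word;
        while (words >> word) {
            // Flush when adding this word (plus one space) would overflow.
            if (!m_current.empty()
                && m_current.size() + 1 + word.size() > m_maxChars) {
                done.push_back(std::move(m_current));
                m_current.clear();
            }
            if (!m_current.empty())
                m_current += ' ';
            m_current += word;
        }
        return done;
    }

    // Return whatever is left after the last feed() call (may be empty).
    std::string finish()
    {
        std::string rest = std::move(m_current);
        m_current.clear();
        return rest;
    }

private:
    std::size_t m_maxChars;
    std::string m_current;
};
```

In a .docx parser, each paragraph's text would be passed to `feed` as it is read, with `finish` called once at the end of the document. Because the pending chunk is carried over between calls, short paragraphs are merged and long ones are split, which is the behavior that per-paragraph chunking lacks.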