Uses DuckX to parse .docx files, similar to the way we parse PDFs.
Leaving this as a draft until we resolve the fact that we are chunking by paragraph rather than by page. The best fix is to stream the document instead of grabbing large discrete chunks of it, but that is blocked on the merge of #2969, which also touches chunkStream.
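
For discussion, here is a rough sketch of what streaming could look like: paragraphs are fed in one at a time and a chunk is emitted whenever a size budget would be exceeded, so chunk boundaries no longer depend on paragraph count. Everything here (`ParagraphChunker`, `chunkParagraphs`, the byte budget) is hypothetical and is not the existing chunkStream; in practice the paragraph strings would come from iterating the document with DuckX.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, NOT the project's chunkStream. It buffers paragraph
// text and hands a chunk to the sink whenever appending the next paragraph
// would push the buffer past maxBytes.
class ParagraphChunker {
public:
    using Sink = std::function<void(const std::string&)>;

    ParagraphChunker(std::size_t maxBytes, Sink sink)
        : maxBytes_(maxBytes), sink_(std::move(sink)) {
        assert(maxBytes_ > 0);
    }

    // Feed one paragraph. Empty paragraphs are skipped.
    void push(const std::string& paragraph) {
        if (paragraph.empty()) return;
        // One extra byte for the newline separator if the buffer is non-empty.
        const std::size_t needed = paragraph.size() + (buffer_.empty() ? 0 : 1);
        if (!buffer_.empty() && buffer_.size() + needed > maxBytes_) {
            flush();
        }
        if (!buffer_.empty()) buffer_ += '\n';
        buffer_ += paragraph;
    }

    // Emit whatever is buffered; call once at end of document.
    void flush() {
        if (buffer_.empty()) return;
        sink_(buffer_);
        buffer_.clear();
    }

private:
    std::size_t maxBytes_;
    Sink sink_;
    std::string buffer_;
};

// Convenience wrapper for trying the chunker on an in-memory paragraph list.
std::vector<std::string> chunkParagraphs(const std::vector<std::string>& paragraphs,
                                         std::size_t maxBytes) {
    std::vector<std::string> chunks;
    ParagraphChunker chunker(maxBytes,
                             [&chunks](const std::string& c) { chunks.push_back(c); });
    for (const auto& p : paragraphs) chunker.push(p);
    chunker.flush();
    return chunks;
}
```

Note that a single paragraph longer than the budget is still emitted as one oversized chunk in this sketch; splitting inside a paragraph would need to be decided separately.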