gptscript-ai / knowledge

Knowledge for GPTScript
https://gptscript-ai.github.io/knowledge/
Apache License 2.0

Fix: use new library to parse docx file and add tests #23

Closed. StrongMonkey closed this pull request 3 months ago.

StrongMonkey commented 3 months ago

The library we were previously using doesn't seem to be able to extract the text from docx files. This PR switches to a new library that works and adds a test for it.

https://github.com/gptscript-ai/knowledge/issues/7

StrongMonkey commented 3 months ago

@iwilltry42 For docx, it falls back to using https://github.com/sajari/docconv/blob/master/docx.go#L25, but I have switched to using convertDocx instead. RTF does require an external binary, so for RTF I've fallen back to the old library. I also addressed your comment about ODT by adding separate parsers for DOCX, RTF, and ODT.