yobix-ai / extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Apache License 2.0

TypeError: ParseError("Parse error occurred : TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@281b1a01") #16

Closed: NourEldin-Osama closed this issue 1 month ago.

NourEldin-Osama commented 1 month ago

When running the following code:

from extractous import Extractor

extractor = Extractor()

# Extract text from a file
result = extractor.extract_file_to_string("Docs\Bug_Demonstration.docx")
print("Result:", result)

I receive the following error:

main.py:7: SyntaxWarning: invalid escape sequence '\B'
  result = extractor.extract_file_to_string("Docs\Bug_Demonstration.docx")
TypeError: ParseError("Parse error occurred : TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@281b1a01")
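The SyntaxWarning and the ParseError are two separate messages. Because `\B` is not a recognized escape sequence, Python keeps the backslash and only warns, so the path string should still name the same file. The stdlib-only sketch below checks this by compiling the reported literal and a raw-string variant. `evaluate_literal` is a hypothetical helper written for illustration; the result suggests, but does not prove, that the warning is unrelated to the parse failure.

```python
import warnings
from pathlib import PureWindowsPath
from typing import List, Tuple


def evaluate_literal(source: str) -> Tuple[str, List[str]]:
    """Compile and evaluate a Python string literal, recording compile-time warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure every warning is recorded
        value = eval(compile(source, "<literal>", "eval"))
    return value, [type(w.message).__name__ for w in caught]


# The literal from the report, with a single backslash before "B".
value, warning_names = evaluate_literal(r'"Docs\Bug_Demonstration.docx"')
# The same path written as a raw string, which should not warn.
raw_value, raw_warnings = evaluate_literal(r'r"Docs\Bug_Demonstration.docx"')

print(repr(value), warning_names)   # backslash kept; SyntaxWarning on 3.12+
print(value == raw_value)           # both spellings give the same string
# On Windows, "/" and "\" are both separators, so this is the same path too.
print(PureWindowsPath(value) == PureWindowsPath("Docs/Bug_Demonstration.docx"))
```

On Python versions before 3.12 the same invalid escape is reported as a `DeprecationWarning` instead of a `SyntaxWarning`. Writing the path as `r"Docs\Bug_Demonstration.docx"` or `"Docs/Bug_Demonstration.docx"` silences the warning without changing which file is opened.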

The file that caused the error: Bug_Demonstration.docx
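The report does not say what is unusual about this file. One plausible but unconfirmed reading of "TIKA-198: Illegal IOException" from `OOXMLParser` is that Tika hit an I/O problem while reading the .docx package. A .docx is a ZIP-based OOXML package that should contain `[Content_Types].xml` and `word/document.xml`, so a quick first check is whether the file is a readable ZIP with those parts. The sketch below does that with the standard library only; `check_docx_container` is a hypothetical helper, not part of extractous, and a clean result does not guarantee that Tika can parse the file.

```python
import zipfile
from pathlib import Path
from typing import List, Union

# Parts a WordprocessingML (.docx) package is expected to contain.
REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def check_docx_container(path: Union[str, Path]) -> List[str]:
    """Return a list of problems found; an empty list means the basic structure looks OK."""
    path = Path(path)
    if not path.is_file():
        return [f"not a file: {path}"]
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive (a .docx must be an OOXML ZIP package)"]

    problems: List[str] = []
    try:
        with zipfile.ZipFile(path) as archive:
            names = set(archive.namelist())
            for part in REQUIRED_PARTS:
                if part not in names:
                    problems.append(f"missing part: {part}")
            # testzip() returns the name of the first member with a bad CRC, or None.
            bad_member = archive.testzip()
            if bad_member is not None:
                problems.append(f"CRC check failed for member: {bad_member}")
    except zipfile.BadZipFile as exc:
        problems.append(f"unreadable ZIP: {exc}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage with the path from the report.
    print(check_docx_container(r"Docs\Bug_Demonstration.docx"))
```

If this reports problems, the file itself is the more likely culprit; if it reports none, the failure is more likely inside Tika's OOXML handling of this particular document.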

I expect the code to run normally and extract the content of the file.

Version: extractous==0.1.5

OS: Windows 11