ispras / dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser)
Apache License 2.0
152 stars 18 forks

fix small bugs with docx reader such as non-integer sizes in docx sty… #367

Closed IlyaKozlov closed 11 months ago

IlyaKozlov commented 11 months ago

I use dedoc and ran into some problems in the wild (errors during DOCX handling). I cannot share the real documents, but I created an artificial one that reproduces the same problem and added it to the tests.
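
For readers unfamiliar with this class of bug, here is a minimal sketch of tolerant size parsing. It is not the code from this PR: `parse_size` is a hypothetical helper, and truncating the fractional part is an assumed design choice. It illustrates how a size attribute such as `w:sz` (stored in half-points in DOCX) can be read even when a document contains a non-integer value like `22.5`, where a plain `int(raw)` would raise `ValueError`.

```python
import math


def parse_size(raw, default=0):
    """Parse a size attribute that may be missing, malformed, or non-integer.

    Hypothetical helper for illustration only, not part of dedoc's API.
    """
    try:
        value = float(raw)  # accepts "24" as well as "22.5"
    except (TypeError, ValueError):
        # raw is None or not numeric at all: fall back to the default
        return default
    if not math.isfinite(value):
        # "nan" and "inf" parse as floats but are not usable sizes
        return default
    return int(value)  # truncate toward zero (assumed; rounding is another option)


if __name__ == "__main__":
    for raw in ("24", "22.5", "abc", None):
        print(repr(raw), "->", parse_size(raw))
```

Whether to truncate, round, or keep the fractional value depends on how the size is used downstream, so treat the choice above as a placeholder.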