harshankur / officeParser

A Node.js library to parse text out of any office file. Currently supports docx, pptx, xlsx, odt, odp, and ods.
MIT License
123 stars, 17 forks

Ver4.0 #19

Closed harshankur closed 10 months ago

harshankur commented 10 months ago

Revamped the content-parsing code. Fixed the order of content in parsed files, especially Word files, where table content would always end up at the end of the text (#17). Added a config object as an argument to parseOffice, which can be used to set the newline delimiter (#10) along with several other options. Added support for parsing PDF files (#18) using the popular npm library pdf-parse. Removed support for the individual file parsing functions.
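As a rough illustration of the new config argument, the sketch below passes a config object as the third argument to a callback-style `parseOffice`. Several details are assumptions not stated in this description: the config key name `newlineDelimiter`, the `(data, err)` callback order, and the helper `parseWithConfig`, which is hypothetical. A stub parser stands in for the library so that the example runs on its own.

```javascript
// Hypothetical defaults. The key name `newlineDelimiter` is an assumption;
// check the library's README for the actual option names.
const DEFAULT_CONFIG = { newlineDelimiter: '\n' };

// Merge user-supplied options over the defaults.
function mergeConfig(userConfig = {}) {
  return { ...DEFAULT_CONFIG, ...userConfig };
}

// Hypothetical helper: calls parser.parseOffice(file, callback, config).
// The parser is injected so the sketch does not need the library installed.
// The (data, err) callback order is assumed, not confirmed by this PR text.
function parseWithConfig(parser, file, userConfig, onDone) {
  parser.parseOffice(file, (data, err) => {
    if (err) {
      onDone(undefined, err);
      return;
    }
    onDone(data, undefined);
  }, mergeConfig(userConfig));
}

// Stub standing in for the real library: it joins fixed paragraphs with the
// configured delimiter instead of reading a file.
const stubParser = {
  parseOffice(file, callback, config) {
    const paragraphs = ['Heading', 'Table cell', 'Closing paragraph'];
    callback(paragraphs.join(config.newlineDelimiter), undefined);
  },
};

parseWithConfig(stubParser, 'report.docx', { newlineDelimiter: ' | ' }, (data, err) => {
  if (err) {
    console.error(err);
    return;
  }
  console.log(data);
});
```

With the real library, the stub would be replaced by the module returned from `require`, provided its `parseOffice` accepts the same three arguments.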