Hi,
Not sure how you are doing text extraction, but I just saw an article in IEEE ComputingEdge that cited your tool. If you have any interest in Apache Tika, we provide a functional Python library that you could leverage. Does pdfminer also do the text extraction part? The benefit of Tika is that it supports text extraction from 1400+ formats.
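In case it helps, here is a rough sketch of what plugging the Python library in might look like. It assumes the `tika` package from PyPI (tika-python), which needs Java available and starts a local Tika server on first use. The helper name `extract_text` and its `parse` argument are hypothetical, made up for illustration, and are not part of your tool or of Tika.

```python
from typing import Callable, Optional


def extract_text(path: str, parse: Optional[Callable[[str], dict]] = None) -> str:
    """Return the plain text of a document, or "" if none was extracted.

    `parse` is a hypothetical hook: any callable that takes a file path and
    returns a dict with a "content" key. It defaults to tika-python's
    parser.from_file, imported lazily so this module loads without Tika.
    """
    if parse is None:
        # Assumption: the `tika` package is installed and Java is available.
        from tika import parser
        parse = parser.from_file

    parsed = parse(path)
    # "content" can be None when no text could be extracted from the file.
    return (parsed.get("content") or "").strip()
```

Calling `extract_text("paper.pdf")` would go through Tika, while passing a different `parse` callable would let you keep an existing extractor as a fallback.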
Cheers, Chris