nissl-lab / toxy

.NET text extraction framework
Apache License 2.0

Recommendation for PDF Files #25

Open: pcinfogmach opened this issue 4 months ago

pcinfogmach commented 4 months ago

I would like to recommend using XpdfNet for extracting text from PDF files. In my experience, it extracts text more accurately across a range of languages and is also much faster.

Here is the code I use:

using XpdfNet;

// Extracts the plain text of a PDF file using XpdfNet's XpdfHelper.
string XpdfNetTextExtract(string filePath)
{
    return new XpdfHelper().ToText(filePath);
}

tonyqus commented 4 months ago

XpdfNet looks like a good PDF library.