rasmuseeg closed this 8 years ago
Sure, send the PDF through. The PDF reading comes from this code:
https://github.com/umbraco/UmbracoExamine.PDF/blob/master/src/UmbracoExamine.PDF/PDFParser.cs
You could step through this to see whether that is just how iTextSharp reads the PDF or whether it is caused by the character replacement. I won't have time to debug this for a couple of weeks.
I've done some testing myself, and `\n` is in the `HashSet<char> toExclude`.
Could we put those characters in replaceWithSpace instead, so that line breaks become spaces rather than being removed?
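To illustrate the suggestion, here is a minimal sketch. It is not the actual PDFParser code: the class name `PdfTextCleaner`, the method `Clean`, and the contents of both sets are assumptions made for the example. Only the set names toExclude and replaceWithSpace come from this thread. The idea is that a character found in the replace set is turned into a space before the exclude set is checked at all.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of UmbracoExamine.PDF.
public static class PdfTextCleaner
{
    // Assumed contents: characters that should be dropped entirely.
    private static readonly HashSet<char> ToExclude =
        new HashSet<char> { '\0', '\u0001' };

    // Assumed contents: whitespace-like characters that should become a space.
    private static readonly HashSet<char> ReplaceWithSpace =
        new HashSet<char> { '\r', '\n', '\t' };

    public static string Clean(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (var c in input)
        {
            if (ReplaceWithSpace.Contains(c))
            {
                // Keep a word boundary where the line break was.
                sb.Append(' ');
            }
            else if (!ToExclude.Contains(c))
            {
                sb.Append(c);
            }
            // Characters in ToExclude are skipped.
        }
        return sb.ToString();
    }
}
```

With this, `PdfTextCleaner.Clean("many\nplaces")` gives `"many places"`. If `\n` were only in the exclude set, the same input would come out as `"manyplaces"`, which matches the missing spaces seen in the indexed text.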
As you can see, much of the text from the PDF is missing spaces in many places. I can send you the PDF file.