Support for (not parsing) strikethrough

lkaniak commented 7 months ago

Hello,

Does this package support a parse option to exclude text with ~~strikethrough~~? Is that feasible?

lublak commented 5 months ago

@lkaniak hi :) Theoretically, it would be possible with version 4.0

https://github.com/lublak/pdfdataextract/issues/10

Unfortunately, my free time is very limited, so I can hardly make any progress.

lublak commented 3 months ago

For documentation purposes only: PDF itself has no information about strikethroughs. Instead, path data is used, which is then drawn over the text. To recognise whether text is crossed out, the coordinates of the text and of the paths must be compared. I don't think pdfdataextract will offer a function for this. However, the complete position data (where each piece of text is and where each path is) can be extracted with the future 4.0 version.
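
As a rough illustration of the coordinate check described above, here is a minimal TypeScript sketch. The `TextBox` and `LineSegment` shapes and both function names are hypothetical. They are not part of pdfdataextract's API, and the data a future 4.0 version returns may look quite different. The sketch also assumes PDF-style coordinates (origin at the bottom left, `y` being the bottom edge of the text box) and that the strikethrough is drawn as a roughly horizontal line segment. Some PDFs may draw it as a thin filled rectangle instead, which this does not handle.

```typescript
// Hypothetical data shapes, for illustration only (not pdfdataextract types).
interface TextBox {
  text: string;
  x: number;      // left edge
  y: number;      // bottom edge (assumed PDF coordinates, origin bottom-left)
  width: number;
  height: number;
}

interface LineSegment {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// True when a roughly horizontal segment runs through the middle band of the
// text box and covers at least `minCoverage` of the box width.
function isStruckThrough(box: TextBox, seg: LineSegment, minCoverage = 0.8): boolean {
  // Ignore segments that are not close to horizontal.
  if (Math.abs(seg.y1 - seg.y2) > box.height * 0.1) return false;

  // The line must sit in the middle half of the box; an underline would be
  // near the bottom edge and is rejected here.
  const lineY = (seg.y1 + seg.y2) / 2;
  const bandLow = box.y + box.height * 0.25;
  const bandHigh = box.y + box.height * 0.75;
  if (lineY < bandLow || lineY > bandHigh) return false;

  // Horizontal overlap between the segment and the box.
  const left = Math.max(box.x, Math.min(seg.x1, seg.x2));
  const right = Math.min(box.x + box.width, Math.max(seg.x1, seg.x2));
  const overlap = Math.max(0, right - left);
  return overlap / box.width >= minCoverage;
}

// Keep only the text boxes that no segment strikes through.
function withoutStrikethrough(boxes: TextBox[], segments: LineSegment[]): TextBox[] {
  return boxes.filter((box) => !segments.some((seg) => isStruckThrough(box, seg)));
}
```

The 25% to 75% band and the 0.8 coverage threshold are arbitrary starting values. Real documents would likely need tuning per font and per PDF producer.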

lublak / pdfdataextract, issue #11