pranasziaukas closed this issue 4 years ago
Yes, I have a plugin system in a development branch, but it didn't make the cut for the 9.0 release. I'd rather get it right even if it takes longer, so I don't have to break compatibility with plugins.
For now you might as well fork and make whatever changes seem appropriate. You could override the current behavior of --clean, for example. Although I don't understand exactly what you're trying to do - clean up the output text?
I'm trying to enhance the readability of the text in an image (I was thinking about median or bilateral filtering to start with).
So I guess forking is the way to go for now. Do you have even the slightest idea when the plugin system might be released?
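For anyone reading along, here is a rough sketch of what I mean by median filtering. It is plain Python on a small grayscale grid, not anything from OCRmyPDF: the `median_filter` function and its `radius` parameter are names I made up for illustration. In a real fork I would probably reach for an image library instead (for example `cv2.medianBlur` in OpenCV or `ImageFilter.MedianFilter` in Pillow) and run it on the page image before OCR.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Return a median-filtered copy of a 2D grayscale image (list of rows).

    Hypothetical helper for illustration; not part of OCRmyPDF.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, shrinking the window at the borders.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            # median_low always returns an actual pixel value from the window.
            row.append(median_low(window))
        result.append(row)
    return result


# A white page (255) with one isolated dark speck (0) in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = median_filter(page)
```

The isolated speck is outvoted by its white neighbours, which is why a median filter suits salt-and-pepper style noise. Whether it actually helps with "spongy" glyphs is something I would have to test, since too large a window could also erode thin strokes.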
I think forking will be best for your case.
I'll close the issue now. If you have further related questions, feel free to reopen it.
First and foremost, my hat's off to you for what appears to be an excellent PDF OCR'ing solution.
My use case is that there is a bunch of old documents, some of which have low-quality "spongy" text. Currently, --clean removes the background noise nicely, but there is (AFAIK) nothing similar for the text itself in OCRmyPDF. I noticed a "plugin system" being mentioned elsewhere, but I'm not sure what it is.
How would you recommend approaching this problem? I'm fine with writing Python scripts.