simon987 / sist2

Lightning-fast file system indexer and search tool
GNU General Public License v3.0

Specify page in PDF for thumbnail extraction #375

Open robertpfau opened 1 year ago

robertpfau commented 1 year ago

Which SIST2 component is your Feature Request related to? Scan

What would you like to see happen? Ability to specify the PDF page from which the thumbnail gets generated

Additional context One of my indexes has thousands of reports that include a blank page 1, so the thumbnails all end up empty. I could loop through the files and remove the blank pages, but I'd prefer not to touch the originals, as that would cause a lot of headaches with permissions.

simon987 commented 1 year ago

Hi, I would prefer not to add additional parameters unless necessary. Is the first page 100% blank?

There is already code in place to skip the first page if it's blank. Maybe yours are not 100% empty (i.e. if I were to render the page as a .jpeg, every single pixel would have the same color)? I can adjust the code so that it skips the first page if the cover is ~99% empty instead. Do you have an example PDF I can look at? (You can email it to me privately at me@simon987.net.)

robertpfau commented 1 year ago

You are right, it isn't fully empty, but it is empty enough to not be useful. I can understand that you don't want to overload it with options, no problem. I will just edit the script in my install. Could you point me in the right direction: at what point in the script would it be easiest to control this?

simon987 commented 1 year ago

Hi, the easiest way would be to change the is_blank_pixmap() function behaviour here:

https://github.com/simon987/sist2/blob/master/third-party/libscan/libscan/ebook/ebook.c#L113-L121

or just change 0 to 1 for the second parameter of load_pixmap() here:

https://github.com/simon987/sist2/blob/master/third-party/libscan/libscan/ebook/ebook.c#L108C39-L108C39

But again, as I said, I would be happy to integrate a smarter "should this cover page be skipped" algorithm in the upstream repository if you can send me a sample file.
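
For anyone patching a local build, the "~99% empty" idea from this thread could be sketched roughly as follows. This is not the code in ebook.c: the function name `is_mostly_blank`, its parameters, the raw 8-bit interleaved buffer layout, and the choice of the top-left pixel as the background color are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (NOT sist2's is_blank_pixmap()).
 *
 * Returns 1 when at least `min_blank_ratio` of the pixels are within
 * `tolerance` (per channel) of the top-left pixel, which is assumed to
 * be the page background. `samples` is assumed to hold
 * width * height * channels interleaved 8-bit values.
 */
int is_mostly_blank(const unsigned char *samples, size_t width, size_t height,
                    size_t channels, int tolerance, double min_blank_ratio) {
    assert(samples != NULL && channels > 0);

    size_t total = width * height;
    if (total == 0) {
        return 1; /* an empty image has nothing worth showing */
    }

    size_t matching = 0;
    for (size_t i = 0; i < total; i++) {
        const unsigned char *px = samples + i * channels;
        int same = 1;
        /* A pixel counts as background only if every channel is close. */
        for (size_t c = 0; c < channels; c++) {
            if (abs((int)px[c] - (int)samples[c]) > tolerance) {
                same = 0;
                break;
            }
        }
        matching += (size_t)same;
    }

    return (double)matching / (double)total >= min_blank_ratio;
}
```

Passing `min_blank_ratio = 1.0` and `tolerance = 0` reproduces the strict "every single pixel has the same color" rule, while something like `0.99` with a small tolerance would also skip covers that carry only a stray mark or scanner noise. The right thresholds would need tuning against real sample files.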