monniert / docExtractor

(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper
https://www.tmonnier.com/docExtractor
MIT License

quality/resolution of image results #2

Closed by SB2020-eye 3 years ago

SB2020-eye commented 3 years ago

Hello again.

The images that result from running docExtractor (as found in the "text" and "illustration" folders) -- should I expect these to be full resolution with respect to the original? Or is some reduction or loss involved?

(I did my own little (if crude) test. Here is a clip resulting from docExtractor:

[Image: Folio_015v_22]

The file size is 33.7 KB.

Here is a crop I made from the original:

[Image: Folio_015_linetest]

The file size is 70.4 KB.

I confess I don't know enough about digital imagery, how files are saved, etc., to know whether this proves anything. :) Regardless, my goal is to have crops made using docExtractor that are lossless.)

monniert commented 3 years ago

Hi SB2020-eye, indeed I didn't provide many details about the quality of extracted elements, but I can understand that lossless extractions are key when working with HD documents:

Be aware that it will take much more space than compressed JPG images. I hope this helps!
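As a rough illustration of the space trade-off, here is a minimal sketch using only the Python standard library. It does not use docExtractor or any image library. The synthetic pixel data, the helper names, and the quantization step that stands in for lossy compression are all assumptions made for this example (it is not real JPEG). The point is that file size alone does not tell you whether a crop is lossless; comparing the decoded pixels does.

```python
import random
import zlib


def synthetic_crop(width=64, height=64, seed=0):
    """Return raw 8-bit grayscale pixel bytes for a noisy test crop (made-up data)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(width * height))


def store_lossless(pixels):
    """Compress pixels with DEFLATE, the lossless scheme PNG uses internally."""
    return zlib.compress(pixels, 9)


def store_lossy(pixels, step=16):
    """Crude stand-in for lossy coding (NOT real JPEG).

    Each pixel is rounded down to a multiple of `step` before compression,
    which discards detail that cannot be recovered afterwards.
    """
    quantized = bytes((p // step) * step for p in pixels)
    return zlib.compress(quantized, 9)


def load(stored):
    """Decode stored bytes back to raw pixel bytes."""
    return zlib.decompress(stored)


original = synthetic_crop()
lossless = store_lossless(original)
lossy = store_lossy(original)

# The lossy version is smaller, but only the lossless one decodes
# to exactly the same pixels as the original.
print("lossless:", len(lossless), "bytes, identical:", load(lossless) == original)
print("lossy:   ", len(lossy), "bytes, identical:", load(lossy) == original)
```

With real files, the same check would mean decoding both the extracted crop and a crop of the original into pixel arrays of the same region and comparing them value by value, rather than comparing sizes on disk.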

SB2020-eye commented 3 years ago

This is all very helpful, indeed! Thank you so much!