Closed. SB2020-eye closed this 3 years ago.
Hi SB2020-eye, indeed I didn't provide many details about the quality of extracted elements, but I can understand that lossless extractions are key when working with HD documents. You have two options: i) set the jpg quality in the `save` function arguments (lines 96-97 in `extractor.py`); you can start by setting `quality=95` and look at the results (see here for more information); ii) save images in a lossless format like `png`, which you can do by passing `out_ext='png'` when instantiating the `Extractor` (line 212, `extractor.py`). Be aware that it will take much more space than compressed jpg images. I hope this helps!
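To illustrate the trade-off between the two options, here is a minimal sketch using Pillow directly rather than docExtractor itself. It assumes Pillow is installed; the helper names `make_sample` and `round_trip` and the synthetic test image are hypothetical stand-ins for a real extracted crop. It saves the same image once as jpg with `quality=95` and once as png, then checks whether the reloaded pixels match the source.

```python
import io

from PIL import Image


def make_sample(width=64, height=64):
    """Build a small synthetic RGB image with fine detail (stand-in for a real crop)."""
    img = Image.new("RGB", (width, height))
    img.putdata([
        ((x * 37 + y * 11) % 256, (x * 5 + y * 53) % 256, (x * y) % 256)
        for y in range(height)
        for x in range(width)
    ])
    return img


def round_trip(img, fmt, **save_kwargs):
    """Save img to memory in the given format, reload it, return (num_bytes, reloaded)."""
    buf = io.BytesIO()
    img.save(buf, format=fmt, **save_kwargs)
    data = buf.getvalue()
    # convert() forces the lazy-loaded file to be fully decoded
    reloaded = Image.open(io.BytesIO(data)).convert("RGB")
    return len(data), reloaded


sample = make_sample()
jpg_size, jpg_img = round_trip(sample, "JPEG", quality=95)
png_size, png_img = round_trip(sample, "PNG")

# Compare raw pixel bytes: equal bytes means the format preserved every pixel.
print("jpg bytes:", jpg_size, "pixel-identical:", jpg_img.tobytes() == sample.tobytes())
print("png bytes:", png_size, "pixel-identical:", png_img.tobytes() == sample.tobytes())
```

A higher jpg quality reduces the loss but does not remove it, while png keeps every pixel, usually at the cost of a larger file.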
This is all very helpful, indeed! Thank you so much!
Hello again.
The images that result from running docExtractor (as found in the "text" and "illustration" folders) -- should I expect these to be full resolution with respect to the original? Or is some reduction or loss involved?
(I did my own little (if crude) test. Here is a clip resulting from docExtractor:
The file size is 33.7 KB.
Here is a crop I made from the original:
The file size is 70.4 KB.
I confess I don't know enough about digital imagery, how files are saved, etc., to know whether this proves anything or not. :) Regardless, my goal is to have crops made using docExtractor that are lossless.)
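File size alone does not settle the question, since it depends on the file format and compression settings as much as on the pixels. A more direct check is to compare dimensions and pixel values. The sketch below assumes Pillow is installed and that the crop box of the region in the original is known; `compare_to_original`, the file names, and the box coordinates in the usage comment are hypothetical and not part of docExtractor.

```python
from PIL import Image, ImageChops


def compare_to_original(original, box, extracted):
    """Compare an extracted clip with the same region of the original image.

    box is (left, upper, right, lower) in pixel coordinates of the original.
    Returns a dict with the two sizes, whether they match, and the largest
    per-channel pixel difference (0 means the clip is pixel-identical).
    """
    reference = original.crop(box).convert("RGB")
    candidate = extracted.convert("RGB")
    result = {
        "reference_size": reference.size,
        "extracted_size": candidate.size,
        "same_size": reference.size == candidate.size,
        "max_diff": None,
    }
    if result["same_size"]:
        # difference() gives the absolute per-pixel difference for each channel;
        # getextrema() returns one (min, max) pair per channel for RGB images.
        diff = ImageChops.difference(reference, candidate)
        result["max_diff"] = max(high for _, high in diff.getextrema())
    return result


# Hypothetical usage with placeholder paths and coordinates:
# original = Image.open("page.tif")
# clip = Image.open("illustration/clip_0.jpg")
# print(compare_to_original(original, (120, 340, 980, 1210), clip))
```

If `same_size` is false, the clip was rescaled relative to the original; if the sizes match but `max_diff` is greater than 0, the pixels were altered, for example by jpg compression.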