Rendering of images created from PDFs containing vectorized text

mlampert84 commented 3 weeks ago

Hello! At the Berlin Brandenburg Academy of Sciences, we are using Digilib to render PDFs that contain vectorized text. Our problem is that in the JPG image (for example: https://cmg.bbaw.de/epubl/online/cmg_05_06_01_01.php?p=300), the letters look hazier, i.e. more pixelated, than in the PDF (https://cmg.bbaw.de/epubl/online/PDF/cmg_05_06_01_01/cmg_05_06_01_01_0300.pdf). The difference is subtle but noticeable. We are currently using TIFFs, converted from the PDF, as the image files that Digilib processes.

I have tried giving Digilib a higher-resolution TIFF to ingest, as well as a high-resolution PNG or JPG. In all cases, the letters in the image are not as sharp as in the PDF.

I am aware that this is not primarily a challenge with Digilib, but rather a general problem of rendering text in an image. My attempts with ImageMagick to generate an image with sharp text from the PDF have also failed. For example, I can generate a high-resolution PNG from the PDF that looks very sharp when you zoom in. However, when viewing the PNG at normal size, the text again loses its sharpness (this is apparently the effect of anti-aliasing).

Perhaps you have some insight into how we can use Digilib to improve the quality of the text in our images?
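
To illustrate the anti-aliasing effect mentioned above, here is a minimal sketch of what happens when a sharp black-and-white image is scaled down. It assumes a plain box filter (each output pixel is the mean of a block of input pixels); ImageMagick, Digilib and browsers may well use other resampling filters, so this only shows the principle. The function name `boxDownscale` is made up for this example.

```typescript
// Downscale a grayscale bitmap (0 = black, 255 = white) by an integer factor.
// Assumption: a simple box filter, i.e. each output pixel is the mean of a
// factor x factor block of input pixels. Real scalers may use other filters.
function boxDownscale(pixels: number[][], factor: number): number[][] {
  const outRows = Math.floor(pixels.length / factor);
  const outCols = Math.floor((pixels[0]?.length ?? 0) / factor);
  const out: number[][] = [];
  for (let r = 0; r < outRows; r++) {
    const row: number[] = [];
    for (let c = 0; c < outCols; c++) {
      let sum = 0;
      for (let dr = 0; dr < factor; dr++) {
        for (let dc = 0; dc < factor; dc++) {
          sum += pixels[r * factor + dr][c * factor + dc];
        }
      }
      row.push(Math.round(sum / (factor * factor)));
    }
    out.push(row);
  }
  return out;
}

// A perfectly sharp vertical edge: three black columns, one white column.
// The edge does not line up with the coarser 2x2 output grid.
const highRes: number[][] = [
  [0, 0, 0, 255],
  [0, 0, 0, 255],
  [0, 0, 0, 255],
  [0, 0, 0, 255],
];

// The right-hand output column mixes black and white input pixels,
// so it comes out gray instead of pure black or pure white.
const lowRes = boxDownscale(highRes, 2);
console.log(lowRes);
```

Wherever a letter edge falls inside an output pixel rather than on its boundary, that pixel becomes some shade of gray, which would explain why the same PNG looks crisp when zoomed in and soft at normal size.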

robcast commented 2 weeks ago

As you noted, the text quality that is shown on a user's screen (vector-PDF or images) depends on the rendering and anti-aliasing algorithms and settings of the source image, the browser or PDF viewer rendering the image or the PDF and the operating system. The influence of digilib in this whole process is limited.

I looked at the images in digilib at the link you gave and the PDF (in the browser and in macOS-Preview) and they look very similar on my Mac.

Your source images are very high resolution and basically black and white. You could try to use lower resolution grayscale images where you control the scaling with lighter or heavier gray profiles but I am not sure how much different the result will be.

robcast commented 1 week ago

I just had the idea that the display of text on hi-density displays (where the physical pixel number is higher than the logical pixel number, e.g. Apple's Retina, or HiDPI, displays) could be improved using a higher-resolution image and the img srcset attribute (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#srcset).

That would require some changes in the viewer HTML and JavaScript. To detect a high-density display we could use https://developer.mozilla.org/en-US/docs/Web/API/Window/devicePixelRatio.
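
A rough sketch of the idea, not the actual viewer code: the helper names `pixelWidthFor` and `buildSrcset`, the example URL, and the use of the `fn` and `dw` request parameters are assumptions for illustration. The pixel ratio is passed in as a parameter so that the functions also run outside a browser; in the viewer it would come from `window.devicePixelRatio`.

```typescript
// Number of physical pixels needed to fill `cssWidth` logical (CSS) pixels
// on a display with the given device pixel ratio.
// In a browser the ratio would be read from window.devicePixelRatio.
function pixelWidthFor(cssWidth: number, devicePixelRatio: number): number {
  return Math.ceil(cssWidth * devicePixelRatio);
}

// Hypothetical helper: builds a srcset value with one image candidate per
// pixel density. `scalerUrl` is a placeholder, and requesting the width via
// the `fn` (file name) and `dw` (destination width) parameters is an
// assumption about how the image server is addressed.
function buildSrcset(
  scalerUrl: string,
  fn: string,
  cssWidth: number,
  densities: number[],
): string {
  return densities
    .map((density) => {
      const dw = pixelWidthFor(cssWidth, density);
      return `${scalerUrl}?fn=${encodeURIComponent(fn)}&dw=${dw} ${density}x`;
    })
    .join(", ");
}

// Example: an image shown 800 CSS pixels wide, offered at 1x and 2x density.
const srcset = buildSrcset("https://example.org/Scaler", "page_0300", 800, [1, 2]);
console.log(srcset);
```

The resulting string would go into the `srcset` attribute of the `img` element, with the 1x URL repeated in `src` as a fallback. The browser then picks the candidate that matches the display, so low-density displays keep loading the smaller image.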

mlampert84 commented 1 week ago

I've started to mess around with the grayscale settings via ImageMagick to see if I get better results. I'll let you know what conclusions I come to. I'll also look into the issue of hi-density displays, though I am skeptical, since the problem also seems to crop up on low-density displays. So, more to follow... and thanks for the suggestions.

robcast commented 1 week ago

The idea with the hi-density displays won't help on low-density displays. It would just give the text a chance to look clearer on hi-density displays and maybe come closer to good PDF display quality there.

robcast commented 1 day ago

The digilib frontend in the latest release 2.12.4 automatically uses a higher-resolution image on hi-density displays :-)

robcast / digilib

Rendering of images created from PDFs containing vectorized text #65