Scanned pdf not as sharp in new build

dracodoc commented 12 years ago

I just tested build kindlepdfviewer-v2012.10-50-gd2e176f. Compared to kindlepdfviewer-v2012.09-389-g1f11513, the new version renders scanned PDFs much more softly, i.e. not as sharply as before.

I checked the refresh state; it is global refresh by default. I also checked the gamma value, and the difference doesn't look related to gamma.

I took screenshots of two PDF files on both versions; you can see them here: https://picasaweb.google.com/dracodoc/PdfRendering?authuser=0&authkey=Gv1sRgCPy6qpm2vtG1igE&feat=directlink

You may want to zoom in to 100% and then compare them. The new version's rendering is softer.

tigran123 commented 12 years ago

Hmmm, is this by reflow or the standard reader? Please always specify which reader you are using.

dracodoc commented 12 years ago

It's the standard reader; I didn't mention it because I think of it as the "default" one. I never use reflow on scanned PDFs since I have already processed them with pdflrf.

hwhw commented 12 years ago

I guess we have side effects from that MuPDF update. Will need a few days to investigate. And, errm, then comes the hard part of deciding if and what to change...

tigran123 commented 12 years ago

Yes, sigh, that is indeed the hardest part --- deciding if it is a degradation or not, because a degradation in a less important feature (supporting scanned pdfs when all such should be in djvu format) could be simultaneous with some good feature or bugfix that we got from the MuPDF update...

dracodoc commented 12 years ago

Sorry, I don't agree that this is "a less important feature (supporting scanned pdfs when all such should be in djvu format)".

Reading scanned pdf is the single biggest reason I bought my Kindle DX.

There are tons of scanned PDFs, and they were never in DjVu format to begin with. I know DjVu has advantages, but I don't think people will convert their scanned PDFs to DjVu.

I think the difference probably comes from some default parameter changes in MuPDF. I can help test different combinations or test builds to isolate the cause.

tigran123 commented 12 years ago

@dracodoc yes, I agree that we should certainly find out what is causing this.

dracodoc commented 12 years ago

@tigran123, I just tested all the recent builds. Your second most recent version, kindlepdfviewer-v2012.10-6-gbef4065.zip, renders scanned PDFs correctly, just like before, though it seems this version hasn't incorporated the MuPDF update yet.

I looked at this commit: https://github.com/hwhw/kindlepdfviewer/commit/a04823528b7737a9742e1cb137d777c88cadee8f

The "dirty patch in MuPDF's thirdparty liby for CREngine" was removed there. Is that related?

I'm just taking a wild guess here; it is probably not relevant.

hwhw commented 12 years ago

No, we need to see what changed inside MuPDF's code paths. It probably changed anti-aliasing behaviour for some graphics. These are image-based scans, right? I guess they have a resolution higher than what gets displayed? Will need to replicate this...

dracodoc commented 12 years ago

These PDFs are image-based. I know high-resolution PDFs can render poorly on the Kindle DX, so I always process them with pdflrf and resample them to the exact resolution of the Kindle DX, which is 1200x824.

I extracted the page I took the screenshots of into a one-page PDF, which you can use as a test file. It's in Chinese but should still serve the purpose.

https://www.box.com/s/82t0tk3d6kuv0b2qq141

tigran123 commented 12 years ago

Thank you for the test pdf file. We'll try to figure out what is going on...

tigran123 commented 12 years ago

Ok, I have tested, made screenshots and uploaded them here:

Chinese text: http://www.klib.8tar.com/screenshots/page183-pdfold.bmp http://www.klib.8tar.com/screenshots/page183-pdfnew.bmp

English text: http://www.klib.8tar.com/screenshots/foreword-pdfold.bmp http://www.klib.8tar.com/screenshots/foreword-pdfnew.bmp

Now I don't think that there is a degradation here. In fact, looking at the rounded glyphs (e.g. brackets) you can see that the old version renders them in a rather pixelated way, but the new one renders them a bit more smoothly. So, I think there is nothing to fix or downgrade here --- MuPDF performs as expected and the new version has slightly better rendering than the old one.

kai771 commented 12 years ago

To me, it just looks like the new rendering is one pixel wider, and that the extra smoothness is a consequence of this. Of course, it's just my impression, I could be wrong.
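A rough way to check this impression on the raw screenshots is to count, per pixel row, how many pixels are darker than some threshold in each rendering. The sketch below assumes the screenshots have already been decoded into rows of 8-bit grayscale values (0 = black, 255 = white). The function name, the threshold and the sample rows are illustrative only and are not taken from the actual screenshots.

```python
def ink_width(row, threshold=200):
    """Count pixels in one row that are darker than `threshold` (carry some ink)."""
    return sum(1 for value in row if value < threshold)


# Hypothetical samples of one vertical stroke crossing a pixel row.
old_row = [255, 255, 0, 0, 255, 255]   # hard edges: stroke is 2 pixels wide
new_row = [255, 255, 0, 0, 128, 255]   # gray fringe on one side of the stroke

print(ink_width(old_row))  # 2
print(ink_width(new_row))  # 3
```

If the counts for the new build were consistently one higher along the same strokes, that would support the "one pixel wider" reading.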

hwhw commented 12 years ago

In fact they fixed something they declared as a bug where alpha handling did not cascade well. But I'll need to investigate carefully, won't get to do it before next weekend, I think...

dracodoc commented 12 years ago

I agree with @kai771: @tigran123's new version is just a little bit wider.

https://picasaweb.google.com/lh/photo/C9PV9qRz9eMlAeAtjOGMqzePsG--uXS8LmREKCWlvGk?feat=directlink

https://lh4.googleusercontent.com/-8cl6AngYnFw/UJkfYpTo_kI/AAAAAAAAB60/8rE_UEfJgfI/s800/comparsion.gif

You can download the original gif from the first link to view it full screen; the difference is substantial.

I just found out why @tigran123 got different results:

I know that high-resolution PDFs can render poorly on the Kindle (probably because the Kindle software's resampling algorithm is not as good as PC software's), so I have been processing scanned PDFs to the exact screen resolution for a long time. The sample PDF I provided has a resolution of 824x1200, while @tigran123 is using a 600x800 reader, so his old version doesn't show the same effect as my old version, and the difference between new and old is trivial in his case.
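The effect of the screen size can be illustrated with a little arithmetic. The sketch below assumes that "fit page" scales the page uniformly by the smaller of the width and height ratios, which is the usual definition but is not confirmed here for KPV. The function name is made up for the example.

```python
def fit_page_scale(page_w, page_h, screen_w, screen_h):
    """Uniform scale that fits the whole page on the screen (assumed fit-page rule)."""
    return min(screen_w / page_w, screen_h / page_h)


# 824x1200 sample page on a screen of the same size: mapped 1:1.
print(fit_page_scale(824, 1200, 824, 1200))           # 1.0
# Same page on a 600x800 screen: shrunk, so every pixel gets resampled.
print(round(fit_page_scale(824, 1200, 600, 800), 3))  # 0.667
```

Under that assumption the sample page is never shown pixel for pixel on a 600x800 device, so both the old and the new build have to resample it there.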

tigran123 commented 12 years ago

@dracodoc my "old" version corresponds to MuPDF prior to upgrade and the "new" version is after the upgrade. There are no other variables in my screenshots. The bmp files I provided are raw i.e. correspond exactly to the pixels you see on Kindle's eInk screen. In particular, this implies that there is NO WAY for "my new version to be like your old version", this is simply impossible, unless you have converted the BMP files to some other format (GIF?) and introduced distortions in the process. You speak of "original gif" --- but there can be no "original gif", because KPV application does not support screenshots in gif format, only in bmp. So, please only refer to the screenshots I provided as they are "native" to Kindle and not to the results of conversion from bmp to gif (or some other unspecified processes you have used to generate those images).

dracodoc commented 12 years ago

@tigran123 I just updated my comment, and I think I know why your results differ from mine. I'm making a test PDF for 600x800 resolution; you should be able to see the difference with that test file.

By "the original gif" I meant that you should download the gif file from the Picasa album instead of viewing it online, because the default view on Picasa is not at 100% and the zoomed-in view doesn't show the animation.

tigran123 commented 12 years ago

Ah, perhaps yours are screenshots from Kindle DXG? Ok, then they may well differ of course.

dracodoc commented 12 years ago

I made a 600x800 PDF following the same method:

https://www.box.com/s/g79x9f3hn7splm4ywo5z

I don't have a 6" Kindle, so I can't test the result. This is the same book, so the 600x800 version will have a smaller font size, and the difference may not be as substantial as with the 824x1200 one.

So I made another one from a book with a bigger font:

https://www.box.com/s/usj54asfn5v5ko6hj868

dracodoc commented 12 years ago

I generated some PDFs with higher resolution to test. The good news is that, with a resolution higher than the screen's, the new mupdf renders them correctly, almost the same as before. So it seems the new mupdf has a problem with images at the exact screen resolution but can process higher-resolution images correctly.

The bad news is that I would have to regenerate all my PDFs to solve the problem this way. It is also difficult to find the optimal resolution, and opening and page turns will be slower.

tigran123 commented 12 years ago

Well, the problem was that you had the assumption that generating a scanned PDF with images matching the screen's resolution exactly is somehow "optimal". It is not. I always view images with resolutions like 4000x6000 on Kindle's 600x800 screen and this is never a problem (i.e. no performance impact whatsoever). Actually, the real problem would happen if I try to down-sample the images to match Kindle's resolution (or even slightly higher like 2000x3000).

A few days ago I actually tried to do this and, believe me, I tried maybe seven or eight different filtering algorithms (Lanczos, Gaussian, Lagrange and a few others, can't remember the names) and they ALL were faulty, i.e. I concluded that there is NO KNOWN ALGORITHM on this planet that can down-sample the images containing black-and-white text from one resolution to another (lower one) without serious degradation of quality. I had no choice but to give up on the idea of down-sampling. I suggest the same to you.
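Why fine black-and-white detail suffers can be seen in a toy case. The following is a minimal sketch and not any of the filters named above: it uses plain box averaging by a factor of 2 on a single row of pixels (0 = black, 255 = white), and the function name is hypothetical.

```python
def box_downsample(row, factor):
    """Average each group of `factor` neighbouring pixels into one output pixel."""
    return [
        sum(row[i:i + factor]) // factor
        for i in range(0, len(row) - factor + 1, factor)
    ]


# One-pixel-wide black and white stripes, e.g. thin strokes of small text.
stripes = [0, 255] * 4
print(box_downsample(stripes, 2))  # [127, 127, 127, 127]
```

The stripes collapse into uniform gray, because detail finer than the output pixel grid cannot be represented in it.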

However, I managed to reduce the size of my DjVu (and PDF) files by a factor of 5.8 (from 200MB to 34MB for DjVu and 41MB for PDF) by allowing cjb2 to clean the flyspecks and do some other compression techniques --- the result was miraculously small and with absolutely no noticeable degradation of performance.

So, if you have some awfully large scanned books then the optimal way to deal with them seems to be this:

1. Convert to DjVu with cjb2 -losslevel 100.
2. Convert from DjVu to PDF using DjVuToy.

This assumes that you prefer PDF format.
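For the cjb2 step above, a minimal sketch of calling the encoder from a script could look like this. It assumes cjb2 (from DjVuLibre) is on the PATH and that the input is a single bitonal PBM or TIFF page. The file names and helper functions are placeholders, and the DjVuToy step is not covered.

```python
import subprocess


def cjb2_command(src, dst, losslevel=100):
    """Build the argument list for encoding one bitonal page with cjb2."""
    return ["cjb2", "-losslevel", str(losslevel), src, dst]


def encode_page(src, dst, losslevel=100):
    """Run cjb2; raises CalledProcessError if the encoder reports a failure."""
    subprocess.run(cjb2_command(src, dst, losslevel), check=True)


# Only prints the command; call encode_page(...) to actually run cjb2.
print(cjb2_command("page001.pbm", "page001.djvu"))
```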

tigran123 commented 12 years ago

Actually, to tell the truth, when I tried to view 1600dpi scanned books (i.e. 8206x14323 pixels resolution) on Kindle 3 I noticed very slight (about 10% maybe) performance degradation compared to viewing the same book at 600dpi (3677x6418 pixels) resolution. But 1600dpi resolution books are very rare (only 3 in my whole collection! :) so as long as you stick to 600dpi or below you will never notice any performance difference compared to your "optimal" ones matching Kindle's screen resolution exactly.

tigran123 commented 12 years ago

Interestingly, I say that there is no good down-sampling algorithm (as I tried the myriad of those available in ImageMagick and faulted all of them) but somehow djvulibre (and mupdf) must manage to downsample the raw image to the pixmap that gets blitted to the eInk screen. Well, I don't know, I treat djvulibre (and now mupdf, seeing its superb rendering) with some trepidation as a "black box", i.e. they are able to do things that the command-line convert tool in ImageMagick does not seem to be able to do...

dracodoc commented 12 years ago

That is not an assumption; it is tried-and-true experience with the Kindle's original software.

No matter what the original resolution is, if you resample the PDF to the Kindle's screen resolution on a PC, that is all the detail you can get at that resolution. However, the Kindle's original software displays a high-resolution PDF much worse than a PC resampler would. If you resample the PDF on a PC first, the Kindle renders it better than if you put the original PDF on the Kindle directly. This is the difference I'm talking about. Further evidence: pictures can look better on the Kindle than a PDF made from the same pictures.

There was a lot of discussion and experimentation on this years ago in a Chinese forum, but I can't find the link right now, and I don't have time to take pictures to prove it again.

Here is another thread discussing this; I know at least 2 developers can read Chinese, so it may be of some use to them: http://www.hi-pda.com/forum/viewthread.php?tid=898291&extra=page%3D1&page=2

This practice came from my previous experience and is probably not needed any more for KPV.

That being said, I want to use pdflrf anyway (it can do a lot of things, mainly stitching pages together to simulate a scroll mode, which KPV lacks), so I need to choose a resolution. Using the exact screen resolution gives good results and keeps the file smaller.

This method had not run into problems until the new mupdf update. If the change in mupdf is difficult to find, I should be able to work around the problem by generating the PDFs again.

kai771 commented 12 years ago

@dracodoc This probably won't help anything, but I'm curious. If you have time, can you please try to resample the pdf to 1199x824 pixels and compare it to the 1200x824 of the old rendering?

dracodoc commented 12 years ago

Do you think the new version has 1 pixel less of usable screen? I used komic (from the hi-pda thread mentioned above) to generate some test PDFs. They are designed to test the actual PDF display resolution. If the screen resolution matches the PDF resolution, the PDF should look evenly gray; otherwise there will be artifacts.

I generated test PDFs with the following resolutions: https://www.box.com/s/cmajk8pvoiww1jtj9u44

824x1200, 823x1200, 825x1200, 824x1199, 825x1201

The results:

In the Kindle's original software, the available resolution for PDFs is evidently not 824x1200, so the 824x1200 PDF showed big bands. This is expected.
In KPV, the old and new versions give the same test results, so the available resolution didn't change in the new version.

In KPV, the 824x1200 file has no bands; there are evenly distributed dots and lines across the page, which I believe is correct. Changes in height don't produce visible changes either. KPV is very sensitive to changes in width, though: with a one-pixel change, i.e. the 823x1200 and 825x1200 PDFs, there are big bands. This could also be related to the page fit mode; I'm using the default fit-page mode in every case.

You can also generate similar test files for 600x800 with komic. You may need to choose some PDF as the source, even though the test PDF is not based on the source PDF; otherwise komic quits on my PC.
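The banding seen with widths that are off by one pixel is what one would expect from resampling a pixel-level pattern. The sketch below is an illustration under assumptions, not komic's or MuPDF's actual code: it assumes the test pattern alternates black and white pixels, and that the renderer scales with linear interpolation. All function names are made up for the example.

```python
def pattern(width):
    """Alternating black/white pixels, which read as even gray when shown 1:1."""
    return [0 if x % 2 == 0 else 255 for x in range(width)]


def resample_linear(row, dst_w):
    """Resample a row of pixels to `dst_w` pixels with linear interpolation."""
    src_w = len(row)
    out = []
    for x in range(dst_w):
        # Centre of output pixel x, expressed in source pixel coordinates.
        p = (x + 0.5) * src_w / dst_w - 0.5
        p = min(max(p, 0.0), src_w - 1.0)
        i = min(int(p), src_w - 2)
        f = p - i
        out.append(row[i] * (1 - f) + row[i + 1] * f)
    return out


def local_contrast(row, x):
    """Brightness difference between pixel x and its right-hand neighbour."""
    return abs(row[x + 1] - row[x])


screen_w = 824
exact = resample_linear(pattern(824), screen_w)
off_by_one = resample_linear(pattern(825), screen_w)

# Contrast at the left edge, the middle and the right edge of the row.
for name, row in (("824 wide", exact), ("825 wide", off_by_one)):
    print(name, [round(local_contrast(row, x)) for x in (0, 411, 822)])
```

With a matching width the neighbour-to-neighbour contrast stays at full strength across the row. With a source one pixel wider it fades to almost nothing near the middle of the row, which would show up as a broad band.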

kai771 commented 12 years ago

Actually, I thought that the new version of mupdf somehow doesn't match your 824x1200 resolution PDF pixel for pixel, but upscales the width a little bit. I made a mistake in my previous post, asking for 824x1199; you rightly guessed that I wanted you to test a decrease in width. As expected, it didn't help at all, but at least my curiosity is satisfied :).

tigran123 commented 11 years ago

I have been reading a lot of scanned PDF files in the past week or so and I am very pleased with the way the latest mupdf handles them --- slightly better than before and MUCH better than the equivalent (same images, same resolution) DjVu files are handled by djvulibre. So much so that when the size increase from DjVuToy is within 10-15% I prefer to use the PDF file. The only problem is that PDF files are much slower than DjVu (especially the TOC retrieval function --- about 4 times slower than the same TOC retrieved from the equivalent DjVu).

I mentioned this in the Wiki ChangeLog (also warned users against downsampling scanned files as it is never a good idea and will always degrade quality. There is no good downsampling algorithm that has no quality degradation, unfortunately.)

Anyway, closing the issue as there is nothing to solve here.

koreader / kindlepdfviewer

Scanned pdf not as sharp in new build #550