koreader / kindlepdfviewer

(DEPRECATED, please use KOReader instead) A PDF (plus DJVU, ePub, TXT, CHM, FB2, HTML...) viewer made for e-ink framebuffer devices, using muPDF, djvulibre, crengine
GNU General Public License v3.0

Is tile cache really needed for DjVu? #287

Closed by tigran123 11 years ago

tigran123 commented 11 years ago

Since djvulibre has its own internal cache I suspect that application-level caching ("tile cache") should not be necessary. Of course I will first make exact measurements to prove or disprove this, before deciding to rip it out of kpdfdjview. But I thought I should ask here first --- maybe someone has already done the measurements and proved one way or another. I am talking about comparing the time spent on rendering the page from tile cache versus the time spent rendering the page directly by djvulibre BUT on the important condition that this same page has been already rendered (more exactly, decoded, not necessarily rendered really) before.

Btw, the same idea should apply to PDF also as mupdf seems to have internal cache as well. But I am focusing on DjVu at the moment because I remember very well that when writing a DjVu viewer for Hanlin V3 all I had to implement is asynchronous page readahead and then the decoding became blindingly fast (from seconds down to milliseconds) and so I did not have to implement any application level caching at all. I will get to PDF later.

tigran123 commented 11 years ago

My very preliminary opinion is that for DjVu files it is probably not needed (but not 100% certain), however it would be too much work to disable it, because all bitblit operations go via cache.bb, so I'll leave it as it is for now, until I know the program's internals enough to have another go at it.

houqp commented 11 years ago

If it also applies to mupdf, then we can just remove the internal cache system and the code can be much cleaner. That would be great :)

dpavlin commented 11 years ago

I like internal caching. It's especially useful when reading PDF files with huge bitmap images: since we pre-render the next page, the next page flip is instant.

Having said that, the rendering indicator (small boxes at the top of the screen) shows that we are re-rendering pages much more often than we optimally should (especially in two-column mode), but I still see a benefit in caching.

Am I wrong? ;-)

tigran123 commented 11 years ago

@dpavlin No, I am not saying that you are necessarily wrong. I am saying that one must check to make sure that what you are saying is the case, i.e. perhaps the next page flip will still be instant without internal caching, but WITH pre-faulting the next page? By "pre-faulting" I mean (in DjVu case) forcing the page decoding, this loop in djvu.c:openDocument():

while (! ddjvu_document_decoding_done(doc->doc_ref))
        handle(L, doc->context, True);

And there may be similar mupdf code which is either already there or can be added to pdf.c:openDocument().

For DjVu files doing the above is perfectly sufficient to make the next page flip instant. Understand that caching (I mean djvulibre caching, not our internal one) affects only decoding the pages, not rendering them --- rendering time (i.e. time spent in ddjvu_page_render()) is a constant that depends only on rendering mode, not on the djvulibre cache state. Ok, it is true that with internal caching we eliminate this constant, but that is not the point of caching --- the point is to save the (huge, seconds, compared to milliseconds on rendering) time spent on page decoding. So, for DjVu files we can definitely get rid of internal caching.
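The decode-versus-render argument above can be illustrated with a small cost model. This is only a sketch: the class and method names are hypothetical, and the two constants are the DjVu timings reported later in this thread (80ms for a first render, 12ms for re-rendering an already decoded page), with the decode cost inferred as their difference rather than measured directly.

```python
# Illustrative cost model only; DecodedPages and its methods are hypothetical.
FIRST_RENDER_MS = 80   # decode + render on first visit (figure from this thread)
RERENDER_MS = 12       # render of an already decoded page (figure from this thread)

# Assumption: the difference between the two is the decoding cost.
DECODE_MS = FIRST_RENDER_MS - RERENDER_MS


class DecodedPages:
    """Tracks which pages the library has already decoded."""

    def __init__(self):
        self._decoded = set()

    def prefault(self, page):
        """Decode a page ahead of time, e.g. while the user is still reading."""
        self._decoded.add(page)

    def flip_cost_ms(self, page):
        """Time the user waits when turning to `page`."""
        cost = RERENDER_MS
        if page not in self._decoded:
            cost += DECODE_MS          # decoding happens on demand
            self._decoded.add(page)
        return cost


doc = DecodedPages()
doc.prefault(2)
print(doc.flip_cost_ms(1))  # not decoded yet: pays decode + render
print(doc.flip_cost_ms(2))  # pre-faulted: pays render only
```

In this model pre-faulting removes the decode term from the page flip but never the render term, which is the part an application-level tile cache would additionally save.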

Now, for mupdf I don't know the answer to this question yet. I will try to find out, so that I don't waste time implementing it for djvu and then having to redo the same work for pdf --- it is better to do it in one go, if at all.

tigran123 commented 11 years ago

Some more findings:

For most PDF files rendering directly (i.e. using mupdf cache) is 2-3 times faster than serving from cache:

# drawOrCache() page 15 rendered in 2.662ms
# drawOrCache() page 15 served from cache in 7.692ms
# drawOrCache() page 16 rendered in 2.763ms
# drawOrCache() page 16 served from cache in 7.689ms
# drawOrCache() page 17 rendered in 2.667ms
# drawOrCache() page 17 served from cache in 7.688ms
# drawOrCache() page 18 rendered in 2.668ms
# drawOrCache() page 18 served from cache in 7.679ms
# drawOrCache() page 19 rendered in 3.005ms
# drawOrCache() page 19 served from cache in 7.71ms
# drawOrCache() page 20 rendered in 3.166ms
# drawOrCache() page 20 served from cache in 7.677ms
...

The exception is PDF files with large images on the page. (The above values were obtained on the emulator, but what matters is not their absolute values but the ratio of "rendering" to "serving from cache", which ought to be the same regardless of processor speed or architecture.)
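The ratio can be checked directly from the six page pairs quoted above. A minimal sketch, assuming the log lines keep exactly the format shown; the regular expression and the function name are illustrative and not part of the viewer:

```python
import re
from statistics import mean

# The log lines quoted above, copied verbatim.
LOG = """\
# drawOrCache() page 15 rendered in 2.662ms
# drawOrCache() page 15 served from cache in 7.692ms
# drawOrCache() page 16 rendered in 2.763ms
# drawOrCache() page 16 served from cache in 7.689ms
# drawOrCache() page 17 rendered in 2.667ms
# drawOrCache() page 17 served from cache in 7.688ms
# drawOrCache() page 18 rendered in 2.668ms
# drawOrCache() page 18 served from cache in 7.679ms
# drawOrCache() page 19 rendered in 3.005ms
# drawOrCache() page 19 served from cache in 7.71ms
# drawOrCache() page 20 rendered in 3.166ms
# drawOrCache() page 20 served from cache in 7.677ms
"""

LINE_RE = re.compile(r"page (\d+) (rendered|served from cache) in ([\d.]+)ms")


def summarize(log):
    """Return (mean render time, mean cache-serve time) in milliseconds."""
    times = {"rendered": [], "served from cache": []}
    for match in LINE_RE.finditer(log):
        times[match.group(2)].append(float(match.group(3)))
    return mean(times["rendered"]), mean(times["served from cache"])


rendered_ms, cached_ms = summarize(LOG)
print(f"rendered: {rendered_ms:.3f}ms, from cache: {cached_ms:.3f}ms, "
      f"ratio: {cached_ms / rendered_ms:.2f}")
```

For these twelve lines the averages come out at roughly 2.82ms rendered and 7.69ms served from cache, a ratio of about 2.7, consistent with the "2-3 times" stated above.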

The worrying thing is that for DjVu we somehow force djvulibre to discard the cache on each page turn. But ddjvu_page_release() should NOT have this effect --- the djvulibre cache is linked to djvu context and is thrown away only upon the call to ddjvu_context_release(). Very strange....

Actually, what would be absolutely FANTASTIC is to be able to make this runtime configurable, i.e. let the user decide --- to use internal cache or not. Then, for every specific document (pictures or not) the user will find the optimal way of rendering pages by trying to turn off internal caching and seeing if it makes things faster.... I think this is the safest approach and also has no risk of throwing away very valuable caching code.
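One way such a runtime switch could look is sketched below. This is not the viewer's code: `PageSource`, the `(page, zoom)` cache key and the `render` callback are all hypothetical stand-ins, with `render` playing the role of whatever call asks mupdf or djvulibre for a bitmap.

```python
from typing import Callable, Dict, Tuple

TileKey = Tuple[int, float]  # (page number, zoom); hypothetical key layout


class PageSource:
    """Serves page bitmaps, optionally through an application-level cache."""

    def __init__(self, render: Callable[[int, float], bytes], use_cache: bool = True):
        self._render = render        # stand-in for the library's render call
        self.use_cache = use_cache   # user-visible setting, may change at runtime
        self._tiles: Dict[TileKey, bytes] = {}

    def get(self, page: int, zoom: float = 1.0) -> bytes:
        if not self.use_cache:
            # Bypass the tile cache and rely on the library's own caching.
            return self._render(page, zoom)
        key = (page, zoom)
        if key not in self._tiles:
            self._tiles[key] = self._render(page, zoom)
        return self._tiles[key]

    def set_use_cache(self, enabled: bool) -> None:
        self.use_cache = enabled
        if not enabled:
            self._tiles.clear()      # free memory when the cache is switched off
```

Because the switch is checked on every `get`, the user can flip it for a given document and compare page-turn times without restarting the reader.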

tigran123 commented 11 years ago

Ok, I have some very important good news --- our internal cache is needed. My previous conclusion on DjVu was WRONG because I didn't measure correctly the time spent on decoding the page, which is the crucial element for this whole problem.

Now, the proper times for DjVu look like this:

rendering the page for the first time: 80ms
re-rendering the page previously decoded (i.e. serving from djvulibre cache): 12ms
serving the page from KPV internal cache: 7ms

So, as you see, our internal cache beats djvulibre's internal cache: 7ms versus 12ms!
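As a quick worked comparison of the three DjVu timings above (the numbers are taken from this comment; the variable and function names are only illustrative):

```python
first_render_ms = 80       # decode + render, first visit
djvulibre_cached_ms = 12   # re-render of an already decoded page
kpv_cached_ms = 7          # served from the KPV internal cache


def speedup(slow_ms, fast_ms):
    """How many times faster the second path is than the first."""
    return slow_ms / fast_ms


print(f"djvulibre cache vs first render: {speedup(first_render_ms, djvulibre_cached_ms):.1f}x")
print(f"KPV cache vs first render:       {speedup(first_render_ms, kpv_cached_ms):.1f}x")
print(f"KPV cache vs djvulibre cache:    {speedup(djvulibre_cached_ms, kpv_cached_ms):.1f}x")
```

Most of the gain (80ms down to 12ms) comes from not decoding again; the internal cache then removes a further 5ms per page turn.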

And as for typeset (no pictures) PDF files, ok, mupdf rendering (with caching) can be 2-3 times faster than our internal cache, BUT such pages are not the ones that require caching anyway. It is the complex pages with pictures etc that benefit from caching and on such pages our internal cache beats mupdf.

So, for BOTH PDF and DjVu files I have proved that our internal cache is beneficial and should be left. I am closing this issue, but everyone is welcome to comment, of course.

houqp commented 11 years ago

Great job @tigran123 !

I still have a question: how did you manage to measure the situation when only MuPDF's cache is used? Using the internal cache is just a hash table lookup and a bitmap copy, so it should be fast enough. How can MuPDF be 2 times faster than that?

houqp commented 11 years ago

OMG! You just reminded me that when I was writing the djvulibre Lua module for kpdfv, I learned a lot from libjdvu! And I didn't notice that you are the author until you mentioned the DjVu viewer for Hanlin V3. This is incredible... And for me, this is one of the reasons why coding for free software is fun :) I have to thank you here for making your djvuviewer source code available!