zhujingguang / pdfium

Automatically exported from code.google.com/p/pdfium

Poor JBIG2 performance with large dictionaries #85

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Open attached PDF file (experiment.pdf) in Chrome
2. Scroll through pages

What is the expected output? What do you see instead?
Expected: Smooth scrolling
Actual: Extremely slow response, nearly unusable

What version of the product are you using? On what operating system?
Version 39.0.2171.71 (64-bit) on Ubuntu 14.04

Please provide any additional information below.
Compare with control.pdf, which performs much better. The difference is in the 
JBIG2 settings. For control.pdf, we restart the JBIG2 compression every fifteen 
pages, which includes generating a new dictionary. For experiment.pdf, we use a 
single dictionary for the entire 168-page book. I strongly suspect that running 
a profiler will reveal some terrific optimization opportunities in the JBIG2 
decoder. (Perhaps the code is doing something really dumb, like uncompressing 
the same dictionary again and again.)

Original issue reported on code.google.com by jbrei...@google.com on 1 Dec 2014 at 6:15

GoogleCodeExporter commented 9 years ago
I profiled rendering the entire PDF file using a command line tool. The 
critical function is CJBig2_GRDProc::decode_Arith_Template0_opt3. It looks like 
there is a significant problem.

control.pdf: 2.6 seconds, 242200 calls
experiment.pdf: 15.25 seconds, 1360848 calls

Original comment by jbrei...@google.com on 1 Dec 2014 at 9:47

GoogleCodeExporter commented 9 years ago
Okay, let's look a little closer. The theory is that we are parsing the symbol 
dictionary for every page, despite the fact that many pages share the same 
dictionary. I would expect exactly one dictionary parsing for experiment.pdf,
and roughly 11 for control.pdf. Instead, CJBig2_Context::parseSymbolDict() is 
called roughly 330 times in both cases.

I think the output of CJBig2_Context::parseSymbolDict() needs to be cached.

Original comment by jbrei...@google.com on 1 Dec 2014 at 10:07

GoogleCodeExporter commented 9 years ago
See line 796 of the source code. Caching the most recent output of 
pSymbolDictDecoder->decode_Arith() would make a huge difference. Or, a little 
more cleanly, cache the most recent (or two most recent) contents of 
pSegment->m_Result.sd.

https://pdfium.googlesource.com/pdfium/+/master/core/src/fxcodec/jbig2/JBig2_Context.cpp

Original comment by jbrei...@google.com on 3 Dec 2014 at 10:10

GoogleCodeExporter commented 9 years ago
The attached patch implements a two-element LRU cache for the JBIG2 symbol 
dictionary. On my machine, it reduces the rendering time of experiment.pdf from 
39 seconds to 14 seconds.
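
For readers without the attachment, a small LRU cache of this kind might look 
roughly like the sketch below. This is not the attached patch and does not use 
PDFium's own types: DecodedDict, DictKey, SymbolDictCache and GetOrDecode are 
hypothetical names invented for illustration, and keying the cache on a stream 
object number plus offset is an assumption about how an encoded dictionary 
could be identified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <utility>

// Hypothetical stand-in for a fully decoded JBIG2 symbol dictionary.
struct DecodedDict {
  std::uint32_t symbol_count = 0;
};

// Hypothetical key identifying one encoded dictionary segment
// (assumed here: stream object number, offset within the stream).
using DictKey = std::pair<std::uint32_t, std::uint32_t>;

// Tiny LRU cache. A linear scan is fine because the capacity is
// expected to be very small (for example, two entries).
class SymbolDictCache {
 public:
  explicit SymbolDictCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns the cached dictionary, or nullptr on a miss.
  // A hit moves the entry to the front (most recently used).
  std::shared_ptr<const DecodedDict> Get(const DictKey& key) {
    for (auto it = entries_.begin(); it != entries_.end(); ++it) {
      if (it->first == key) {
        entries_.splice(entries_.begin(), entries_, it);
        return entries_.front().second;
      }
    }
    return nullptr;
  }

  // Inserts or replaces an entry, evicting the least recently used
  // entry if the cache grows past its capacity.
  void Put(const DictKey& key, std::shared_ptr<const DecodedDict> dict) {
    assert(dict);
    for (auto it = entries_.begin(); it != entries_.end(); ++it) {
      if (it->first == key) {
        entries_.erase(it);
        break;
      }
    }
    entries_.emplace_front(key, std::move(dict));
    if (entries_.size() > capacity_) {
      entries_.pop_back();
    }
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::size_t capacity_;
  std::list<std::pair<DictKey, std::shared_ptr<const DecodedDict>>> entries_;
};

// Runs the (expensive) decode callback only on a cache miss.
template <typename DecodeFn>
std::shared_ptr<const DecodedDict> GetOrDecode(SymbolDictCache& cache,
                                               const DictKey& key,
                                               DecodeFn decode) {
  if (auto hit = cache.Get(key)) {
    return hit;
  }
  std::shared_ptr<const DecodedDict> dict =
      std::make_shared<DecodedDict>(decode());
  cache.Put(key, dict);
  return dict;
}
```

With a capacity of two, a document whose pages all reference one dictionary 
decodes it once, and a document that switches dictionaries every few pages 
still keeps the current and the previous dictionary available.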

Original comment by jbrei...@google.com on 6 Dec 2014 at 12:01

GoogleCodeExporter commented 9 years ago
Thanks @jbreiden. Can you put the patch on https://codereview.chromium.org for 
review?

Original comment by bo...@foxitsoftware.com on 6 Dec 2014 at 1:18

GoogleCodeExporter commented 9 years ago
Fixed at 
https://pdfium.googlesource.com/pdfium/+/bf42dfea544a8ea3269b139e940f3f8eb38f7a27

Original comment by bo...@foxitsoftware.com on 18 Dec 2014 at 3:18