mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
48.82k stars 10.04k forks

Implement tiling of canvas into smaller pieces #6419

Open yurydelendik opened 9 years ago

yurydelendik commented 9 years ago

The problem is that large canvases take up a lot of memory. It's visible when a PDF page is large (e.g. a map) or zoomed in (e.g. at 800%+ zoom). Currently we limit the canvas size (#4834) on mobile devices. However, a proper solution would be to divide the page into smaller canvases and render only the visible parts.

It's mostly blocked by generating the operator list based on a crop area (useful for zooming heavy maps), but we can proceed without it and try to render the same operator list on several canvases.

yurydelendik commented 9 years ago

(We might also consider tiling JPEG images when we read the operator list.)

ManasJayanth commented 8 years ago

@yurydelendik Can you help me get started on this?

yurydelendik commented 8 years ago

@prometheansacrifice the first thing will be to modify the API (and canvas.js) at https://github.com/mozilla/pdf.js/blob/master/src/display/api.js#L834 to have e.g. a targets property as an alternative to canvasContext. It will be an array of objects with canvas and location properties, where location defines the position of the canvas on the grid of canvases. (Notice that we will now use the canvas rather than its 2d context to know its size; we might add a canvas property alongside canvasContext.) canvas.js shall now render on multiple contexts at the same time.
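
For illustration, the shape of the proposed `targets` array could be sketched as follows. This is only a sketch under stated assumptions: `targets`, `location`, `buildTargets`, and the `{ row, col }` layout are hypothetical names that do not exist in the pdf.js API, and the canvas factory is a stand-in so that the example runs outside a browser.

```javascript
// Hypothetical sketch: `targets`, `location`, and buildTargets() are not part
// of the pdf.js API; they only illustrate the shape proposed above.
function buildTargets(pageWidth, pageHeight, tileSize, createCanvas) {
  const cols = Math.ceil(pageWidth / tileSize);
  const rows = Math.ceil(pageHeight / tileSize);
  const targets = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      // Edge tiles are clipped so that the grid covers the page exactly.
      const width = Math.min(tileSize, pageWidth - col * tileSize);
      const height = Math.min(tileSize, pageHeight - row * tileSize);
      targets.push({
        // The canvas itself (not its 2d context) carries the tile size.
        canvas: createCanvas(width, height),
        // Position of this canvas on the grid of canvases.
        location: { row, col },
      });
    }
  }
  return targets;
}

// Stand-in canvas factory; in a page this would create a real <canvas>
// element with the given width and height.
const fakeCanvas = (width, height) => ({ width, height });

// A 2500 x 1200 pixel page split into 1024 pixel tiles: 3 columns x 2 rows.
const targets = buildTargets(2500, 1200, 1024, fakeCanvas);
console.log(targets.length, targets[targets.length - 1]);
```

Keeping the size on the canvas and the grid position in `location` matches the note above that the canvas, rather than its 2d context, would be the source of the tile dimensions.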

yurydelendik commented 8 years ago

TODOs:

Liestambeur commented 7 years ago

Is there any progress on this issue, or is it no longer planned?

yurydelendik commented 7 years ago

@Liestambeur it's still planned but not on a schedule -- other projects are in progress atm. Contributors who wish to help advance it can find us on the IRC channel.

d01010101 commented 4 years ago

I would just like to add that with today's wide desktop displays, I find interactive page-less PDFs a great way to get a high-quality 1:1 web presentation of e.g. LaTeX documents, where a plain zoom works well enough and text reflow would only break typographic rules or the careful placement of floats. Of course, it won't replace more interactivity, animations, or text reflow when they are actually needed, but it may have its niche.

Yet, when I generated a PDF 1 x A4 wide and 25 x A4 high and rendered it with PDFSinglePageViewer, one of the limits discussed here was probably hit, even though the rendering was only about 100 dpi. The canvas size was about 800 x 20000, and the result was a blocky, unusable rendering.

I would say that some kind of tiling might extend pdf.js applications beyond those of a viewer of printable PDFs. If anyone is interested, I attach a test PDF with an aspect ratio similar to the one discussed.

bobsingor commented 3 years ago

I would like to offer a bounty of 2000 USD for this feature

This feature is becoming more important to our company. I see that there were 16 issues raised that are related to this one. I was wondering if this deserves more attention?

Hoping that this bonus will help get this resolved for everyone interested.

mmouterde commented 3 years ago

The PDF specification includes the concept of a CropBox (TL;DR: a way to crop the content to render).

I may have missed something, but as pdf.js honors the CropBox, I guess setting the CropBox to the tile dimensions when rendering could do the job?

According to the pdf.js code, the CropBox parsed from the PDF file ends up in the page.view array. See the demo: https://jsfiddle.net/mmouterde/s72wpgvL/

That said, performance is no better for a tile render than for a full-page render, but it could be used for tile generation or to render a PDF on a smaller canvas?

HinTak commented 3 years ago

@mmouterde you are quite mistaken about the nature of the cropbox. When a document is printed, there are registration marks and stapling areas outside the "interesting area", which is for printed media: registration marks (to make sure the layers of colours line up) and extra margin for stapling. Those are of no interest for screen viewing, and hence a screen viewer should ignore things outside the cropbox. It is not an arbitrary idea for cropping; it is a fairly well-defined set of numbers sometimes written inside certain PDFs when they are intended for actual paper publication.

It is called the cropbox for what it is: extra paper to be cut away at some point when you bind or staple the papers together.

jjaychen1e commented 1 year ago

Any plans for this issue? This limitation makes pdf.js unusable in many scenarios, for example zooming a PDF file or opening a PDF exported by Safari. It also affects lots of applications that depend on pdf.js, such as Logseq. I think this issue should get a higher priority :)

alexcat3 commented 1 year ago

I would really appreciate a solution to this as I enjoy looking at track maps of railway and subway systems. These are generally large images intended to be viewed at high zoom levels. For example, see the track map of the NY Subway below, which is https://www.vanshnookenraggen.com/_index/docs/NYC_full_trackmap.pdf

AkiSakurai commented 1 year ago

> I would really appreciate a solution to this as I enjoy looking at track maps of railway and subway systems. These are generally large images intended to be viewed at high zoom levels. For example, see the track map of the NY Subway below, which is vanshnookenraggen.com/_index/docs/NYC_full_trackmap.pdf

A naive solution would be to simply set the transform of the render function to render the page to a smaller canvas. However, the performance is quite bad, as you can imagine. The render time increases linearly with the number of tiles.

Demo

https://github.com/mozilla/pdf.js/blob/e67bf68293746122cd579f3134b9b5f16762aed3/web/tile_canvas.js#L42-L55
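
The naive approach described above can be sketched roughly as follows. This is not the code from the linked demo; it is a minimal illustration that assumes the `transform` render parameter of `page.render()` is applied in viewport pixel space before the viewport transform. The helper names `tileTransforms` and `renderTile` are made up for this example.

```javascript
// Split a viewport of the given pixel size into tiles and compute, for each
// tile, the translation that moves the tile's top-left corner to (0, 0).
function tileTransforms(viewportWidth, viewportHeight, tileSize) {
  const tiles = [];
  for (let y = 0; y < viewportHeight; y += tileSize) {
    for (let x = 0; x < viewportWidth; x += tileSize) {
      tiles.push({
        // Edge tiles are clipped to the remaining width and height.
        width: Math.min(tileSize, viewportWidth - x),
        height: Math.min(tileSize, viewportHeight - y),
        // Canvas-style matrix [a, b, c, d, e, f]: a pure translation.
        transform: [1, 0, 0, 1, -x, -y],
      });
    }
  }
  return tiles;
}

// Render one tile into its own small canvas. `page` is a PDFPageProxy and
// `viewport` comes from page.getViewport(). Every call replays the whole
// operator list, which is why the total time grows with the tile count.
async function renderTile(page, viewport, tile, canvas) {
  canvas.width = tile.width;
  canvas.height = tile.height;
  await page.render({
    canvasContext: canvas.getContext("2d"),
    viewport,
    transform: tile.transform,
  }).promise;
}

// A 2048 x 1024 pixel viewport with 1024 pixel tiles yields two tiles.
const tiles = tileTransforms(2048, 1024, 1024);
console.log(tiles.length);
```

The sketch ignores device pixel ratio and output scaling, which a real viewer would have to fold into both the tile size and the translation.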

marco-c commented 1 year ago

Even if performance is not ideal, it seems still better than the current status. WDYT @calixteman @Snuffleupagus @timvandermeij?

Snuffleupagus commented 1 year ago

> Even if performance is not ideal, it seems still better than the current status.

Not really, since as already mentioned in https://github.com/mozilla/pdf.js/issues/6419#issuecomment-1837530552 this will affect performance quite badly in many cases: "The render time increases linearly with the number of tiles."

It might not look so bad in the demo above, but that's probably because that particular PDF document isn't all that "complex". Please consider the case where a page (currently) takes 2 seconds to render: If that's split into 10 sub-canvases, that same page now takes 20 seconds to finish rendering! The explanation is that while each individual canvas indeed becomes smaller, that doesn't really help performance-wise since the OperatorList is still the same and will be parsed (and rendered) in its entirety for each sub-canvas.

In order for this to work we'd need a way for the src/display/canvas.js code to skip rendering instructions that are outside of the current sub-canvas, while still handling general graphic-state changes correctly. (This could perhaps be done a little similar to how disabled OptionalContent is skipped in src/display/canvas.js.)
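
A minimal sketch of the skipping idea, assuming each drawing operator came with a bounding box in canvas pixels. The `{ name, bbox }` operator shape and both helper functions are hypothetical and are not how pdf.js stores its OperatorList; real path construction and clipping are more involved.

```javascript
// Hypothetical operator shape: { name, bbox }, where bbox is [x0, y0, x1, y1]
// in canvas pixels, or null for operators that only change graphics state.
function intersects(a, b) {
  return a[0] < b[2] && a[2] > b[0] && a[1] < b[3] && a[3] > b[1];
}

// Keep every state-changing operator (no bbox) so that save/restore, colors,
// and transforms stay correct, but drop drawing operators that lie entirely
// outside the tile rectangle.
function operatorsForTile(operators, tileRect) {
  return operators.filter(
    op => op.bbox === null || intersects(op.bbox, tileRect)
  );
}

const operators = [
  { name: "save", bbox: null },
  { name: "setFillRGBColor", bbox: null },
  { name: "fill", bbox: [10, 10, 200, 200] },
  { name: "fill", bbox: [1500, 10, 1800, 300] },
  { name: "restore", bbox: null },
];

// The left 1024 x 1024 tile keeps the state operators and the first fill only.
const leftTile = [0, 0, 1024, 1024];
const kept = operatorsForTile(operators, leftTile);
console.log(kept.map(op => op.name));
```

State operators are never dropped in this sketch, which reflects the requirement above that general graphic-state changes must still be handled correctly for every sub-canvas.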

marco-c commented 1 year ago

Other than potentially wasted CPU time, what is the downside if we use the current CSS zoom solution and replace it when the rendering is done? Isn't it still a net improvement?

marco-c commented 1 year ago

Here's the PDF for future reference: NYC_full_trackmap.pdf.

AkiSakurai commented 3 days ago

One thing to note is that the time taken to draw outside the canvas is significantly lower than the time taken to draw inside it. This measurement is based on drawing 1,000,000 Bézier curves. Therefore, re-issuing the drawing commands for every tile might not be as bad as it seems.

Here are some benchmark results:

| Browser | Time inside canvas (ms) | Time outside canvas (ms) |
| --- | --- | --- |
| Chrome | 3800 | 203 |
| Safari | 53033 | 887 |
| Firefox | 17266 | 778 |

nicolo-ribaudo commented 2 days ago

I noticed the same in https://github.com/mozilla/pdf.js/pull/19128, where rendering the tile is much faster than rendering the whole. For a partial render, the JavaScript code significantly dominates the time spent drawing.
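
To put the benchmark figures quoted earlier in this thread next to the "linear in the number of tiles" concern, here is a rough back-of-the-envelope model. It rests on assumptions that the thread does not confirm: each curve falls inside exactly one tile and outside all the others, the per-curve cost does not depend on canvas size, and JavaScript-side overhead per tile (which, as noted above, dominates a partial render) is ignored. The function names are made up for this sketch.

```javascript
// Draw times in milliseconds for 1,000,000 Bézier curves, copied from the
// benchmark table earlier in the thread.
const benchmarks = {
  Chrome: { inside: 3800, outside: 203 },
  Safari: { inside: 53033, outside: 887 },
  Firefox: { inside: 17266, outside: 778 },
};

// Pessimistic model: every tile pays the full in-canvas drawing cost.
function linearEstimate(inside, tiles) {
  return inside * tiles;
}

// Assumed model: each curve is drawn inside one tile and outside the rest,
// so only one pass pays the in-canvas cost.
function clippedEstimate(inside, outside, tiles) {
  return inside + (tiles - 1) * outside;
}

const tileCount = 10;
for (const [browser, { inside, outside }] of Object.entries(benchmarks)) {
  console.log(
    browser,
    linearEstimate(inside, tileCount),
    clippedEstimate(inside, outside, tileCount)
  );
}
```

Under these assumptions, ten tiles in Chrome would cost about 5627 ms of drawing rather than 38000 ms. These are estimates derived from the table, not measurements.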