Open yurydelendik opened 9 years ago
(We might also consider tiling JPEG images when we read the operator list.)
@yurydelendik Can you help me get started on this?
@prometheansacrifice the first thing will be to modify the API (and canvas.js) at https://github.com/mozilla/pdf.js/blob/master/src/display/api.js#L834 to have e.g. a `targets` property, which will be an alternative to `canvasContext`. It will be an array of objects with `canvas` and `location` properties, which will define the location of each canvas on the grid of canvases. (Notice that we will now use the canvas rather than its 2D context, so that we know its size; we might add a `canvas` property along with `canvasContext`.) canvas.js shall now render on multiple contexts at the same time.
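A minimal sketch of what such a `targets` array might look like. Nothing here is an existing pdf.js API: the `buildTargets` helper and the `{ row, col }` shape of `location` are assumptions made purely for illustration, and plain objects stand in for canvas elements so the snippet runs outside a browser.

```javascript
// Hypothetical helper: builds the proposed `targets` array for a grid of canvases.
// `targets`, `canvas`, and `location` follow the proposal above; the `{ row, col }`
// shape of `location` is an assumption, not something pdf.js defines.
function buildTargets(canvases, columns) {
  return canvases.map((canvas, index) => ({
    canvas,
    location: { row: Math.floor(index / columns), col: index % columns },
  }));
}

// Plain objects stand in for <canvas> elements here.
const fakeCanvases = Array.from({ length: 6 }, () => ({ width: 512, height: 512 }));
const targets = buildTargets(fakeCanvases, 3);
console.log(targets[4].location); // row 1, col 1
```

Passing the canvas itself (rather than only its 2D context) keeps the size of each grid cell available to the renderer, which is the point made in the comment above.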
TODOs:
Is there any progress around this issue or is it not on the future planning anymore?
@Liestambeur it's still planned but not scheduled; other projects are in progress at the moment. Contributors who wish to help advance it may find us on the IRC channel.
I would just like to add that, with today's wide desktop displays, I find interactive page-less PDFs a great way to present e.g. LaTeX documents on the web at high quality and 1:1, in cases where a plain zoom works well enough and reflowing the text would only break typography rules or a careful placement of floats. Of course, this won't replace richer interactivity, animations or text flow when they are actually needed, but it may have its niche.
Yet, when I generated a PDF 1 x A4 wide and 25 x A4 high and rendered it with PDFSinglePageViewer, one of the limits discussed here was probably hit, even though the rendering was only about 100 dpi. The canvas size was about 800 x 20000, and the result was a blocky, unusable rendering.
I would say that some kind of tiling might extend pdf.js applications beyond those of a viewer of printable PDFs. If anyone is interested, I attach a test PDF with an aspect ratio similar to the one discussed.
I would like to offer a bounty of 2000 USD for this feature
This feature is becoming more important to our company. I see that there were 16 issues raised that are related to this one. I was wondering if this deserves more attention?
Hoping that this bonus will help get this resolved for everyone interested.
The PDF specification includes the concept of a CropBox (TL;DR: a way to crop the content to render).
I may have missed something, but as pdf.js honors this CropBox, I guess setting the CropBox to the tile dimensions when rendering could do the job?
According to the pdf.js code, the CropBox parsed from the PDF file ends up in the array page.view
See demo https://jsfiddle.net/mmouterde/s72wpgvL/
That said, performance is no better for tile rendering than for full-page rendering, but it could be used for tile generation or to render a PDF on a smaller canvas?
@mmouterde you are quite mistaken about the nature of the cropbox. When a document is printed, there are registration marks and stapling areas outside the "interesting area". These exist for printed media: as I said, registration marks (to make sure the layers of colour line up) and extra margin for stapling. They are of no interest for screen viewing, and hence a screen viewer should ignore things outside the cropbox. It is not an arbitrary cropping mechanism; it is a fairly well-defined set of numbers sometimes written inside certain PDFs when they are intended for actual paper publication.
It is called cropbox for what it is: extra paper to be cut away at some point when you bind or staple the pages together.
Any plans for this issue? This limitation makes pdf.js unusable in many scenarios, for example when zooming a PDF file or opening a PDF exported by Safari. It also affects lots of applications that depend on pdf.js, such as Logseq. I think this issue should get a higher priority :)
I would really appreciate a solution to this as I enjoy looking at track maps of railway and subway systems. These are generally large images intended to be viewed at high zoom levels. For example, see the track map of the NY Subway below, which is https://www.vanshnookenraggen.com/_index/docs/NYC_full_trackmap.pdf
A naive solution would be to simply set the transform of the render function to render the page to a smaller canvas. However, the performance is quite bad, as you can imagine. The render time increases linearly with the number of tiles.
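The naive approach described above can be sketched roughly as follows. `computeTiles` is a hypothetical helper, not part of pdf.js; the commented usage assumes the `transform` render parameter of `page.render()` and a browser environment, and is not executed here.

```javascript
// Split a page of the given pixel size into tiles. For each tile, compute the
// translation that shifts the page so the tile's region lands at the canvas origin.
function computeTiles(pageWidth, pageHeight, tileSize) {
  const tiles = [];
  for (let y = 0; y < pageHeight; y += tileSize) {
    for (let x = 0; x < pageWidth; x += tileSize) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, pageWidth - x),
        height: Math.min(tileSize, pageHeight - y),
        transform: [1, 0, 0, 1, -x, -y],
      });
    }
  }
  return tiles;
}

// Assumed usage with pdf.js in a browser (not executed here):
//   for (const tile of computeTiles(viewport.width, viewport.height, 1024)) {
//     const canvas = document.createElement("canvas");
//     canvas.width = tile.width;
//     canvas.height = tile.height;
//     await page.render({
//       canvasContext: canvas.getContext("2d"),
//       viewport,
//       transform: tile.transform,
//     }).promise;
//   }

const tiles = computeTiles(800, 2000, 1024);
console.log(tiles.length); // 2
```

Each render call still walks the whole page, which is why the total time grows with the number of tiles.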
Even if performance is not ideal, it seems still better than the current status. WDYT @calixteman @Snuffleupagus @timvandermeij?
> Even if performance is not ideal, it seems still better than the current status.
Not really, since as already mentioned in https://github.com/mozilla/pdf.js/issues/6419#issuecomment-1837530552 this will affect performance quite badly in many cases: "The render time increases linearly with the number of tiles."
It might not look so bad in the demo above, but that's probably because that particular PDF document isn't all that "complex". Please consider the case where a page (currently) takes 2 seconds to render: If that's split into 10 sub-canvases, that same page now takes 20 seconds to finish rendering!
The explanation is that while each individual canvas indeed becomes smaller, that doesn't really help performance-wise since the `OperatorList` is still the same and will be parsed (and rendered) in its entirety for each sub-canvas.
In order for this to work we'd need a way for the `src/display/canvas.js` code to skip rendering instructions that are outside of the current sub-canvas, while still handling general graphic-state changes correctly. (This could perhaps be done a little similar to how disabled OptionalContent is skipped in `src/display/canvas.js`.)
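As a rough illustration of that idea, here is a sketch that keeps every graphics-state operation but drops paint operations whose bounding box misses the current tile. The record shape is hypothetical: the real pdf.js `OperatorList` stores `fnArray`/`argsArray` and carries no per-operator bounding boxes, so the `kind` and `bbox` fields are assumptions made for this example only.

```javascript
// Axis-aligned rectangle overlap test.
function intersects(a, b) {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

// Keep all state changes so the graphics state stays correct; keep a paint
// operation only when its (assumed) bounding box overlaps the tile.
function opsForTile(ops, tile) {
  return ops.filter(op => op.kind === "state" || intersects(op.bbox, tile));
}

// Hypothetical, simplified operator records.
const ops = [
  { kind: "state", name: "save" },
  { kind: "state", name: "setFillColor" },
  { kind: "paint", name: "fill", bbox: { x: 0, y: 0, width: 100, height: 100 } },
  { kind: "paint", name: "fill", bbox: { x: 900, y: 900, width: 50, height: 50 } },
  { kind: "state", name: "restore" },
];
const tile = { x: 0, y: 0, width: 512, height: 512 };
console.log(opsForTile(ops, tile).map(op => op.name));
```

Obtaining reliable bounding boxes for each operation is the hard part and is not addressed by this sketch.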
Other than potentially wasted CPU time, what is the downside if we use the current CSS zoom solution and replace it when the rendering is done? Isn't it still a net improvement?
Here's the PDF for future reference: NYC_full_trackmap.pdf.
One thing to note is that the time taken to draw outside the canvas is significantly lower than drawing inside the canvas. This measurement is based on drawing 1,000,000 Bézier curves. Therefore, re-issuing the drawing command for every tile might not be as bad as it seems.
Here are some benchmark results:
Browser | Time Inside Canvas (ms) | Time Outside Canvas (ms) |
---|---|---|
Chrome | 3800 | 203 |
Safari | 53033 | 887 |
Firefox | 17266 | 778 |
I noticed the same in https://github.com/mozilla/pdf.js/pull/19128, where rendering the tile is much faster than rendering the whole. For a partial render, the JavaScript code significantly dominates the time spent drawing.
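To make the benchmark figures more concrete, here is a back-of-the-envelope estimate built on the numbers in the table above. The model is an assumption, not a measurement: it supposes the curves are spread evenly over the tiles, so each tile draws 1/N of them inside its canvas and the rest outside, and it ignores the JavaScript overhead per operation.

```javascript
// Benchmark figures from the table above (ms for 1,000,000 Bézier curves).
const benchmarks = {
  Chrome: { inside: 3800, outside: 203 },
  Safari: { inside: 53033, outside: 887 },
  Firefox: { inside: 17266, outside: 778 },
};

// Assumed model: with `tileCount` tiles, each tile pays the "inside" price for
// 1/tileCount of the content and the "outside" price for the remainder.
function estimateTiledTime({ inside, outside }, tileCount) {
  const perTile = inside / tileCount + (outside * (tileCount - 1)) / tileCount;
  return perTile * tileCount;
}

for (const [browser, timings] of Object.entries(benchmarks)) {
  const tiled = Math.round(estimateTiledTime(timings, 10));
  console.log(`${browser}: ~${tiled} ms for 10 tiles vs ${timings.inside} ms for one canvas`);
}
```

Under this model the total grows by roughly the outside-canvas cost per extra tile rather than by the full render time, though real pages will deviate from the even-spread assumption.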
The problem is that large canvases take a lot of memory. This is visible if a PDF page is large (e.g. a map) or zoomed in (e.g. at 800%+ zoom). Currently we are limiting the canvas size (#4834) on mobile devices. However, a proper solution would be to divide the page into smaller canvases and render only the visible parts.
It's mostly blocked by generating the operator list based on a crop area (useful for zooming heavy maps), but we can proceed without it and try to render the same operator list on several canvases.