wojtekmaj / react-pdf

Display PDFs in your React app as easily as if they were images.
https://projects.wojtekmaj.pl/react-pdf
MIT License

Performance issues when rendering large PDFs #94

Closed jesusgp22 closed 9 months ago

jesusgp22 commented 6 years ago

This might be a good question for the pdf.js community itself, but how can rendering large PDFs be better handled with react-pdf?

pdf.js suggests not rendering more than 25 pages at a time: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#allthepages

I even had to add this to my component to keep React from trying to re-create the virtual DOM of the Document:

    // Re-render only when the file, the page count, or the width changes.
    shouldComponentUpdate(nextProps, nextState) {
        return (
            nextProps.file !== this.props.file
            || nextState.numPages !== this.state.numPages
            || nextState.width !== this.state.width
        );
    }

The problem is that I also need to set the width of the document dynamically in response to user interaction, so I can't avoid re-creating the virtual DOM after width changes. Is there any way I can achieve this with your lib?

zhoumy96 commented 1 year ago

This is a demo for large PDFs: https://github.com/zhoumy96/react-pdf-large-files
Thanks to ngoclinhng: https://github.com/wojtekmaj/react-pdf/issues/94#issuecomment-875514687

wojtekmaj commented 1 year ago

> I am a bit confused why the PDF.js viewer from Mozilla (https://mozilla.github.io/pdf.js/web/viewer.html) can load large PDFs instantly, can zoom instantly, and lets you scroll through the pages with minimal buffering, while with this library as is, without performance optimization, large PDFs take at least 30 seconds to load, and I can't zoom at all because it makes the webpage freeze.

"without performance optimization" is the key here. React-PDF is NOT a PDF viewer - it is only a tool to build one. If you want to browse 100 page PDFs, you need to take similar precautions as if you were trying to open 100 images at once, or 100 videos, or whatever. You wouldn't open them all at once, would you?

> It would be VERY helpful if you can start viewing a PDF without downloading the entire file first.

You can, as long as the Range header is supported by the server you're serving the content from.

wojtekmaj commented 1 year ago

@zhoumy96 Good example. Rendering pages only when they are actually needed is key to a performant PDF viewer.

wojtekmaj commented 1 year ago

Here's my take on hooking React-PDF to React-Window.

https://codesandbox.io/s/react-pdf-react-window-x3xzzg
https://codesandbox.io/s/react-pdf-react-window-fullscreen-ky4yy0
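
The idea behind windowing, which the sandboxes above delegate to React-Window, can be sketched in plain JavaScript. This is only an illustration, not code taken from the sandboxes: the helper name `getVisiblePageRange`, its parameters, and the assumption that every page has the same rendered height are all hypothetical.

```javascript
// Hypothetical helper: given the scroll position, return the range of page
// indices (0-based) worth rendering. Assumes all pages share one height.
function getVisiblePageRange({ scrollTop, viewportHeight, pageHeight, numPages, overscan = 1 }) {
  if (numPages <= 0 || pageHeight <= 0) {
    return { first: 0, last: -1 }; // empty range: nothing to render
  }
  const firstVisible = Math.floor(scrollTop / pageHeight);
  const lastVisible = Math.floor((scrollTop + viewportHeight - 1) / pageHeight);
  return {
    // Render a few extra pages on each side so scrolling does not show gaps.
    first: Math.max(0, firstVisible - overscan),
    last: Math.min(numPages - 1, lastVisible + overscan),
  };
}

// 100 pages of 1000px each, an 800px viewport, scrolled down by 4500px.
const range = getVisiblePageRange({
  scrollTop: 4500,
  viewportHeight: 800,
  pageHeight: 1000,
  numPages: 100,
});
console.log(range); // { first: 3, last: 6 }
```

Only the pages inside the returned range would get a `<Page>` element; the rest can be replaced by empty placeholders of the same height, so 4 pages are rendered here instead of 100.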

Moebits commented 1 year ago

> > It would be VERY helpful if you can start viewing a PDF without downloading the entire file first.
>
> You can, as long as the Range header is supported by the server you're serving the content from.

Would you mind giving an example of how to do this? I tried setting options={{disableAutoFetch: true, disableStream: true}} on the Document, but it seems like it has no effect. It is still downloading the whole file before it displays anything.

My PDF files are hosted in an AWS S3 bucket which does support range requests to my knowledge.

wojtekmaj commented 1 year ago

Hmm, not sure about that. I'm pretty sure PDF.js will request only as much data as needed, if it's possible, e.g. when you only want to display Page 1. If it's not happening, it's on PDF.js side. There may be something else that I don't know about that might prevent partial download from happening, e.g. PDF built in a specific way or something.

EricLiu0614 commented 1 year ago

> Here's my take on hooking React-PDF to React-Window.
>
> https://codesandbox.io/s/react-pdf-react-window-x3xzzg
> https://codesandbox.io/s/react-pdf-react-window-fullscreen-ky4yy0

@wojtekmaj Thank you for providing the great demo.

When I tried to implement it in my application and loaded a large PDF file, I noticed that memory keeps increasing as I load subsequent pages or switch between pages, and it is not released until I close the browser tab. Any idea how we can optimize it? Thank you!

Moebits commented 1 year ago

> Hmm, not sure about that. I'm pretty sure PDF.js will request only as much data as needed, if it's possible, e.g. when you only want to display Page 1. If it's not happening, it's on PDF.js side. There may be something else that I don't know about that might prevent partial download from happening, e.g. PDF built in a specific way or something.

Ok, I figured out what the problem here was: The PDF files have to be "linearized", which means that they are saved in a way so that the file can be requested in chunks.

On a Mac, I just opened the PDF in Preview, reordered a page and put it back (otherwise it doesn't save if no changes are made), and hit File -> Save. It should save linearized by default. On Windows you will probably have to find a third party app to do it (like Acrobat).

I hope that helps anyone that was having the same issue.
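
A rough way to check whether a file is linearized is to look for the `/Linearized` key near the start of the file, since a linearized PDF carries its linearization parameter dictionary there. The sketch below is a heuristic only, and the function name `looksLinearized` is hypothetical; it does not validate the rest of the file.

```javascript
// Heuristic check: look for the /Linearized key in the first 1024 bytes.
// `bytes` is a Uint8Array holding (at least) the beginning of the file.
function looksLinearized(bytes) {
  const head = bytes.subarray(0, 1024);
  // Decode byte-for-byte so that binary data cannot break the decoding.
  let text = "";
  for (const byte of head) {
    text += String.fromCharCode(byte);
  }
  return text.includes("%PDF-") && text.includes("/Linearized");
}

// Tiny hand-written headers standing in for real files.
const encoder = new TextEncoder();
const linearizedHead = encoder.encode("%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 12345 >>\nendobj\n");
const plainHead = encoder.encode("%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n");
console.log(looksLinearized(linearizedHead)); // true
console.log(looksLinearized(plainHead)); // false
```

If a file turns out not to be linearized, a command-line tool such as qpdf can rewrite it (`qpdf --linearize in.pdf out.pdf`), which may be easier than going through a desktop viewer.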

HadesShadows commented 1 year ago

> Here's my take on hooking React-PDF to React-Window.
>
> https://codesandbox.io/s/react-pdf-react-window-x3xzzg
> https://codesandbox.io/s/react-pdf-react-window-fullscreen-ky4yy0

This is working perfectly. Thank you so much.

EDIT: Any help making the navigation buttons work in these examples?

EDIT #2: The solution for going to a specific page is given in the react-window documentation. Previous page and next page can be done similarly.
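
A minimal sketch of such navigation, assuming react-window 1.x, where a list instance obtained through a ref exposes `scrollToItem(index, align)`. The helper `createPageNavigator` and the stand-in `fakeList` object are hypothetical and exist only so the example runs without React.

```javascript
// `list` is assumed to be a react-window 1.x list instance (from a ref).
// Page numbers are 1-based; list item indices are 0-based.
function createPageNavigator(list, numPages) {
  let current = 1;
  function goTo(pageNumber) {
    // Clamp so that going before page 1 or past the last page stays in range.
    current = Math.min(Math.max(1, pageNumber), numPages);
    list.scrollToItem(current - 1, "start");
    return current;
  }
  return {
    goTo,
    next: () => goTo(current + 1),
    previous: () => goTo(current - 1),
  };
}

// Stand-in for the real list instance: it only records the calls it gets.
const calls = [];
const fakeList = { scrollToItem: (index, align) => calls.push([index, align]) };

const nav = createPageNavigator(fakeList, 10);
nav.goTo(5); // scrolls to index 4
nav.next(); // scrolls to index 5
nav.previous(); // scrolls to index 4
nav.goTo(99); // clamped to the last page, index 9
console.log(calls);
```

The navigator only knows about pages it was asked to visit. To keep `current` in sync when the user scrolls manually, the list's `onItemsRendered` callback could be used to update it.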

ibweb3dev commented 1 year ago

> > Here's my take on hooking React-PDF to React-Window.
> >
> > https://codesandbox.io/s/react-pdf-react-window-x3xzzg
> > https://codesandbox.io/s/react-pdf-react-window-fullscreen-ky4yy0
>
> This is working perfectly. Thank you so much.
>
> EDIT: Any help making the navigation buttons work in these examples?
>
> EDIT #2: The solution for going to a specific page is given in the react-window documentation. Previous page and next page can be done similarly.

I think it still loads the full PDF.

You can see this if you pass onLoadProgress={onDocumentLoadProgress} to <Document>:

    function onDocumentLoadProgress({ loaded, total }) {
        // Percentage of the file downloaded so far.
        const percent = Math.round((loaded / total) * 100);

        console.log(percent);
    }

The logs will show how much of the file has been loaded.
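
A slightly more defensive variant of the percentage calculation is sketched below. It assumes that `total` can be missing or zero when the file size is not known up front; the helper name `toPercent` is hypothetical.

```javascript
// Returns the download progress as a whole percentage, or null when the
// total size is unknown and a percentage would be meaningless.
function toPercent({ loaded, total }) {
  if (!Number.isFinite(total) || total <= 0) {
    return null;
  }
  // Cap at 100 in case more bytes are reported than the announced total.
  return Math.min(100, Math.round((loaded / total) * 100));
}

console.log(toPercent({ loaded: 50, total: 200 })); // 25
console.log(toPercent({ loaded: 10, total: 0 })); // null
```

Returning `null` instead of `NaN` or `Infinity` lets the caller fall back to an indeterminate progress indicator.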

srinadh239 commented 1 year ago

> > Here's my take on hooking React-PDF to React-Window.
> >
> > https://codesandbox.io/s/react-pdf-react-window-x3xzzg
> > https://codesandbox.io/s/react-pdf-react-window-fullscreen-ky4yy0
>
> @wojtekmaj Thank you for providing the great demo.
>
> When I tried to implement it in my application and loaded a large PDF file, I noticed that memory keeps increasing as I load subsequent pages or switch between pages, and it is not released until I close the browser tab. Any idea how we can optimize it? Thank you!

@wojtekmaj @EricLiu0614 Any update on this? I had a similar problem: when I scroll around in huge PDFs, there seems to be a memory leak, which causes the page to crash. Here is the sample I used, where memory went up to 5 GB: https://codesandbox.io/s/react-pdf-react-window-forked-jp5w5x?file=/src/App.js

admondtamang commented 1 year ago

> > > It would be VERY helpful if you can start viewing a PDF without downloading the entire file first.
> >
> > You can, as long as the Range header is supported by the server you're serving the content from.
>
> Would you mind giving an example of how to do this? I tried setting options={{disableAutoFetch: true, disableStream: true}} on the Document, but it seems like it has no effect. It is still downloading the whole file before it displays anything.
>
> My PDF files are hosted in an AWS S3 bucket which does support range requests to my knowledge.

For cross-origin requests, the browser does not expose the Accept-Ranges and Content-Range response headers to scripts by default. Without these two headers, pdf.js mistakenly concludes that the server does not support range requests and requests the entire file instead.

My CORS configuration for the S3 bucket:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "PUT",
            "GET",
            "POST",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "Accept-Ranges",
            "Content-Length",
            "Content-Range"
        ]
    }
] 

Expose these headers so that react-pdf can see the headers it needs in order to stream the file.
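
To illustrate why hidden headers lead to a full download, here is a simplified sketch of the kind of decision a client can make from the headers it is allowed to read. It is not pdf.js's actual code; the function name `rangeRequestsLookUsable` and the plain-object header format are assumptions made for the example.

```javascript
// `headers` is a plain object with lower-case header names, containing only
// the response headers that the browser exposes to the script.
function rangeRequestsLookUsable(headers) {
  const acceptRanges = headers["accept-ranges"];
  const length = Number.parseInt(headers["content-length"], 10);
  // Both pieces of information are needed to plan byte-range requests.
  return acceptRanges === "bytes" && Number.isInteger(length) && length > 0;
}

// Without ExposeHeaders, Accept-Ranges is invisible to the script even
// though the server sent it.
const withoutExpose = { "content-type": "application/pdf", "content-length": "1048576" };
// With ExposeHeaders configured as above, the script can read it.
const withExpose = { ...withoutExpose, "accept-ranges": "bytes" };

console.log(rangeRequestsLookUsable(withoutExpose)); // false
console.log(rangeRequestsLookUsable(withExpose)); // true
```

The server behaves identically in both cases; only what the script is allowed to see differs, which is why the fix lives in the bucket's CORS configuration rather than in the React code.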

ccasper89 commented 1 year ago

> > > Here's my take on hooking React-PDF to React-Window.
> > >
> > > https://codesandbox.io/s/react-pdf-react-window-x3xzzg
> > > https://codesandbox.io/s/react-pdf-react-window-fullscreen-ky4yy0
> >
> > @wojtekmaj Thank you for providing the great demo. When I tried to implement it in my application and loaded a large PDF file, I noticed that memory keeps increasing as I load subsequent pages or switch between pages, and it is not released until I close the browser tab. Any idea how we can optimize it? Thank you!
>
> @wojtekmaj @EricLiu0614 Any update on this? I had a similar problem: when I scroll around in huge PDFs, there seems to be a memory leak, which causes the page to crash. Here is the sample I used, where memory went up to 5 GB: https://codesandbox.io/s/react-pdf-react-window-forked-jp5w5x?file=/src/App.js

The memory leak seems to be caused by CodeSandbox, not by the react-pdf + react-window example. Did you try running the example locally?

KMJ-007 commented 1 year ago

Any update?