Performance issues when rendering large PDFs

jesusgp22 commented 6 years ago

This might be a good question for the pdf.js community itself, but how can rendering large PDFs be handled better with react-pdf?

pdf.js suggests not rendering more than 25 pages at a time: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#allthepages

I even had to add this to my component to keep React from trying to re-create the virtual DOM of the Document:

    shouldComponentUpdate(nextProps, nextState) {
        if(nextProps.file !== this.props.file
            || nextState.numPages !== this.state.numPages
            || nextState.width !== this.state.width){
            return true
        }
        return false
    }

The problem is that I also need to set the width of the document dynamically on user interaction, so I can't avoid re-creating the virtual DOM after width changes. Is there any way I can achieve this with your lib?

michaeldzjap commented 6 years ago

@jesusgp22 You probably want to use some kind of virtualization library for displaying PDFs with a lot of pages, like react-virtualized for instance. Maybe this is useful to you.

jesusgp22 commented 6 years ago

Hey, thank you so much for your answer, I'll def check this out

You might want to add a note about this on react-pdf documentation to help others with the same performance issues or even in the future add this as a core feature for large docs.

jesusgp22 commented 6 years ago

Following up on this, @michaeldzjap: I am watching some presentations on react-virtualized and it looks like it will break the text search feature. Is this a trade-off that I can't get around?

michaeldzjap commented 6 years ago

I am not familiar with the text search feature I have to admit. But I suspect that it relies on the text layer for each page to be rendered in order to be able to find all the relevant results for a specific search query (e.g. a word that could be located anywhere in the document). The whole point of virtualizing a collection of elements (Page components in the case of react-pdf) is to not render them all at the same time.

I don't think there is an easy way around this unfortunately. A solution could be to keep a virtual representation of a text layer of each page in memory (like how React does this for HTML elements) and search through that instead. Might be possible.

jesusgp22 commented 6 years ago

That's an interesting approach. I am guessing this will most likely break the browser's text search feature anyway; in any case, I think it is OK to implement this using just a regular search box element. Now the questions are:

How can I extract the text from the pdf to keep a "virtual" copy of the whole text layer I can search from
After getting a list of results from the text how can I implement a feature to seek these features in the document (guessing I will need to map results to scrollbar coordinates accordingly)

michaeldzjap commented 6 years ago

How can I extract the text from the pdf to keep a "virtual" copy of the whole text layer I can search from

I think you would need to dig into pdf.js for this, relying on the react-pdf api probably is not enough. You can get the text content for a page using this apparently:

page.getTextContent().then(function(textContent) { ... });

After getting a list of results from the text how can I implement a feature to seek these features in the document (guessing I will need to map results to scrollbar coordinates accordingly)

Yes, that is a tricky one... You'd know the page number. Maybe it should be a 2 step operation or something. 1 - Search through the virtual text layers for a query. Keep a result of all pages that match. 2 - For each page in the result of step 1 see if it is rendered, if it is you can probably find the word rather easily, because I think each word is rendered as a separate HTML element in a text layer. If the page is not rendered yet, scroll to it with react-virtualized so that it will be rendered and then again find the HTML element that contains the first occurrence of the word/query in the text layer element tree.

Something like the above. I might think too simplistic about this, I haven't actually tried it myself. But this is how I would approach things initially I think.
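
A minimal sketch of step 1 (searching an in-memory copy of the text), not a tested integration. It assumes each `textContent` has the shape pdf.js returns from page.getTextContent(), i.e. an `items` array whose entries carry a `str` field. The helper names `pageTextFromContent`, `buildTextIndex` and `findPagesMatching` are made up for illustration.

```javascript
// Assumption: `textContent` looks like { items: [{ str: '...' }, ...] },
// which is what pdf.js resolves from page.getTextContent().
function pageTextFromContent(textContent) {
  return textContent.items.map((item) => item.str).join(' ');
}

// Hypothetical helper: builds a Map of pageNumber -> lowercased page text.
// `textContents` is an array ordered by page, starting at page 1.
function buildTextIndex(textContents) {
  const index = new Map();
  textContents.forEach((textContent, i) => {
    index.set(i + 1, pageTextFromContent(textContent).toLowerCase());
  });
  return index;
}

// Step 1 of the two-step search: return the page numbers whose text matches.
function findPagesMatching(index, query) {
  const needle = query.trim().toLowerCase();
  if (needle === '') return [];
  const matches = [];
  for (const [pageNumber, text] of index) {
    if (text.includes(needle)) matches.push(pageNumber);
  }
  return matches;
}
```

Joining items with a space is a simplification: a phrase split across text items in an unusual way may not match. Step 2 (scrolling to the page and locating the element in the rendered text layer) still has to happen in the viewer itself.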

jesusgp22 commented 6 years ago

I was wondering if the biggest performance issue was rendering the text layer or the canvas, in case rendering the canvas is an issue, it might be possible to ask pdf.js to only render the text layer? I know this is not possible with the current react-pdf API

wojtekmaj commented 6 years ago

@jesusgp22 Nope, you can toggle text content and annotations on/off, but canvas is currently not behind a flag. I don't see a good reason against having it toggleable, though :)

wojtekmaj commented 6 years ago

I think you would need to dig into pdf.js for this, relying on the react-pdf api probably is not enough.

@michaeldzjap Any reason for this? Document's onLoadSuccess should return all pdf properties and methods, and Page's onLoadSuccess should return all page properties and methods.

If you use Document, you can get the number of pages, iterate through all of them with pdf.getPage(pageNumber) and run the code you pasted on getPage()'s results.

michaeldzjap commented 6 years ago

@wojtekmaj Yes, my wording was rather poor. What I meant is that pdf.getPage(), page.getTextContent() etc. all are pdf.js related rather than react-pdf specific. So although of course all those methods are perfectly well accessible through the react-pdf API, they really belong to the underlying pdf.js API.

If you use Document, you can get the number of pages, iterate through all of them with pdf.getPage(pageNumber) ...

Yes. This is exactly what I do to cache all document page widths and heights on initial load when using react-pdf together with react-virtualized.
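
A rough sketch of that caching step, assuming `pdf` is the object handed to Document's onLoadSuccess (with `numPages` and `getPage`) and that each page exposes pdf.js's `view` box as [x1, y1, x2, y2]. The names `sizeFromView` and `cachePageSizes` are hypothetical.

```javascript
// `view` is the page box in PDF units: [x1, y1, x2, y2].
// Subtracting the origin keeps the result correct when x1/y1 are not 0.
function sizeFromView(view) {
  const [x1, y1, x2, y2] = view;
  return { width: x2 - x1, height: y2 - y1 };
}

// Loads every page once and caches its unscaled size, keyed by page number.
// Assumption: `pdf` is the document proxy passed to Document's onLoadSuccess.
async function cachePageSizes(pdf) {
  const sizes = new Map();
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber += 1) {
    const page = await pdf.getPage(pageNumber);
    sizes.set(pageNumber, sizeFromView(page.view));
  }
  return sizes;
}
```

This ignores page rotation, so rotated pages would need width and height swapped. It also touches every page up front, which may not be what you want for very large documents.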

jesusgp22 commented 6 years ago

Thank you both for this amazing discussion 👍

MarcoNicolodi commented 6 years ago

We are also having trouble loading long PDFs. We are loading a 17 MB PDF and the application crashes, and since we have customers with 100 MB+ PDFs, crashing is not an option.

This example, which is also a React wrapper for PDF.js, seems to work for us. It tricks PDF.js into loading only the currently visible page and the ten previous pages. It looks like it has something to do with the wrapper div's styles, because when you change some of the styles it loses its lazy loading behaviour.

I couldn't reproduce this trick with your lib. But we liked react-pdf so much that we are still trying to adapt this lazy load trick to it.

We like the fact that your lib has no default toolbox and that it has mapped its props to pdf.js handlers/configs, so we can develop our customized toolbox.

So we would be glad to see it working better with long PDFs, maybe using this trick that yurydelendik/pdfjs-react uses (it's a shame that I couldn't reproduce it with your lib!)

jesusgp22 commented 6 years ago

@MarcoNicolodi I found that react-virtualized worked really badly with react-pdf, so I implemented the approach of rendering only a few pages. To make this work, you have to render a div with the dimensions of each page you don't render.

You can 100% integrate this with react-pdf: use the document object that is returned by react-pdf and its getPage and page.getViewport methods to get the page dimensions.

I built my own algorithm to detect which pages are visible, and I run it every time the user scrolls or a resize event happens.
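
The code for that algorithm is not shown in this thread. One way such a visibility check could look, as a sketch only: it assumes pages are stacked vertically with no gaps and that their scaled heights are already known. `getVisibleRange` and its parameters are hypothetical names.

```javascript
// Hypothetical helper: given per-page heights (px, already scaled), the
// container's scrollTop and its visible height, return the 0-based index
// range of pages that intersect the viewport, widened by `buffer` pages.
// Returns null when nothing intersects (e.g. empty document).
function getVisibleRange(pageHeights, scrollTop, viewportHeight, buffer = 1) {
  const viewportBottom = scrollTop + viewportHeight;
  let top = 0;
  let first = -1;
  let last = -1;
  for (let i = 0; i < pageHeights.length; i += 1) {
    const bottom = top + pageHeights[i];
    // A page is visible when its box overlaps [scrollTop, viewportBottom).
    if (bottom > scrollTop && top < viewportBottom) {
      if (first === -1) first = i;
      last = i;
    }
    top = bottom;
  }
  if (first === -1) return null;
  return {
    first: Math.max(0, first - buffer),
    last: Math.min(pageHeights.length - 1, last + buffer),
  };
}
```

Pages inside the returned range would be rendered as real Page components; every other page would get a placeholder div of the same size so the scrollbar stays stable. The function would be called from the scroll and resize handlers.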

wojtekmaj commented 6 years ago

Hey everyone, I'd like to remind you that it was never React-PDF's intention to provide users with a fully-fledged PDF reader. Instead, this is only a tool to make one. While I have a plan of creating a React-PDF-based PDF reader, I'm far from it. Mozilla has been working on theirs for years and they seem to never be done. I think it would go a similar way ;)

There is some good news too, though. If I can suggest something, onRenderSuccess callback that you can define for <Page> components can be your powerful friend. You can use it to, for example, force pages to be rendered one by one:

import React, { Component } from 'react';
import { Document, Page } from 'react-pdf/build/entry.webpack';

import './Sample.less';

export default class Sample extends Component {
  state = {
    file: './test.pdf',
    numPages: null,
    pagesRendered: null,
  }

  onDocumentLoadSuccess = ({ numPages }) =>
    this.setState({
      numPages,
      pagesRendered: 0,
    });

  onRenderSuccess = () =>
    this.setState(prevState => ({
      pagesRendered: prevState.pagesRendered + 1,
    }));

  render() {
    const { file, numPages, pagesRendered } = this.state;

    /**
     * The amount of pages we want to render now. Always 1 more than already rendered,
     * no more than total amount of pages in the document.
     */
    const pagesRenderedPlusOne = Math.min(pagesRendered + 1, numPages);

    return (
      <div className="Example">
        <header>
          <h1>react-pdf sample page</h1>
        </header>
        <div className="Example__container">
          <div className="Example__container__document">
            <Document
              file={file}
              onLoadSuccess={this.onDocumentLoadSuccess}
            >
              {
                Array.from(
                  new Array(pagesRenderedPlusOne),
                  (el, index) => {
                    const isCurrentlyRendering = pagesRenderedPlusOne === index + 1;
                    const isLastPage = numPages === index + 1;
                    const needsCallbackToRenderNextPage = isCurrentlyRendering && !isLastPage;

                    return (
                      <Page
                        key={`page_${index + 1}`}
                        onRenderSuccess={
                          needsCallbackToRenderNextPage ? this.onRenderSuccess : null
                        }
                        pageNumber={index + 1}
                      />
                    );
                  },
                )
              }
            </Document>
          </div>
        </div>
      </div>
    );
  }
}

Of course you can do much more - add placeholders, check on scroll which pages need rendering, keep info on whether all pages so far were rendered... I believe in your creativity ;) And if I can be of any help regarding API, please let me know!

MarcoNicolodi commented 6 years ago

@jesusgp22

Hey, could you share this example?

jesusgp22 commented 6 years ago

@MarcoNicolodi Yes, I think it can even be included as a PR to react-pdf at some point. I don't have the time to share the code right now, but I will later today.

wojtekmaj commented 6 years ago

I think the right place for that is Wiki. :) I highly encourage you to add your experiences on a new page there.

crapthings commented 6 years ago

I've tried to make react-pdf work with react-virtualized, but failed. It always prints a render cancel message.

So I made a minimal demo with the original pdf.js and react-virtualized. It's pretty fast, weird.

https://github.com/crapthings/react-pdf-viewer/blob/master/client/pdf.js

wojtekmaj commented 6 years ago

@crapthings I think it may be something related to how react-virtualized measures rows before final rendering. It may unmount a component after initial measurement. I'm no expert in react-virtualized but perhaps you could force height or otherwise somehow disable these measurements? If something forces Page to unmount itself, it will cancel rendering. Should retry rendering on mounting again though 🤔

fetacore commented 6 years ago

Well I want to contribute to the discussion because a project of mine depends on this awesome library and I found a way to implement lazy-loading with no extra dependencies. The only caveat is that I lost the natural scrolling behavior but the user can still scroll and change pages. The logic is that the visible component is just one page and there is another component with display='none' which renders more pages as needed. I needed an implementation where I could keep track programmatically of which page is visible right now, and with the "infinite" scrolling behavior it was a pain in the ass to implement. The customization is left to the developer (scroll threshold, how many more pages etc). Let me know what you think.

import React from 'react';
import { Document, Page } from 'react-pdf';

const pdfjs = require('pdfjs-dist/build/pdf.min.js');

pdfjs.PDFJS.workerSrc = '../src/assets/pdf.worker.min.js';
pdfjs.PDFJS.cMapUrl = '../src/assets/cmaps/';
pdfjs.PDFJS.cMapPacked = true;

export default class App extends React.Component {
  constructor(props) {
    super(props);
    this.state = {
      numPages: null,
      pageIndex: null,
      binaryPDFContent: somebase64string,
    }
    // Set before the first render so <Page> receives a defined width
    this.PDFWidth = 400;
    // Bind once so the same function reference can be removed on unmount
    this.onScrollPDF = this.onScrollPDF.bind(this);
  }

  componentDidMount() {
    document.getElementById('pdfContainer').addEventListener('wheel', this.onScrollPDF);
  }

  componentWillUnmount() {
    document.getElementById('pdfContainer').removeEventListener('wheel', this.onScrollPDF);
  }

  onDocumentLoadSuccess(nPages) {
    if (this.state.pageIndex==null) {
      this.setState({
        numPages: nPages,
        pageIndex: 0,
      });
    } else if (this.state.pageIndex >= nPages) {
      this.setState({
        numPages: nPages,
        pageIndex: nPages-1,
      })
    } else {
      this.setState({
        numPages: nPages,
      });
    }
  }

  onScrollPDF(event) {
    let delta = null;
    if (event.wheelDelta) {
      delta = event.wheelDelta;
    } else {
      delta = -1 * event.deltaY;
    }
//  This is where some customization can happen
    if (delta < -20) {
      this.nextPage()
    } else if (delta > 10) {
      this.previousPage()
    }
  }

  previousPage() {
    if (this.state.pageIndex > 0) {
      this.setState({
        pageIndex: this.state.pageIndex-1
      })
    }
  }

  nextPage() {
    if (this.state.pageIndex+1 < this.state.numPages) {
      this.setState({
        pageIndex: this.state.pageIndex+1
      })
    }
  }

  render() {
    let PDFContainerHeight = 600;
    return (
      <div
        id="pdfContainer"
        style={{width:this.PDFWidth, height:PDFContainerHeight, overflow:'hidden'}}
        >
        <Document
          file={{data:`data:application/pdf;base64,${this.state.binaryPDFContent}`}}
          onLoadSuccess={(pdf) => {
            this.onDocumentLoadSuccess(pdf.numPages)
          }}
          className="pdfPreview"
          rotate={0}
          >
           <Page
             key={`page_${this.state.pageIndex + 1}`}
             width={this.PDFWidth}
             pageNumber={this.state.pageIndex + 1}
             className="pdfPage"
             renderMode="svg"
            />
            <FakePage
              //  This is where we can customize how many pages we need cached
              pages={Math.min(this.state.numPages, this.state.pageIndex+20)}
              width={this.PDFWidth}
             />
          </Document>
       </div>
    )
  }
}

class FakePage extends React.Component {
  constructor(props) {
    super(props)
  }
  render() {
    return(
      <div style={{display: 'none'}}>
        {
          Array.from(
            new Array(this.props.pages),
            (el, index) => (
              <Page
                key={`page_${index + 1}`}
                width={this.props.width}
                className="pdfPage"
                renderMode="svg"
                pageNumber={index + 1}
              />
            ),
          )
        }
      </div>
    )
  }
}

wojtekmaj commented 6 years ago

That is pretty damn sweet! Does it work with React-PDF 2.x? I'd be genuinely surprised, think 3.0.0 would be the first version to handle all of that correctly!

fetacore commented 6 years ago

I did not try it with 2. Lately I use your alpha release (which is awesome btw and I haven't got any errors yet). Still I could not find a way to make the infinite scrolling behavior and keep track of which page is visible. When I tried to apply this solution to your example I got pages with very big and variable gaps between them. I also could not trigger the caching of the next pages (since I could not get the visible page). I believe with a little time and patience a solution can be found that does not involve external libraries.

wojtekmaj commented 6 years ago

Have you tried react-virtualized as other folks here suggested?

fetacore commented 6 years ago

I tried it but did not get very far. I believe that it would give unnecessary overhead for my app and (for my specific use case) it still did not solve the problem of having programmatic access to which page is visible. tbh I did not spend much time on it as the fakepage trick struck me right after I installed the library :P If I find the time I will try to find the solution to the problem and close this issue once and for all.

michaeldzjap commented 6 years ago

@fetacore I initially went the custom route, but after a lot of trying different things out I settled on react-virtualized. That was quite a pain in the ass to get to play nicely with react-pdf, but I did manage to get it to work and it made things a hell of a lot easier in the end.

it still did not solve the problem of having programmatic access to which page is visible

That is definitely possible with react-virtualized. I needed this functionality in my app as well. You can use the onRowsRendered() function for that.
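
A small sketch of how that callback could be used. react-virtualized's List calls onRowsRendered with 0-based row indices (`startIndex`, `stopIndex`, plus the overscan variants). The factory `makeOnRowsRendered` and the `onPageChange` callback are hypothetical names, and treating the first visible row as "the current page" is an assumption.

```javascript
// Hypothetical factory: returns a handler suitable for List's onRowsRendered
// prop. It reports the 1-based number of the first visible page, and only
// when that number actually changes.
function makeOnRowsRendered(onPageChange) {
  let lastReported = null;
  return function onRowsRendered({ startIndex }) {
    const pageNumber = startIndex + 1; // rows are 0-based, pages 1-based
    if (pageNumber !== lastReported) {
      lastReported = pageNumber;
      onPageChange(pageNumber);
    }
  };
}
```

The returned function would be passed as onRowsRendered={...} to the List, with `onPageChange` updating whatever state tracks the visible page.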

Lukars commented 6 years ago

Hey, @fetacore @jesusgp22 @wojtekmaj and all friends of react-pdf! Has anyone gotten further on the subject? I really appreciate the discussion so far, and would love to hear your latest ideas, thanks!

nikonet commented 5 years ago

@Lukars I've gotten react-pdf to play nicely with react-window (a newer and more compact version of react-virtualized).

I used the VariableSizeList which can window different page heights/widths.

On Document documentLoadSuccess, I call a function inherited from a parent component, setPageHeights, which caches all pageHeights (scaled dynamically to parent component height).

I then pass pageHeights as a prop to the child again, and there pass it into the itemSize prop of the VariableSizeList like itemSize={pageHeights[index]}.

When resizing or zooming in/out, I call the setPageHeights again with the updated scale and parent container, and use the VariableSizeList method resetAfterIndex(0) after updating pageHeights to clear and update the internally cached row heights in the VariableSizeList.

I also made a very small PureComponent that wraps Page in a div. This div then gets passed the styles from react-window. I think this part is pretty crucial for performance. In the built solution it looks pretty much 100% smooth, even when scrolling very fast through PDFs with 150+ pages.
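
The height bookkeeping described above might look roughly like this. It is a sketch under assumptions: unscaled page sizes are already cached as { width, height } objects, and a single `scale` factor has been derived from the parent container. `computePageHeights` and `makeItemSize` are made-up names.

```javascript
// Hypothetical helper: turn unscaled page sizes into row heights for the
// current scale, with an optional gap (px) between pages.
function computePageHeights(pageSizes, scale, gap = 0) {
  return pageSizes.map(({ height }) => Math.round(height * scale) + gap);
}

// VariableSizeList expects itemSize to be a function: (index) => number.
function makeItemSize(pageHeights) {
  return (index) => pageHeights[index];
}
```

After a resize or zoom, the heights would be recomputed with the new scale and the list told to drop its cached measurements, for example via listRef.current.resetAfterIndex(0) as described above.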

abelmark commented 5 years ago

@nikonet care to post an example or a link to the file where you implement react-window? Thanks!

stefanbugge commented 5 years ago

I've had success with rendering with react-pdf together with react-window. The implementation below is inspired by the react-virtualized implementation by @michaeldzjap above and the description provided by @nikonet. It's still a work in progress but so far it seems to perform well. Any suggestions to improve the implementation would be greatly appreciated.

One thing that concerns me, however: by caching all page dimensions on document load, I would assume that you lose the ability of pdfjs to load pages in chunks with range requests. Any thoughts on this?

import React from 'react'
import PropTypes from 'prop-types'
import { debounce } from 'lodash'

import { VariableSizeList as List } from 'react-window'
import { Document } from 'react-pdf/dist/entry.webpack'

import PageRenderer from './PageRenderer'
import PlaceholderPageList from './PlaceholderPageList'

import { PAGE_SPAZING } from './../constants'

/* eslint-disable import/no-webpack-loader-syntax */
import testpdf from 'url-loader!./../testpdf.pdf'
import './../style.scss'

const file = {
  url: testpdf
}

const propTypes = {
  scale: PropTypes.number.isRequired
}

// PDFjs options
const options = {}

class DocumentViewer extends React.Component {
  static propTypes = propTypes

  constructor (props) {
    super(props)

    this.state = {
      containerWidth: undefined,
      containerHeight: undefined,
      numPages: undefined,
      currentPage: 1,
      cachedPageDimensions: null
    }

    this.viewerContainerRef = React.createRef()
    this.listRef = React.createRef()
  }

  componentDidMount () {
    this._mounted = true
    this.calculateContainerBounds()
    window.addEventListener('resize', this.handleWindowResize, true)
  }

  componentWillUnmount () {
    this._mounted = false
    window.removeEventListener('resize', this.handleWindowResize, true)
  }

  componentDidUpdate (prevProps) {
    if (prevProps.scale !== this.props.scale) {
      this.recomputeRowHeights()
    }
  }

  /**
   * Load all pages to cache all page dimensions.
   */
  cachePageDimensions (pdf) {
    const promises = Array.from({ length: pdf.numPages }, (v, i) => i + 1).map(
      pageNumber => pdf.getPage(pageNumber)
    )

    // Assuming all pages may have different heights. Otherwise we can just
    // load the first page and use its height for determining all the row
    // heights.
    Promise.all(promises).then(values => {
      if (!this._mounted) {
        return null
      }

      const pageDimensions = values.reduce((accPageDimensions, page) => {
        accPageDimensions.set(page.pageIndex + 1, [
          page.view[2],
          page.view[3] + PAGE_SPAZING
        ])
        return accPageDimensions
      }, new Map())

      this.setState({
        cachedPageDimensions: pageDimensions
      })
    })
  }

  calculateContainerBounds = () => {
    if (this.viewerContainerRef.current == null) {
      return
    }
    const rect = this.viewerContainerRef.current.getBoundingClientRect()
    this.setState({
      containerWidth: rect.width,
      containerHeight: rect.height
    })
  }

  recomputeRowHeights = () => {
    // The list is only mounted once page dimensions are cached
    if (this.listRef.current != null) {
      this.listRef.current.resetAfterIndex(0)
    }
  }

  /*
    HANDLERS
  */

  onDocumentLoadSuccess = pdf => {
    this.setState({
      numPages: pdf.numPages
    })
    this.cachePageDimensions(pdf)
    this.calculateContainerBounds()
  }

  handleWindowResize = debounce(() => {
    this.calculateContainerBounds()
  }, 300)

  updateCurrentVisiblePage = ({ visibleStopIndex }) => {
    this.setState({
      currentPage: visibleStopIndex + 1
    })
  }

  /*
    GETTERS
  */

  getItemSize = index => {
    return this.state.cachedPageDimensions.get(index + 1)[1] * this.props.scale
  }

  /*
    RENDERERS
  */

  render () {
    const { 
      numPages, 
      cachedPageDimensions,
      containerHeight
    } = this.state

    const itemData = {
      scale: this.props.scale,
      cachedPageDimensions: cachedPageDimensions
    }
    return (
      <div className='dv' ref={this.viewerContainerRef}>
        <Document
          className='dv__document'
          file={file}
          onLoadSuccess={this.onDocumentLoadSuccess}
          options={options}
          loading={<PlaceholderPageList />}
        >
          {cachedPageDimensions != null && (
            <List
              height={containerHeight}
              itemCount={numPages}
              itemSize={this.getItemSize}
              itemData={itemData}
              overscanCount={2}
              onItemsRendered={this.updateCurrentVisiblePage}
              ref={this.listRef}
            >
              {PageRenderer}
            </List>
          )}
        </Document>
      </div>
    )
  }
}

export default DocumentViewer

//////////////////////////////////////////////////

import React from 'react'
import PropTypes from 'prop-types'

import { Page } from 'react-pdf/dist/entry.webpack'

const propTypes = {
  index: PropTypes.number.isRequired,
  style: PropTypes.object.isRequired,
  data: PropTypes.object.isRequired
}

export default class PageRenderer extends React.PureComponent {
  static propTypes = propTypes

  render () {
    const { index, data } = this.props
    const { cachedPageDimensions, scale } = data

    const pageNumber = index + 1
    const pageDimensions = cachedPageDimensions.get(pageNumber)

    const width = pageDimensions[0] * scale
    const style = {
      ...this.props.style,
      width,
      left: '50%',
      WebkitTransform: 'translateX(-50%)',
      transform: 'translateX(-50%)'
    }
    return (
      <div
        className='dv__page-wrapper'
        key={`page_${pageNumber}`}
        style={style}
      >
        <Page
          className='dv__page'
          pageNumber={pageNumber}
          scale={scale}
          renderAnnotationLayer={false}
        />
      </div>
    )
  }
}

abelmark commented 5 years ago

Amazing. I ended up getting it working, but noticed some room for improvement on mine after viewing yours. I appreciate you sharing.

ianzen commented 5 years ago

I want to contribute a bit to this discussion as this library helped me a great deal. I was able to get react-window to work well with react-pdf. The trick is to wrap the page inside a div with its style set by the list. stefanbugge's code seems to have implemented this, but it was not explicitly mentioned. An important thing to note is that the list props need to be fine-tuned according to specific use cases. Refer to react-window for detailed information.

          <Document
            file={this.props.file}
            onLoadSuccess={this.onDocumentLoadSuccess}
            options={options}
            noData=""
            loading=""
          >
            {this.state.loaded ? 
                <List
                    height={this.state.height}
                    width={this.state.width}
                    itemSize={this.state.itemScale * this.state.width}
                    itemCount={this.state.numPages}
                >
                    {({ style, index }) => (
                        <div style={style}>
                            <Page
                                pageNumber={index + 1}
                                width={this.state.width}
                                renderAnnotationLayer={false}
                                loading=""
                            ></Page>
                        </div>
                    )}
                </List>:
            null}
          </Document>

michaeldzjap commented 5 years ago

For anyone interested, I've updated my original example using in turn @stefanbugge's implementation. It now uses react-window instead of react-virtualized (as the latter doesn't seem to be actively developed anymore). See this.

alex-mironov commented 4 years ago

I don't want to load all the pages at once, as it requires downloading the whole PDF. Instead, I'm relying on progressive loading: I render pages with some default dimensions and afterwards re-render them as measurements arrive, with the help of resetAfterIndex:

  handlePageLoaded = (page) => {
    const { cachedPageDimensions } = this.state

    const cachedPage = cachedPageDimensions.get(page.pageIndex)
    if (cachedPage.isLoaded) return

    cachedPageDimensions.set(page.pageIndex, {
      height: page.height,
      width: page.width,
      isLoaded: true
    })

    this.setState({ cachedPageDimensions }, () => {
      // invalidating cached pages dimensions as they could have been affected by just loaded page
      // TODO check if dimensions have really changed
      this.document.current.resetAfterIndex(page.pageIndex)
    })
  }
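
A possible shape for the cache behind this, sketched with assumptions: the default page size (612 x 792, US Letter in PDF units) is an arbitrary placeholder, and `recordPageSize` / `getItemHeight` are hypothetical helpers. Unlike the snippet above, this version copies the Map instead of mutating the one held in state.

```javascript
// Assumption: a US Letter page is used until the real size is known.
const DEFAULT_PAGE = { width: 612, height: 792, isLoaded: false };

// Returns a NEW Map with the measured page recorded, or the same Map when the
// page was already measured, so callers can skip a needless re-layout.
function recordPageSize(cache, pageIndex, width, height) {
  const existing = cache.get(pageIndex);
  if (existing && existing.isLoaded) return cache;
  const next = new Map(cache);
  next.set(pageIndex, { width, height, isLoaded: true });
  return next;
}

// Height lookup for a variable-size list: measured height, else the default.
function getItemHeight(cache, pageIndex) {
  return (cache.get(pageIndex) || DEFAULT_PAGE).height;
}
```

A caller could compare the returned Map with the previous one and only call resetAfterIndex(pageIndex) when they differ.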

ssteuteville commented 4 years ago

I was able to get a pretty simple solution working using an intersection observer to track which page is currently in view and then shifting the rendered pages based on that.

This isn't a complete working example but hopefully it provides enough context to explain the solution:

PdfPage A wrapper around Page:

  const intersectionOptions: IntersectionObserverInit = {
    root: null,
    threshold: getIntersectThreshold(context.zoom) // function that optimizes threshold based on zoom
  };
  const [setNode, latestIntersection] = useIntersect(intersectionOptions);
  useLayoutEffect(() => {
    if (
      latestIntersection &&
      latestIntersection.isIntersecting &&
      !firstEntry.current
    ) {
      context.setCurrentPage(pageNumber);
    }

  }, [context, latestIntersection, isRendered, pageNumber]);

  return (
      <div ref={setNode}>
        <Page
          pageNumber={pageNumber}
          scale={zoom}
          loading=""
        />
        {children}
      </div>
  );
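
The getIntersectThreshold function referenced above is not shown in this thread. Purely as an illustration of why the threshold might depend on zoom, here is a hypothetical stand-in. It assumes a page roughly fills the viewport at zoom 1, so at zoom z at most about 1/z of a page can be on screen at once; the 0.33 base value and the 0.9 safety factor are arbitrary choices.

```javascript
// Hypothetical stand-in, not the original implementation.
// Assumption: at zoom 1 a page roughly fills the viewport, so at zoom z
// only about 1/z of the page can ever be visible at the same time.
function getIntersectThreshold(zoom, baseThreshold = 0.33) {
  const maxVisibleRatio = 1 / Math.max(zoom, 1);
  // Never ask for more of the page than can actually fit on screen.
  return Math.min(baseThreshold, maxVisibleRatio * 0.9);
}
```

With a fixed threshold, a heavily zoomed page could be taller than the viewport by so much that the observer never reports it as intersecting; lowering the threshold as zoom grows avoids that.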

PdfForm a wrapper around Document

  const getPageRange = (currentPage: number, buffer: number, numPages: number) => {
    const start = Math.max(currentPage - buffer, 0);
    // Indices are 0-based (pageNumber = index + 1), so the last valid index is numPages - 1
    return Array.from(Array(buffer * 2).keys()).map(i => start + i).filter(i => i < numPages);
  };
  const context = useContext(PdfViewerContext);
  const pageIndices = getPageRange(context.currentPage, 15, context.pageCount);
  const zoom = useZoomContext();

  return (
    <div className={classes.pdfContainer}>
      <Document
        key={name}
        file={url}
        loading={loadingPlaceholder}
        onLoadSuccess={e => context.setPageCount(e.numPages)}
      >
        {pageIndices.map(pageIndex => (
          <PdfPage key={pageIndex} pageNumber={pageIndex + 1} />
        ))}
      </Document>
    </div>
  );

useIntersect a hook wrapper around IntersectionObserver

const defaultIntersectionOptions: IntersectionObserverInit = {
  root: null,
  rootMargin: '0px',
  threshold: 0.33
};

export const useIntersect = ({
  root,
  rootMargin,
  threshold
}: IntersectionObserverInit = defaultIntersectionOptions): [
  (instance: HTMLDivElement | null) => void,
  IntersectionObserverEntry | null
] => {
  // the latest IntersectionObserverEntry
  const [entry, updateEntry] = useState<IntersectionObserverEntry | null>(null);
  // the node we are observing
  const [node, setNode] = useState<Element | null>(null);
  // the intersection observer
  const observer = useRef<IntersectionObserver | null>(null);

  // side effect of any of these changing: [node, root, rootMargin, threshold]
  useEffect(() => {
    // avoid application crash in IE
    if (!('IntersectionObserver' in window)) return;

    if (observer.current) {
      // if we are observing, stop. (avoid leaks)
      observer.current.disconnect();
    }

    observer.current = new IntersectionObserver( // create a new observer, with the updated params.
      ([entry]) => updateEntry(entry),
      {
        root,
        rootMargin,
        threshold
      }
    );

    const { current: currentObserver } = observer;

    if (node) {
      // if there is a node, observe it
      currentObserver.observe(node);
    }

    return () => currentObserver.disconnect(); // stop observing when the component unMounts
  }, [node, root, rootMargin, threshold]);

  return [setNode, entry]; // return the node setter and latest entry
};

export default useIntersect;

mschluper commented 3 years ago

https://gist.github.com/JacobFischer/aecbd871cb2aae46993236f65797da5c JacobFischer/lazy-download-pdf-button.jsx

may help.

Also, https://github.com/jimmywarting/StreamSaver.js

jayhoogle commented 3 years ago

Hey @alex-mironov, could you share how you got the progressive loading to work? I have a 117MB PDF hosted on a server that accepts byte ranges, but I can't seem to get react-pdf to show anything until the whole document is downloaded, because onLoadSuccess only runs once, when that download has completed, meaning react-pdf shows its loading state until that point.

Any tips would be greatly appreciated.

carlaiau commented 3 years ago

My Naive Implementation

Using react-intersection-observer. The vertical height of the DOM needs to be retained because I have overlaying DOM elements and need to preserve the ability to fast-scroll to a page number much further down than the one currently viewed.

This is being used via a web-view within a React Native application, and when the page count exceeded ~150 the pages just stopped getting rendered. Now they work again :). Thanks @ssteuteville for inspo

import React from 'react'
import { Page } from 'react-pdf/dist/esm/entry.webpack'
import { useInView } from 'react-intersection-observer'

const WrappedPage = ({ scale, pageNumber, originalHeight, originalWidth }) => {
    const { ref, inView } = useInView()

    return (
        <div ref={ref}>
            {inView ? (
                <Page
                    pageNumber={pageNumber}
                    scale={scale}
                    renderAnnotationLayer={false}
                    renderInteractiveForms={false}
                    renderTextLayer={false}
                />
            ) : (
                <div style={{ width: originalWidth * scale + 'px', height: originalHeight * scale + 'px' }} />
            )}
        </div>
    )
}

export default WrappedPage

and the main view

<Document
      file={this.state.documentsInBase64[i]}
      onLoadSuccess={props => {
          const { numPages } = props
          this.onDocumentLoadSuccess({
              numPages,
              docIndex: i,
          })
      }}
  >
      {[...Array(this.state.pageCounts[i])].map((_, pageIndex) => (
      <div
         style={{ position: 'relative' }}
         key={pageIndex + '-div'}>
            <WrappedPage
                pageNumber={pageIndex + 1}
                scale={this.state.scale}
                key={pageIndex + '-page'}
                originalWidth={this.state.originalWidth}
                originalHeight={this.state.originalHeight}
            />
      </div>
      ))}
</Document>

Zloka commented 3 years ago

Hey! I'm trying to do something similar, and I was wondering if you found a way to easily load the originalWidth and originalHeight, or does that come from some metadata of your own?

jmcdl commented 3 years ago

@Zloka You can use the onLoadSuccess function in Page or Document to get the viewport for the page / pages. I've done it in the Document component because I want to create an array of the dimensions so that I can have a placeholder div of the right size for each of my not yet rendered pages. If I knew all pages would have the same dimensions I could just get them for the first page and use them for all other pages.

  const getPageDimensions = (page: PDFPageProxy) => {
    const viewport = page.getViewport({ scale: 1 });
    const viewportRatio = viewport.width / viewport.height;
    let pageDimensions;
    // the Page component takes either a width or height prop. If the page is
    // wider than it is high, we want to scale the page using its width,
    // otherwise use its height
    if (viewport.width > viewport.height) {
      pageDimensions = {
        width: windowSize.width ? windowSize.width * 0.6 : null,
        height: null,
      };
    } else {
      pageDimensions = {
        width: null,
        height: windowSize.height ? windowSize.height * 0.7 : null,
      };
    }
    return { viewportRatio, pageDimensions };
  };

  // use this function to get number of pages and page dimensions before rendering
  const onDocumentLoadSuccess = async (pdf: PDFDocumentProxy) => {
    const numPages = pdf.numPages;
    setNumPages(numPages);
    // Get the dimensions for each page individually
    const newViewportRatioArray = [];
    const newPageDimensionsArray = [];
    for (let i = 0; i < numPages; i++) {
      const page = await pdf.getPage(i + 1);
      const { viewportRatio, pageDimensions } = getPageDimensions(page);
      newViewportRatioArray.push(viewportRatio);
      newPageDimensionsArray.push(pageDimensions);
    }
    setViewportRatioArray(newViewportRatioArray);
    setPageDimensionsArray(newPageDimensionsArray);
  };

Thanks to @carlaiau for the snippet that got things working with react-intersection-observer.
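
For the placeholder div itself, the stored ratio and the width-or-height pair are enough to work out both dimensions. A minimal sketch, assuming the shapes returned by getPageDimensions above; `placeholderSize` and the `PageDimensions` interface are hypothetical names, not part of react-pdf:

```typescript
// Hypothetical helper, not part of react-pdf.
// Mirrors the { width, height } pair built in getPageDimensions: exactly one is set.
interface PageDimensions {
  width: number | null;
  height: number | null;
}

// viewportRatio is width / height, as computed in getPageDimensions.
function placeholderSize(
  viewportRatio: number,
  dims: PageDimensions
): { width: number; height: number } | null {
  if (dims.width !== null) {
    // Width is fixed, so derive the height from the ratio.
    return { width: dims.width, height: dims.width / viewportRatio };
  }
  if (dims.height !== null) {
    // Height is fixed, so derive the width from the ratio.
    return { width: dims.height * viewportRatio, height: dims.height };
  }
  return null; // window size not known yet
}

console.log(JSON.stringify(placeholderSize(2, { width: 600, height: null })));
// {"width":600,"height":300}
```

The result can be spread into the style of the div that stands in for a page that is not rendered yet.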

ngoclinhng commented 3 years ago

First of all, thanks for so many great ideas from those who came before me.

Here is my idea that would render 1000 pages (35.2 MB) with ease.

Idea

We need a Placeholder component for each page in the document. This Placeholder is shown to the user when the page is either out of view or still loading. Ideally the dimensions (width, height) of the Placeholder should match the dimensions of the page it represents. But we cannot afford that, since it would require loading all 1000 pages just to determine their dimensions (using the onLoadSuccess callback on the Page component, for example). Since in my implementation the width of each page has to be 100% of the container width, the width of the Placeholder is easy: it is just the container width. The height is computed as width * a4AspectRatio, where a4AspectRatio is the height-to-width ratio of an A4 page (approximately 1.4142).

Use IntersectionObserver as a window to keep track of which pages are in view and which are not. To do that, we use a visibilities boolean array of size numPages. Initially (after onLoadSuccess on the Document has been called), this array is filled with false. When a page becomes visible, the entry at that page's index is set to true, and when it leaves the view, that entry is set back to false.

Pseudocode
```
const a4AspectRatio = 1.4142;

function Placeholder({ width }) {
  const height = a4AspectRatio * width;
  return <div style={{ width: width, height: height }} />;
}

const PdfPage = React.forwardRef(function PdfPage({ visible, pageIndex, width }, ref) {
  const placeholder = (<Placeholder width={width} />);

  return (
    <div ref={ref} data-page-index={pageIndex}>
      {visible ? (
        <Page
          renderAnnotationLayer={false}
          width={width}
          pageNumber={pageIndex + 1}
          style={{
            maxWidth: '100%',
            // This is not a valid inline CSS!!!
            '& > canvas': {
              maxWidth: '100%',
              height: 'auto !important'
            }
          }}
          loading={placeholder}
        />
      ) : placeholder}
    </div>
  );
});

function PdfPreview({ src }) {
  const [numPages, setNumPages] = React.useState(0);
  const rootRef = React.useRef(null);
  const [observer, setObserver] = React.useState(null);
  const [visibilities, setVisibilities] = React.useState([]);

  const handleDocumentLoadSuccess = ({ numPages }) => {
    setNumPages(numPages);
    setVisibilities(Array.from(new Array(numPages), () => false));
  };

  // Ref to each page in the document.
  const pageRefs = React.useMemo(() => {
    return Array.from(new Array(numPages), () => React.createRef());
  }, [numPages]);

  // Initialize Intersection Observer.
  React.useEffect(() => {
    // By tweaking the rootMargin you can control how many pages are visible
    // at the same time. Loosely speaking if you set rootMargin to xpx 0px,
    // there will be at most (containerHeight + 2 * x) / pageHeight visible pages.
    const observerOptions = {
      root: rootRef.current,
      rootMargin: '600px 0px',
      threshold: 0.0
    };

    const observerCallback = (entries, io) => {
      const intersects = entries.reduce((acc, entry) => {
        const pageIndex = parseInt(entry.target.getAttribute('data-page-index'), 10);
        return {
          ...acc,
          [pageIndex]: entry.isIntersecting
        };
      }, {});

      setVisibilities((prev) => prev.map((visible, index) => {
        if (intersects.hasOwnProperty(index)) { return intersects[index]; }
        return visible;
      }));
    };

    const io = new IntersectionObserver(observerCallback, observerOptions);
    setObserver(io);

    return () => io.disconnect();
  }, []);

  // Start observing pages (skip refs whose element is not mounted yet).
  React.useEffect(() => {
    if (observer) {
      pageRefs.forEach((page) => {
        if (page.current) { observer.observe(page.current); }
      });
    }
  }, [observer, pageRefs]);

  return (
    <div ref={rootRef} style={{ width: '100%', maxHeight: 600, overflow: 'auto' }}>
      {/* Assumption: a render-prop wrapper that measures the container
          (for example SizeMe from react-sizeme) supplies `size`. */}
      <SizeMe>
        {({ size }) => (
          // Assumption: the props on Document and PdfPage below are inferred
          // from the handlers and component signatures defined above.
          <Document
            file={src}
            onLoadSuccess={handleDocumentLoadSuccess}
            loading={<Placeholder width={size.width} />}
            style={{ width: '100%' }}
          >
            {visibilities.map((visible, index) => (
              <PdfPage
                ref={pageRefs[index]}
                key={index}
                visible={visible}
                pageIndex={index}
                width={size.width}
              />
            ))}
          </Document>
        )}
      </SizeMe>
    </div>
  );
}
```
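
The two pieces of logic in the pseudocode that do not depend on React can be tried in isolation. This is only a sketch: `estimateVisiblePages` and `applyIntersections` are hypothetical helper names, and the `{ pageIndex, isIntersecting }` shape is a simplified stand-in for a real IntersectionObserverEntry.

```typescript
// Hypothetical helpers illustrating the idea; not part of react-pdf.
const a4AspectRatio = 1.4142;

// Rough estimate from the rootMargin comment:
// (containerHeight + 2 * margin) / pageHeight, with pageHeight = width * a4AspectRatio.
function estimateVisiblePages(containerHeight: number, marginPx: number, pageWidth: number): number {
  const pageHeight = pageWidth * a4AspectRatio;
  return (containerHeight + 2 * marginPx) / pageHeight;
}

// Simplified stand-in for IntersectionObserverEntry (an assumption).
interface PageIntersection {
  pageIndex: number;
  isIntersecting: boolean;
}

// Returns a new visibilities array; pages without an entry keep their previous value.
function applyIntersections(prev: boolean[], entries: PageIntersection[]): boolean[] {
  const changed: Record<number, boolean> = {};
  entries.forEach((entry) => {
    changed[entry.pageIndex] = entry.isIntersecting;
  });
  return prev.map((wasVisible, index) => (index in changed ? changed[index] : wasVisible));
}

const previous = [false, true, true, false];
const updated = applyIntersections(previous, [
  { pageIndex: 1, isIntersecting: false },
  { pageIndex: 3, isIntersecting: true },
]);
console.log(JSON.stringify(updated)); // [false,true,true,false] becomes [false,false,true,true]
console.log(estimateVisiblePages(600, 600, 600).toFixed(2)); // "2.12"
```

With a 600px container, a 600px margin and 600px-wide pages, the estimate comes out at roughly two pages. Pages that only partly intersect the expanded root also count as visible with threshold 0, so the real number can be slightly higher.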

ramzitannous commented 2 years ago

What is the status here? I'm still facing slow performance with the latest version @wojtekmaj

ramzitannous commented 2 years ago

Already tried to load pages in chunks one page at a time, but performance slowness is still there @wojtekmaj

ngoclinhng commented 2 years ago

@ramzitannous Could you please elaborate a little bit more on "tried to load pages in chunks one page at a time". Did you mean you have some kind of pagination mechanism, so that:

You load and render the first page of your (large) pdf document.
User "clicks" on page X, you load it and render it...and that's it?

ramzitannous commented 2 years ago

Hey everyone, I'd like to remind you that it was never React-PDF's intention to provide the users with fully-fledged PDF reader. Instead, this is only a tool to make it. While I have a plan of creating React-PDF-based PDF reader, I'm far from it. Mozilla is working on it for years and they seem to never be done. I think it would go similar way ;)

There is some good news too, though. If I can suggest something, onRenderSuccess callback that you can define for <Page> components can be your powerful friend. You can use it to, for example, force pages to be rendered one by one:

import React, { Component } from 'react';
import { Document, Page } from 'react-pdf/build/entry.webpack';

import './Sample.less';

export default class Sample extends Component {
  state = {
    file: './test.pdf',
    numPages: null,
    pagesRendered: null,
  }

  onDocumentLoadSuccess = ({ numPages }) =>
    this.setState({
      numPages,
      pagesRendered: 0,
    });

  onRenderSuccess = () =>
    this.setState(prevState => ({
      pagesRendered: prevState.pagesRendered + 1,
    }));

  render() {
    const { file, numPages, pagesRendered } = this.state;

    /**
     * The amount of pages we want to render now. Always 1 more than already rendered,
     * no more than total amount of pages in the document.
     */
    const pagesRenderedPlusOne = Math.min(pagesRendered + 1, numPages);

    return (
      <div className="Example">
        <header>
          <h1>react-pdf sample page</h1>
        </header>
        <div className="Example__container">
          <div className="Example__container__document">
            <Document
              file={file}
              onLoadSuccess={this.onDocumentLoadSuccess}
            >
              {
                Array.from(
                  new Array(pagesRenderedPlusOne),
                  (el, index) => {
                    const isCurrentlyRendering = pagesRenderedPlusOne === index + 1;
                    const isLastPage = numPages === index + 1;
                    const needsCallbackToRenderNextPage = isCurrentlyRendering && !isLastPage;

                    return (
                      <Page
                        key={`page_${index + 1}`}
                        onRenderSuccess={
                          needsCallbackToRenderNextPage ? this.onRenderSuccess : null
                        }
                        pageNumber={index + 1}
                      />
                    );
                  },
                )
              }
            </Document>
          </div>
        </div>
      </div>
    );
  }
}

Of course you can do much more - add placeholders, check on scroll which pages need rendering, keep info on whether all pages so far were rendered... I believe in your creativity ;) And if I can be of any help regarding API, please let me know!

@ngoclinhng I did something like this, but I'm still seeing slow performance

maks-dlp commented 2 years ago

@ngoclinhng when we load a PDF of more than 200 pages, the web page becomes unresponsive.

ngoclinhng commented 2 years ago

@ramzitannous @maks-dlp First of all, make sure we're on the same page.

This is NOT what causes the problem:

// Just loading a PDF document and asking "Hey buddy, how many pages do you have?"
// is NOT an expensive operation at all as far as I'm concerned.
function PageCount({ file }) {
  const [pageCount, setPageCount] = React.useState(null);

  const handleLoadSuccess = ({ numPages }) => setPageCount(numPages);

  return (
    <Document
       file={file}
       onLoadSuccess={handleLoadSuccess}
    >
       <div>Num pages: {pageCount}</div>
    </Document>
  );
}

Just try it out with a 1000-page PDF document to see what I mean.

What really causes the problem is that you guys try to render TOO MANY pages AT THE SAME TIME! Rendering just one page is itself an expensive operation, let alone a 100-page document.

Before discussing the solution, I want to point out to @ramzitannous why it's slow even though you followed the example. Let's say you want to render a 100-page document. The state after the document has loaded successfully is numPages = 100, pagesRendered = 0, and therefore pagesRenderedPlusOne = 1. You render it like this:

Array.from(
   new Array(pagesRenderedPlusOne),
   (el, index) => {
      const isCurrentlyRendering = pagesRenderedPlusOne === index + 1;  // true
      const isLastPage = numPages === index + 1; // false
      const needsCallbackToRenderNextPage = isCurrentlyRendering && !isLastPage;  // true
      return (
        <Page
           key={`page_${index + 1}`}
           onRenderSuccess={
              needsCallbackToRenderNextPage ? this.onRenderSuccess : null
            }
            pageNumber={index + 1}
          />
        );
     }
)

The first page is loaded and rendered with no problem at all. But what happens after that is really a big problem. onRenderSuccess of the first page gets invoked, and it increments pagesRendered by 1, which means your state is now numPages = 100, pagesRendered = 1, and therefore pagesRenderedPlusOne = 2. On that render:

No need to load the first page, BUT we still have to draw canvas for it.
Load and render the second page

I hope that, at this point, you see the end result of this process: you end up with all 100 pages rendered at the same time!
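
The progression can be traced with a small model. This is a sketch of the state transitions only, assuming every render succeeds; `simulateSequentialRender` is a hypothetical name and nothing here calls react-pdf.

```typescript
// Models the sample's state: each successful render of the newest page
// increments pagesRendered, which mounts one more <Page> on the next render.
// Returns how many <Page> components are mounted after each step.
function simulateSequentialRender(numPages: number): number[] {
  const mountedCounts: number[] = [];
  let pagesRendered = 0;
  while (true) {
    const pagesRenderedPlusOne = Math.min(pagesRendered + 1, numPages);
    mountedCounts.push(pagesRenderedPlusOne);
    const isLastPage = pagesRenderedPlusOne === numPages;
    if (isLastPage) break; // no callback is attached to the last page
    pagesRendered += 1; // onRenderSuccess fired for the newest page
  }
  return mountedCounts;
}

const counts = simulateSequentialRender(100);
console.log(counts.length, counts[counts.length - 1]); // 100 100
```

The mounted count never goes down: pages are added one at a time, but none is ever removed, so a 100-page document finishes with 100 mounted pages.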

Solution

We know the root cause of the problem. The way to resolve it is to find a way to render no more than, say 16 pages at the same time. You could achieve this by using some kind of virtualization as mentioned by other people somewhere above or give my "solution" a try.
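
One way to enforce such a cap, independent of how the current page is detected, is to compute a bounded window of page numbers and mount only those. A minimal sketch; `pageWindow` is a hypothetical helper and the default of 16 is just the figure mentioned above:

```typescript
// Hypothetical helper, not part of react-pdf.
// Returns the 1-based page numbers to mount: at most maxPages,
// centered on currentPage where the document bounds allow it.
function pageWindow(currentPage: number, numPages: number, maxPages: number = 16): number[] {
  if (numPages <= 0 || maxPages <= 0) return [];
  const size = Math.min(maxPages, numPages);
  const current = Math.min(Math.max(currentPage, 1), numPages);
  // Shift the window so it never starts before page 1 or ends after numPages.
  const first = Math.max(1, Math.min(current - Math.floor(size / 2), numPages - size + 1));
  const pages: number[] = [];
  for (let p = first; p < first + size; p++) pages.push(p);
  return pages;
}

const middle = pageWindow(50, 100);
console.log(middle[0], middle[middle.length - 1], middle.length); // 42 57 16
```

Every page outside the returned list would be replaced by a placeholder of the same size, so the scroll height stays stable.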

Moebits commented 2 years ago

I am a bit confused why the PDF.js viewer from Mozilla (https://mozilla.github.io/pdf.js/web/viewer.html) can load large PDFs instantly, can zoom instantly, and lets you scroll through the pages with minimal buffering, while with this library as is, without performance optimization, large PDFs take at least 30 seconds to load, and I can't zoom at all because it makes the webpage freeze.

I think that @ngoclinhng's solution performs the best. With it I can load large PDFs faster, but there is still a delay whenever I zoom. At least I am able to zoom though...

One more optimization that helped me was setting renderMode to svg instead of canvas, it decreases the delay when zooming a bit.

However I think that there are some serious performance issues with this library since it is much slower than PDF.js by itself. I hope you can figure out why this is performing much slower than vanilla PDF.js.

Edit: It would be VERY helpful if you can start viewing a PDF without downloading the entire file first.

mcgeld commented 2 years ago

I was able to successfully implement a virtualized pdf view for larger pdfs. I basically just made a scroll listener that calculates the position of the current page, then it updates an array that contains the numbers of the pages that should be rendered. That array is contained in the state of my component, so when it changes, it updates what pages are rendered. It works quite well with 20 pages being rendered at a time and allowing the user to be unaware that previous pages are being removed from the DOM and additional pages are being added. If anyone's interested in seeing my implementation, let me know and I can try to make a concise version to paste here.
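
The calculation described here can be sketched roughly as follows. This is not the implementation mentioned above, only an illustration under a strong assumption: every page has the same rendered height plus a fixed gap. `pagesForScroll` and all of its parameters are hypothetical names.

```typescript
// Hypothetical helper, not part of react-pdf. Assumes uniform page height
// and a fixed gap between pages; real documents may mix page sizes.
function pagesForScroll(
  scrollTop: number,
  pageHeight: number,
  gap: number,
  numPages: number,
  windowSize: number = 20
): number[] {
  if (numPages <= 0 || pageHeight <= 0) return [];
  const stride = pageHeight + gap;
  // 1-based number of the page whose top edge is at or above scrollTop.
  const current = Math.min(numPages, Math.max(1, Math.floor(scrollTop / stride) + 1));
  const size = Math.min(windowSize, numPages);
  const first = Math.max(1, Math.min(current - Math.floor(size / 2), numPages - size + 1));
  const pages: number[] = [];
  for (let p = first; p < first + size; p++) pages.push(p);
  return pages;
}

const mounted = pagesForScroll(81000, 800, 10, 300);
console.log(mounted[0], mounted[mounted.length - 1]); // 91 110
```

A scroll handler would call this with the container's scrollTop and store the result in state, ideally throttled so that state does not change on every scroll event.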

codemasterlike commented 1 year ago

Yes, I am interested. I'd appreciate it if you could share it. Thanks.

wojtekmaj / react-pdf