mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Text of the PDF is messing up #11379

Closed thilinadinith closed 4 years ago

thilinadinith commented 4 years ago

Recently, PDF rendering started producing a messed-up text layer: the text is duplicated, with a grey overlay on top of it. I have no idea how to fix it, as there is a lack of documentation for this functionality. I'm using pdfjsViewer.PDFPageView, and it now shows this behavior. My code is as follows:

 getPdf() {

    var pdfDocument;

    if ( this._state !== 'inDOM' ) return false;

    pdfjsLib.disableRange = true;
    pdfjsLib.disableStream = true;

    let self = this;
    pdfDocument = pdfjsLib.getDocument(this.src);
    pdfDocument.promise.then(function(pdf) {
      self.set( 'pdfDocument', pdf );
      self.set( 'maxNumPages',  pdf.numPages );
      self.set( 'prevBtnDisabled', true );
      self.set( 'documentRendered', true );

      self.setViewportWidth();
      self.renderPdf();
    });

    return pdfDocument;
  },

  renderPdf() {

    var pdf = this.pdfDocument,
        maxNumPages,
        pagePromise;

    if ( !pdf ) return false;

    maxNumPages  = this.maxNumPages;

    pagePromise = this.getAndRenderPage( pdf, 1 );

    Array.apply( null, new Array( maxNumPages - 1 ) ).forEach( ( value, index ) => {

      pagePromise = pagePromise.then( () => this.getAndRenderPage( pdf, index + 2 ) );
    } );
  },

  getAndRenderPage( pdf, index ) {

    return pdf.getPage( index ).then( page => this.renderPage( page, index ) );
  },

  renderPage( pdfPage, pageNum ) {

    var parentWidth       = this.$().parent().width(),
        pageViewportScale = ( parentWidth >= this.get( 'breakpoints.mobile' ) ) ? 1.5 : 1.3,
        viewport          = pdfPage.getViewport( { scale: parentWidth / pdfPage.getViewport( { scale: pageViewportScale } ).width } ),
        container         = this.$().find( '.pdf_viewer--container' )[ 0 ],
        pdfPageView;

    pdfPageView = new pdfjsViewer.PDFPageView( {
      container: container,
      id: pageNum,
      scale: viewport.scale,
      defaultViewport: viewport,
      textLayerFactory: new pdfjsViewer.DefaultTextLayerFactory()
    } );
    var pages = this.get('pages');
    // Associates the actual page with the view, and drawing it
    pages.push( pdfPageView );
    this.set( 'pages', pages );
    this.set( 'scale', viewport.scale );

    pdfPageView.setPdfPage( pdfPage );

    return pdfPageView.draw();
  },

I have seen others ask this question here, but the authors close those issues without giving a proper answer! Therefore, please don't close these kinds of issues, which users are actually facing.

Configuration:

- Web browser and its version: Chrome 78.0.3904.108
- Operating system and its version: Mac OS 10.15
- PDF.js version: 2.3.200
- Is a browser extension: No

Steps to reproduce the problem:

  1. Use the following code, which relies on pdfjsViewer.PDFPageView:

    renderPage( pdfPage, pageNum ) {
    
    var parentWidth       = this.$().parent().width(),
        pageViewportScale = ( parentWidth >= this.get( 'breakpoints.mobile' ) ) ? 1.5 : 1.3,
        viewport          = pdfPage.getViewport( { scale: parentWidth / pdfPage.getViewport( { scale: pageViewportScale } ).width } ),
        container         = this.$().find( '.pdf_viewer--container' )[ 0 ],
        pdfPageView;
    
    pdfPageView = new pdfjsViewer.PDFPageView( {
      container: container,
      id: pageNum,
      scale: viewport.scale,
      defaultViewport: viewport,
      textLayerFactory: new pdfjsViewer.DefaultTextLayerFactory()
    } );
    var pages = this.get('pages');
    // Associates the actual page with the view, and drawing it
    pages.push( pdfPageView );
    this.set( 'pages', pages );
    this.set( 'scale', viewport.scale );
    
    pdfPageView.setPdfPage( pdfPage );
    
    return pdfPageView.draw();
    },
  2. Open the PDF document

  3. Grey-colored text appears together with the main text (the same text is shown twice on the page).

What is the expected behavior? The PDF document is displayed correctly. (It works when I remove textLayerFactory: new pdfjsViewer.DefaultTextLayerFactory().)

What went wrong? image

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension): None, Private Env

Snuffleupagus commented 4 years ago

The only suggestion here would be to refer to the "pageviewer" example in https://github.com/mozilla/pdf.js/tree/master/examples/components


However, this issue is currently missing all of the information required for it to be valid, and as-is it will be closed as INCOMPLETE.

First of all, you need to provide all of the details requested in https://github.com/mozilla/pdf.js/blob/master/.github/ISSUE_TEMPLATE.md; and please also see https://github.com/mozilla/pdf.js/blob/master/.github/CONTRIBUTING.md (emphasis mine):

If you are developing a custom solution, first check the examples at https://github.com/mozilla/pdf.js#learning and search existing issues. If this does not help, please prepare a short well-documented example that demonstrates the problem and make it accessible online on your website, JS Bin, GitHub, etc. before opening a new issue or contacting us on the IRC channel -- keep in mind that just code snippets won't help us troubleshoot the problem.

thilinadinith commented 4 years ago

Your example is the one I have implemented as well, but it's not working properly; it produces the error described in the issue above. Anyway, I will reformat the issue according to your specification, and I hope for a valid answer after that, because I have seen this issue reported in several places (even on Stack Overflow), but neither you nor anyone else has given a proper answer as to why it is happening.

Snuffleupagus commented 4 years ago

Please keep in mind that this is, first and foremost, an open source bug tracker and not a general support forum; hence everyone opening an issue is required to provide actionable information.

Given that the "pageviewer" example linked to above does work, that would suggest an error in your code (e.g. that you didn't include the pdf_viewer.css file in your HTML code); hence why https://github.com/mozilla/pdf.js/issues/11379#issuecomment-561066363 specifically asked you to provide a runnable example here.
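As a rough illustration of the CSS check suggested above, the sketch below tests whether a stylesheet named pdf_viewer.css is among the stylesheets loaded by the page. The helper name `hasViewerStylesheet` is hypothetical and not part of PDF.js, and the check assumes the file keeps its original name.

```javascript
// Hypothetical helper (not part of PDF.js): returns true if any of the given
// stylesheet URLs looks like the viewer stylesheet (pdf_viewer.css or
// pdf_viewer.min.css, optionally followed by a query string or fragment).
function hasViewerStylesheet(hrefs) {
  const pattern = /(^|\/)pdf_viewer(\.min)?\.css(\?|#|$)/;
  return hrefs.some((href) => typeof href === "string" && pattern.test(href));
}

// In a browser, the URLs could be collected from the document like this:
//   const hrefs = Array.from(document.styleSheets, (sheet) => sheet.href);
//   if (!hasViewerStylesheet(hrefs)) {
//     console.warn("pdf_viewer.css does not appear to be loaded");
//   }
```

If a build pipeline concatenates the stylesheet into a bundle under a different file name, this name-based check will report false even though the rules are present, so it is only a first diagnostic step.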

thilinadinith commented 4 years ago

Thanks for the help. Really appreciate it. Anyway, I raised this here because there was no migration document or other help for resolving this kind of issue. We are not experts on PDF.js; we just use it as a library and configure it with the parameters your library supports. The issue was that we were using version 2.2.228 and it was upgraded (a minor upgrade) with this breaking change. I went through your release logs but couldn't find the exact root cause that you mentioned here, namely that the CSS file needs to be added.

Anyway, PDF.js is used in a wide range of environments. In fact, we use it with EmberJS, where we have customized a component to use your library, so the implementation on our side is not just a matter of adding a JS or CSS file; it is a bit more complicated. As a suggestion, I would like to request that you open a discussion or help forum, or a community, to support the issues that customers/users are experiencing. From what I have found when searching, other people have also faced issues, and there is a lack of support because experts are not available in that environment (given that most people use this just to render PDFs or for simple tasks).

Snuffleupagus commented 4 years ago

> Thanks for the help. Really appreciate it.

Sure; and a piece of future advice: Down-voting a simple request for more information, as was done with https://github.com/mozilla/pdf.js/issues/11379#issuecomment-561066363, is probably not the best way to get people to actually assist you...

> Anyway, I raised this here because there was no migration document or other help for resolving this kind of issue.

You need to keep in mind that this is an open source project, with most people contributing in their spare time and without any compensation.

> The issue was that we were using version 2.2.228 and it was upgraded (a minor upgrade) with this breaking change.

The versioning of the PDF.js library is explained in https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#version. Looking at a general version number, i.e. x.y.z, whenever the x and/or y part changes there are some API changes that you may need to account for in your code.
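The x.y.z rule described above can be sketched as a small check. This is only an illustration: the function name `mayContainApiChanges` is hypothetical and not part of PDF.js, and it assumes plain numeric versions such as 2.2.228.

```javascript
// Hypothetical helper (not part of PDF.js): reports whether the x or y part
// differs between two "x.y.z" version strings, which, per the rule above,
// signals API changes that calling code may need to account for.
function mayContainApiChanges(fromVersion, toVersion) {
  const parse = (version) => {
    const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
    if (!match) {
      throw new Error(`Expected a version of the form x.y.z, got "${version}"`);
    }
    return match.slice(1).map(Number);
  };
  const [fromX, fromY] = parse(fromVersion);
  const [toX, toY] = parse(toVersion);
  return fromX !== toX || fromY !== toY;
}

console.log(mayContainApiChanges("2.2.228", "2.3.200")); // true
console.log(mayContainApiChanges("2.3.100", "2.3.200")); // false
```

If the library is installed through npm with a caret range such as ^2.2.228, that range also accepts 2.3.x releases; pinning an exact version or using a tilde range keeps the y part from changing unnoticed.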

> that you mentioned here, namely that the CSS file needs to be added.

However, the fact that the "pageviewer" example requires the pdf_viewer.css file is nothing new, rather it has been included in that example ever since it was first created five years ago.

> As a suggestion, I would like to request that you open a discussion or help forum, or a community, to support the issues that customers/users are experiencing.

Again, this is an open source project which you're able to use for free (thus the notion of "customers" doesn't fit here). Hence it's simply not reasonable to expect that unpaid volunteers will be able to provide the same level of support that you'd get with commercial software (for which you're paying).

Please understand that running a dedicated support forum is no easy task, and even if such a thing existed you'd not be guaranteed if/when questions were answered (since the few regular PDF.js contributors may not wish to spend their spare time acting as dedicated tech support).

timvandermeij commented 4 years ago

I agree with https://github.com/mozilla/pdf.js/issues/11379#issuecomment-561102452. There is no runnable example, and it looks like the text layer just isn't offset correctly due to missing/overwritten CSS. Without an actual running example there is unfortunately nothing we can do here.