lfoppiano / streamlit-pdf-viewer

Streamlit PDF viewer
https://structure-vision.streamlit.app/
Apache License 2.0

Render Text issues when using tabs in Chrome #60

Open hamdi3 opened 1 month ago

hamdi3 commented 1 month ago

I've noticed that when PDFs are rendered in tabs with render_text set to True, text rendering only works for the tab that is selected when the page loads:

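    from glob import glob
    import streamlit
    from streamlit_pdf_viewer import pdf_viewer

    # One tab per PDF; each viewer sits in a fixed-height container.
    paths = glob("../../*/*.pdf")
    for tab, path in zip(streamlit.tabs(paths), paths):
        with tab:
            with streamlit.container(height=600):
                pdf_viewer(path, width=800, render_text=True)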

When inspecting the page in the browser, I saw the following warnings in the console (for each document except the first one):

Warning: Setting up fake worker.
Warning: TT: undefined function: 32

As always thanks for the good work and support :)

lfoppiano commented 1 month ago

Hi @hamdi3, thanks for reporting the problem.

This is indeed weird. One of the unit tests I wrote also checks that the PDF on the second tab contains rendered text. 🤔 Nevertheless, I tried with code similar to yours and it seems to work. Are you sure the PDF contains text? Could it be an image-only PDF? Have you tried changing the order in which the PDFs are loaded?

I did a test with 3 pdf documents. How many do you load?

The message "Warning: Setting up fake worker." is a normal pdf.js warning. I think it is harmless.

hamdi3 commented 1 month ago

I've tried again: whichever PDF is rendered in the selected tab as the page loads is the only one with rendered text (I've tried this with all my PDFs in turn, so I'm sure it's not an issue with the PDFs).

I've tried it with 2, 3, 10, 15 and 20 PDFs and had the same issue.

I'm using Chrome version 127.0.6533.89 and unfortunately can't test it on another browser. Could this be a browser issue?

lfoppiano commented 1 month ago

OK. It's a problem with Chrome. 😭

lfoppiano commented 1 month ago

First answer: the undefined function warning is nothing to worry about; it is just due to fonts.

I'll look into the other part. Maybe I need the help of @t29mato, who's a JS and Chrome guru! 😅

I'm trying to catch and ignore these exceptions when we call page.render, but I'm surely doing something wrong:

    try {
      const renderTask = page.render(renderContext);
      await renderTask.promise.catch(function (error) {
        // alertError(error);
        // do nothing
      });
    } catch (e) {
      // do nothing
    }

hamdi3 commented 1 month ago

@lfoppiano As always thanks for the hard work and support :)

t29mato commented 1 week ago

The PR https://github.com/mozilla/pdf.js/pull/18283 added the feature to support browsers' user-configurable minimum font size. However, Google Chrome's minimum font size becomes 0 when using Streamlit's tab functionality.

PDF.js implements the text dragging (selection) feature by overlaying transparent text on the PDF. But when the font size is set to 0, there is no text to drag, causing the feature to fail.

Code

https://github.com/mozilla/pdf.js/pull/18283/files#diff-eb5220436f55f0c6ff687c15781981b323be818cb77e49d53d66f7f372f1f23dR475-R491

The ensureMinFontSizeComputed method is supposed to return more than 1, but it returns 0 when the feature doesn't work.
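
The failure mode described here can be illustrated with a minimal sketch. It assumes the minimum font size is probed by measuring the rendered height of a line of text set at 1px, which is a reading of the linked PR and not a copy of the pdf.js source. The names `probeMinFontSize` and `doc` are hypothetical. If the probe element is not laid out, for example because it lives inside a hidden tab, the measured height may be 0.

```javascript
// Hypothetical sketch, not pdf.js source: probe the browser's effective
// minimum font size by measuring a line of text styled at 1px.
// `doc` is any Document-like object (pass `document` in a browser).
function probeMinFontSize(doc) {
  const div = doc.createElement("div");
  div.style.opacity = "0";
  div.style.lineHeight = "1";
  div.style.fontSize = "1px";
  div.textContent = "X";
  doc.body.append(div);
  // If the element is not laid out (for example inside a hidden tab),
  // the bounding box is empty and this height is 0.
  const height = div.getBoundingClientRect().height;
  div.remove();
  return height;
}
```

In a browser this would be called as `probeMinFontSize(document)`; taking the document as a parameter only makes the sketch easy to exercise outside a browser.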

Temporary Solution

Downgrading to version 4.3.136, which does not support the minimum font size, restores the drag-and-drop functionality.

Additional Information

I want to explain why the custom pdf_viewer component works when RELEASE=FALSE.

When RELEASE=FALSE, the JavaScript runs each time you click on a tab. This lets it correctly get the minimum font size, even in Google Chrome.

But when RELEASE=TRUE, the JavaScript tries to get the minimum font size for all tabs as soon as the page loads, which can cause issues.
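
One way to picture the difference between the two modes is to defer work until the viewer is actually visible. The sketch below is not what this thread ended up adopting (the fix was a pdf.js downgrade), and `renderWhenVisible` is a hypothetical helper. It assumes the standard browser `IntersectionObserver` API is available and that it fires when a previously hidden tab is shown, which would need checking inside Streamlit's component iframe.

```javascript
// Hypothetical helper: call `render` once, the first time `element`
// becomes visible. `ObserverClass` defaults to the browser's
// IntersectionObserver and can be swapped out for testing.
function renderWhenVisible(element, render, ObserverClass = globalThis.IntersectionObserver) {
  const observer = new ObserverClass((entries) => {
    if (entries.some((entry) => entry.isIntersecting)) {
      observer.disconnect(); // make sure we render only once
      render();
    }
  });
  observer.observe(element);
  return observer;
}
```

With this pattern, any measurement that depends on layout (such as a minimum font size probe) would run only after the tab's content is on screen.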

lfoppiano commented 5 days ago

@t29mato can you please try to make a patch that tries to force the text characters to be greater than zero?

t29mato commented 3 days ago

@lfoppiano

Okay, I made a patch branch: https://github.com/lfoppiano/streamlit-pdf-viewer/commit/db7a26d606100b3b98acd1f332e2eeb01e897d1a

You need to run npm install before running the application.

lfoppiano commented 3 days ago

@t29mato I made rc1 but it doesn't seem to work. @hamdi3 could you please give version 0.0.18-rc1 a try?

t29mato commented 2 days ago

@lfoppiano I fixed it by downgrading the PDF.js version from 4.5.136 to 4.3.136, so updating it further won't fix the issue. https://github.com/lfoppiano/streamlit-pdf-viewer/commit/db7a26d606100b3b98acd1f332e2eeb01e897d1a

lfoppiano commented 2 days ago

@t29mato that I understood :-) I made a release 0.0.18rc1 and tested it here, but it did not seem to work.

Update: you have to manually update to 0.0.18rc1

t29mato commented 2 days ago

@lfoppiano It doesn't work on 0.0.18rc1 because it specifies the PDF.js version as ^4.6.82 here: https://github.com/lfoppiano/streamlit-pdf-viewer/blob/v0.0.18-rc1/streamlit_pdf_viewer/frontend/package.json

Or am I misunderstanding something...
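
For context on why the caret range matters: with npm, a range like ^4.6.82 accepts only 4.x versions at or above 4.6.82, so it can never resolve to 4.3.136. The sketch below is a deliberately simplified check for versions with a major of 1 or higher. `satisfiesCaret` is a hypothetical helper, not the npm semver library, and it ignores pre-release tags and the special rules for 0.x versions.

```javascript
// Simplified caret-range check, valid only for major versions >= 1.
// Not a full semver implementation (no pre-release tags, no 0.x rules).
function parseVersion(v) {
  return v.split(".").map(Number);
}

function satisfiesCaret(version, range) {
  const [major, minor, patch] = parseVersion(version);
  const [rMajor, rMinor, rPatch] = parseVersion(range.replace(/^\^/, ""));
  if (major !== rMajor) return false; // caret never crosses a major version
  if (minor !== rMinor) return minor > rMinor;
  return patch >= rPatch;
}

console.log(satisfiesCaret("4.3.136", "^4.6.82")); // false
console.log(satisfiesCaret("4.6.82", "^4.6.82")); // true
```

So for the downgrade to take effect, package.json has to name 4.3.136 itself (or a range that includes it).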

lfoppiano commented 2 days ago

mmm, maybe you pushed on a different branch than fix-problem-with-chrome?

t29mato commented 2 days ago

@lfoppiano Ah, yes 💦 I've pushed it as a new one: https://github.com/lfoppiano/streamlit-pdf-viewer/tree/fix/enable-text-content-on-chrome

lfoppiano commented 2 days ago

OK. I've released rc2 and tested that it works on Chrome. @hamdi3 could you please try it?