lfoppiano / streamlit-pdf-viewer

Streamlit PDF viewer
https://structure-vision.streamlit.app/
Apache License 2.0
94 stars, 6 forks

PDF is not shown when I set the rendering to 'legacy_iframe' or 'legacy_embed'. I want a scrollable and zoomable PDF within a frame. #40

Closed. mingjun1120 closed this issue 4 months ago.

mingjun1120 commented 7 months ago

I want to create a page with two columns (st.columns([2, 1])), where the left column displays the PDF uploaded by the user and the right column lets the user ask questions (RetrievalQA, not the Conversational type) about the uploaded file. However, some files are not displayed in the app when I set the rendering parameter to 'legacy_iframe' or 'legacy_embed'. The 'unwrap' rendering works: the PDF is displayed, but if the file has many pages I have to scroll to the bottom of the entire page to see its contents.

Below is my code:

import base64
import streamlit as st
from streamlit_javascript import st_javascript
from streamlit_pdf_viewer import pdf_viewer

st.set_page_config(page_title="pdf-GPT", page_icon="📖", layout="wide")
st.write("<h2 style='text-align: center;'><span style='color: blue;'>TEST</span> Legal </h2>", unsafe_allow_html=True)

with st.sidebar:

    uploaded_file = st.file_uploader(
        "Upload file", type=["pdf"], 
        help="Only PDF files are supported", 
    )

col1, col2 = st.columns(spec=[2, 1])

if uploaded_file:
    with col1:
        ui_width = st_javascript("window.innerWidth")

        # Read file as bytes:
        bytes_data = uploaded_file.getvalue()

        # Embed PDF in HTML
        pdf_display = pdf_viewer(input=bytes_data, rendering='unwrap')

        # Display file
        st.markdown(pdf_display, unsafe_allow_html=True)

    with col2:
        question = st.text_input(
            "Ask something about the article",
            placeholder="Can you give me a short summary?",
            disabled = not uploaded_file,
        )

What the page looks like:

[Screenshot]

Sample file:

2024 EY Open Science Data Challenge Participant Guidance.pdf

lfoppiano commented 7 months ago

@mingjun1120 thank you.

I confirmed that your PDF renders fine with rendering set to 'legacy_iframe' or 'legacy_embed'. You can check it yourself at lfoppiano-document-qa.hf.space/ by selecting "Native rendering browser".

Looking at your code, I think this part:

        # Embed PDF in HTML
        pdf_display = pdf_viewer(input=bytes_data, rendering='unwrap')

        # Display file
        st.markdown(pdf_display, unsafe_allow_html=True)

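should be rewritten as follows, since pdf_viewer renders the PDF itself when called and there is no need to pass its return value to st.markdown: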

pdf_viewer(input=bytes_data, rendering='unwrap')

or

pdf_viewer(input=bytes_data, rendering='legacy_iframe')
mingjun1120 commented 7 months ago

@lfoppiano No, it still does not work, even after I changed it to pdf_viewer(input=bytes_data, rendering='legacy_iframe').

Below is my updated code:

import base64
import streamlit as st
from streamlit_javascript import st_javascript
from streamlit_pdf_viewer import pdf_viewer

st.set_page_config(page_title="pdf-GPT", page_icon="📖", layout="wide")
st.write("<h2 style='text-align: center;'><span style='color: blue;'>TEST</span> Legal </h2>", unsafe_allow_html=True)

with st.sidebar:

    uploaded_file = st.file_uploader(
        "Upload file", type=["pdf"], 
        help="Only PDF files are supported", 
    )

col1, col2 = st.columns(spec=[2, 1])

if uploaded_file:
    with col1:
        ui_width = st_javascript("window.innerWidth")

        # Read file as bytes:
        bytes_data = uploaded_file.getvalue()

        # Embed PDF in HTML
        pdf_display = pdf_viewer(input=bytes_data, rendering='legacy_iframe')

        # Display file
        st.markdown(pdf_display, unsafe_allow_html=True)

    with col2:
        question = st.text_input(
            "Ask something about the article",
            placeholder="Can you give me a short summary?",
            disabled = not uploaded_file,
        )
mingjun1120 commented 7 months ago

I tried your app with the sample PDF I sent earlier, and the PDF was not shown there either.

[Screenshot]

lfoppiano commented 7 months ago

Ah, sorry about that.

If you set the height to any value, you should be able to see the PDF. This is probably due to #35, but I did not expect it to happen with the legacy renderings as well.

See:

[Screenshot]

As an alternative, could you try streamlit-pdf-viewer version 0.0.7 with your initial code (without setting the height) and let me know if that works?
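When switching between releases such as 0.0.7 and 0.0.8, it helps to confirm which version is actually installed in the Python environment that `streamlit run` uses, since it is easy to pip-install into a different environment. A minimal sketch using only the standard library, assuming the package is distributed on PyPI as streamlit-pdf-viewer; the helper name installed_version is hypothetical:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "streamlit-pdf-viewer") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Assumption: the PyPI distribution name is "streamlit-pdf-viewer".
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Not installed in the environment this interpreter uses.
        return None


if __name__ == "__main__":
    print(installed_version())
```

Inside the app, something like st.sidebar.caption(f"streamlit-pdf-viewer {installed_version()}") shows the installed version next to the viewer. To test a specific release, pin it (for example pip install streamlit-pdf-viewer==0.0.7) and restart streamlit run so the new version is imported.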

lfoppiano commented 7 months ago

> I tried your app with the sample PDF file I sent earlier, the PDF was not shown also. [...]

Oh, this is weird, because it worked for me:

[Screenshot]

mingjun1120 commented 7 months ago

> As an alternative, could you try streamlit-pdf-viewer version 0.0.7 with your initial code (without setting the height) and let me know if that works?

Hmm, I tried version 0.0.7, but the same error still occurs, and I did not set any height either. It's weird! The code I used is the same as the code I sent previously; the only thing that changed was the streamlit-pdf-viewer version.

lfoppiano commented 7 months ago

> Hmm, I tried the 0.0.7 version, but the same error still exists and I did not set any height either. It's weird! The code I used is the same as the code I sent previously, the only thing that was changed was the streamlit-pdf-viewer version.

Yeah, it's weird. I've spent a substantial amount of time in the last few weeks figuring out some odd behaviours. Anyway, does it work if you set the height? I know it's not a solution, but at least you won't be stuck while we fix this bug. 😅

mingjun1120 commented 7 months ago

I tried, but it still fails! 😥

Updated code:

import base64
import streamlit as st
from streamlit_javascript import st_javascript
from streamlit_pdf_viewer import pdf_viewer

st.set_page_config(page_title="pdf-GPT", page_icon="📖", layout="wide")
st.write("<h2 style='text-align: center;'><span style='color: blue;'>TEST</span> Legal </h2>", unsafe_allow_html=True)

with st.sidebar:

    uploaded_file = st.file_uploader(
        "Upload file", type=["pdf"], 
        help="Only PDF files are supported", 
    )

col1, col2 = st.columns(spec=[2, 1])

if uploaded_file:
    with col1:

        ui_width = st_javascript("window.innerWidth") + 20

        # Read file as bytes:
        bytes_data = uploaded_file.getvalue()

        # Embed PDF in HTML
        pdf_display = pdf_viewer(input=bytes_data, rendering='legacy_iframe', height=700)

        # Display file
        st.markdown(pdf_display, unsafe_allow_html=True)

    with col2:
        question = st.text_input(
            "Ask something about the article",
            placeholder="Can you give me a short summary?",
            disabled = not uploaded_file,
        )
lfoppiano commented 7 months ago

I tested your code, copied 1:1, with both 0.0.7 and 0.0.8, and it works for me as long as I set the height. I'm running out of ideas, I'm afraid. 😭

Maybe try a hard refresh of your browser with Ctrl+Shift+R.

mingjun1120 commented 7 months ago

I tested the same code on my other laptop and on my friend's laptop, and both showed the same issue.

lfoppiano commented 7 months ago

Noted. We will try to solve this shortly.

lfoppiano commented 7 months ago

@mingjun1120 Could you please let me know which browser you tested it in?

If you use Chrome, Chromium, or Brave, loading the PDF through iframe/embed does not work from within the component's iframe. I was not aware of this limitation when we first implemented the feature.

You should use the classical approach discussed in many places, including https://discuss.streamlit.io/t/display-pdf-in-streamlit/62274

Does this help?
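The classical approach usually means base64-encoding the PDF and embedding it in an <iframe> rendered with st.markdown(..., unsafe_allow_html=True), so the frame is not nested inside the component's own iframe. A minimal sketch of building that HTML, assuming the pattern from the linked forum thread; the helper name pdf_iframe_html and its default sizes are hypothetical, and browsers that refuse to load PDFs from data: URIs may still show an empty frame:

```python
import base64
import html


def pdf_iframe_html(pdf_bytes: bytes, height: int = 700, width: str = "100%") -> str:
    """Return an <iframe> HTML snippet embedding a PDF as a base64 data: URI.

    Hypothetical helper; the result is meant for
    st.markdown(..., unsafe_allow_html=True).
    """
    # Base64 output only uses [A-Za-z0-9+/=], so it is safe inside an attribute.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    src = f"data:application/pdf;base64,{encoded}"
    return (
        f'<iframe src="{src}" width="{html.escape(width)}" '
        f'height="{int(height)}" style="border: none;"></iframe>'
    )
```

In the app from this issue, the left column would then become: with col1: st.markdown(pdf_iframe_html(uploaded_file.getvalue(), height=700), unsafe_allow_html=True). Because the whole file is inlined as base64 (about a third larger than the original bytes), large PDFs produce very large HTML strings.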

mingjun1120 commented 6 months ago

> @mingjun1120 Could you please let me know in which browser did you test it?

Hi, I tested on both Chrome and Edge last time. It didn't work in either.

lfoppiano commented 6 months ago

Yes, so Edge is not supported at the moment, but Chrome should be. I wonder if you could try it with Firefox.

lfoppiano commented 6 months ago

After investigating for a few weeks, we will deprecate the legacy methods, as there is no solution or workaround. If you update to version 0.0.9 and specify the height, you might get a smaller viewer window, and scrolling the whole page might not be necessary.
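A fixed value such as height=700 works, but the height could also be derived from the browser window so that the viewer fits on screen and only the viewer scrolls. A minimal sketch, assuming the viewport height is read with st_javascript("window.innerHeight"), the same way the issue's code reads window.innerWidth; the helper viewer_height and its reserved, minimum, and fallback values are hypothetical. A missing or zero measurement (for example before the JavaScript has returned a value) falls back to a fixed height:

```python
from typing import Optional, Union


def viewer_height(
    inner_height: Optional[Union[int, float, str]],
    reserved: int = 250,
    minimum: int = 400,
    fallback: int = 700,
) -> int:
    """Return a pixel height for the PDF viewer based on the viewport height.

    inner_height: viewport height measured in the browser (assumed to come from
    st_javascript("window.innerHeight")); None or 0 means "not measured yet".
    reserved: pixels left for the page header and padding (assumed value).
    minimum: lower bound, so small windows still show a usable page.
    fallback: height used when no valid measurement is available.
    """
    if not inner_height:
        return fallback
    try:
        measured = int(float(inner_height))
    except (TypeError, ValueError):
        # Unexpected value from the browser: use the fixed height instead.
        return fallback
    return max(minimum, measured - reserved)
```

In the app this would be used as pdf_viewer(input=bytes_data, height=viewer_height(st_javascript("window.innerHeight"))), keeping the page title visible above the viewer.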

lfoppiano commented 5 months ago

FYI I'm considering deprecating the legacy methods because they cause too much trouble.

This issue has already been reported in the Streamlit repository: https://github.com/streamlit/streamlit/issues/1088#issuecomment-979400828 and https://github.com/streamlit/streamlit/issues/1088#issuecomment-829469172

lfoppiano commented 4 months ago

The fact that the legacy* methods do not work on Chrome and C is now documented. I think we can close this. @mingjun1120 feel free to reopen if you need anything else.
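Since the legacy renderings are documented as not working in Chromium-based browsers (the thread mentions Chrome, Chromium, Brave, and Edge), an app could avoid them there. A rough sketch, assuming the browser's user-agent string is available (for example via st_javascript("navigator.userAgent")); the helper choose_rendering is hypothetical, user-agent sniffing is brittle, and because the legacy methods are being deprecated it falls back to 'unwrap' unless the browser clearly looks like Firefox, which was suggested for testing here but not confirmed to work:

```python
from typing import Optional

# Substrings found in Chromium-based user agents: Chrome, Brave, and Edge all
# include "Chrome/", and Edge additionally includes "Edg/".
CHROMIUM_MARKERS = ("Chrome/", "Chromium/", "Edg/")


def choose_rendering(user_agent: Optional[str]) -> str:
    """Pick a pdf_viewer rendering mode from a browser user-agent string.

    Returns "unwrap" unless the browser looks like Firefox, where
    "legacy_iframe" is allowed (an assumption, not confirmed in this issue).
    """
    ua = user_agent or ""
    if any(marker in ua for marker in CHROMIUM_MARKERS):
        return "unwrap"
    if "Firefox/" in ua:
        return "legacy_iframe"
    # Unknown browser or no user agent yet: use the default renderer.
    return "unwrap"
```

Usage would look like pdf_viewer(input=bytes_data, rendering=choose_rendering(ua), height=700), with the explicit height kept because of #35.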