lfoppiano / streamlit-pdf-viewer

Streamlit PDF viewer
https://structure-vision.streamlit.app/
Apache License 2.0

Add filter to document rendering to show only specific pages #22

Closed: Vinno97 closed this issue 7 months ago

Vinno97 commented 8 months ago

It would be very nice to be able to tell the viewer to open the PDF at a specific page or section (anchor). This would enable more interactivity between the Streamlit app and the PDF.

lfoppiano commented 8 months ago

Hi @Vinno97, and thanks for your interest in this component. I'll try to give you an answer, although we are learning Streamlit as we go with our development.

We have been thinking about opening the PDF in a new window (#21) instead of in an iframe within the application window or tab. However, as with other features (e.g. getting the height/width of the iframe), there are quite a few limitations.

I think that, for now, you can do that by placing the pdf_viewer() call in the corresponding section of your layout (for example, inside st.columns() or st.container()).

Are you aware of other components that can be opened at a specific page or section?

Vinno97 commented 8 months ago

Thanks for your quick response! I think there may be a misunderstanding. What I meant was that it would be nice to be able to make the PDF viewer focus on a specific section of the PDF.

For example, this is what I'm currently doing as a workaround:

from io import BytesIO

import streamlit as st
from pypdf import PdfReader, PdfWriter
from streamlit_pdf_viewer import pdf_viewer

# Get the PDF
pdf = st.file_uploader("PDF Report", type="pdf")
if pdf is None:
    st.stop()

reader = PdfReader(pdf)

# Let the user choose a page number (1-based, as shown to the user);
# integer arguments make number_input return an int
pagenum = st.number_input(
    "Page number", min_value=1, max_value=len(reader.pages), value=1, step=1
)

# Create a new one-page PDF of the selected page (pypdf pages are 0-based)
writer = PdfWriter()
writer.add_page(reader.pages[pagenum - 1])

new_pdf = BytesIO()
writer.write(new_pdf)

# Show the new PDF (getvalue() returns all bytes regardless of stream position)
pdf_viewer(new_pdf.getvalue(), height=800)

lfoppiano commented 7 months ago

Hi @Vinno97, thanks for the clarification, now I understand.

What we could try is to check whether it is possible to jump to a specific page when calling pdf_viewer(...). However, I'm not sure we can control the JavaScript this way, so your workaround might be the only solution.

@t29mato, could you check whether it's feasible to pass an additional page parameter and scroll to that page automatically once the PDF has loaded?

lfoppiano commented 7 months ago

@Vinno97, regarding your example: once the modified PDF is displayed, it has to be rebuilt whenever the user wants to see other pages. Do you have a specific use case for this kind of feature?

Vinno97 commented 7 months ago

Yes, my workaround is really crude, and I wouldn't advocate it as the actual solution. If you agree that letting the user focus on a specific page is a useful feature, I think supporting it in the component itself would be better.

My use case was debugging PDF data-extraction logic for some very large documents. I had the PDF viewer in one Streamlit column and some extracted information in another. The code I sent before let me quickly focus on the exact page that corresponded to an element I was interested in. Otherwise, I would have had to scroll through 100 pages to find it.
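The debugging workflow above boils down to looking up the page an extracted element came from and then showing only that page. A minimal, stdlib-only sketch, assuming each extracted element records a 1-based page number; the ExtractedElement type and the page_for_element helper are hypothetical and not part of streamlit-pdf-viewer or pypdf:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExtractedElement:
    """Hypothetical record produced by the extraction logic being debugged."""
    element_id: str
    text: str
    page: int  # assumed 1-based page number in the source PDF


def page_for_element(elements: Sequence[ExtractedElement], element_id: str) -> Optional[int]:
    """Return the page of the element with the given id, or None if it is not found."""
    for element in elements:
        if element.element_id == element_id:
            return element.page
    return None


elements = [
    ExtractedElement("tbl-1", "Revenue table", page=12),
    ExtractedElement("fig-3", "Org chart", page=87),
]
print(page_for_element(elements, "fig-3"))  # 87
```

In the two-column setup described above, the returned page number would feed either the one-page workaround or the viewer's own page selection, instead of scrolling manually.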

lfoppiano commented 7 months ago

@Vinno97, thanks to @t29mato, who implemented selective rendering on the JavaScript side, this should now be more efficient. It's released in version 0.0.7 (please skip version 0.0.6 😅). You can see it in action here.
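With selective rendering, the app only needs to tell the viewer which pages to draw. The helper below is a hypothetical, stdlib-only sketch that picks a small window of pages around a target page and clamps it to the document length; the name pages_around and the default radius are illustrative assumptions.

```python
from typing import List


def pages_around(target: int, total_pages: int, radius: int = 1) -> List[int]:
    """Return 1-based page numbers within `radius` of `target`, clamped to [1, total_pages]."""
    if total_pages < 1:
        raise ValueError("document has no pages")
    if not 1 <= target <= total_pages:
        raise ValueError(f"page {target} is outside 1..{total_pages}")
    first = max(1, target - radius)           # don't go before the first page
    last = min(total_pages, target + radius)  # don't go past the last page
    return list(range(first, last + 1))


print(pages_around(1, 100))    # [1, 2]
print(pages_around(50, 100))   # [49, 50, 51]
print(pages_around(100, 100))  # [99, 100]
```

The resulting list could then be passed to the viewer, e.g. pdf_viewer(pdf_bytes, height=800, pages_to_render=pages_around(page, total_pages)). The pages_to_render parameter name and its 1-based numbering are assumptions about the selective-rendering feature released in 0.0.7; check the README of the installed version before relying on them.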

Vinno97 commented 7 months ago

Thanks, guys! Made quick work of it 😄

lfoppiano commented 7 months ago

I'm closing the issue. Feel free to reopen or comment if needed.