akashAD98 closed this issue 6 months ago
🤖
Hello @akashAD98, it's good to see you again! I hope you're doing well.
To install poppler on Google Colab, you can use the following command in a cell:
!apt-get install -y poppler-utils
This should install the poppler utility which is required for the operation you are trying to perform.
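To confirm that the install actually put the tools on the notebook's PATH, a quick check like the one below may help. It is only a sketch: it assumes that `pdfinfo` and `pdftoppm` (both shipped in `poppler-utils`) are the binaries the PDF-to-image step looks for, and the helper names `find_poppler_tools` and `missing_poppler_tools` are made up for this illustration.

```python
import shutil

# Command-line tools shipped in poppler-utils (assumed to be the ones needed here).
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def find_poppler_tools(tools=POPPLER_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the names of the tools that could not be found on PATH."""
    return [name for name, path in find_poppler_tools(tools).items() if path is None]


missing = missing_poppler_tools()
if missing:
    print("Missing poppler tools:", ", ".join(missing))
else:
    print("All poppler tools found on PATH.")
```

If anything is reported missing, re-running the `apt-get` cell (and checking its output for errors) would be the first thing to try.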
However, if you are still encountering the "Unable to get page count" error, it might be due to the poppler library not being able to read the PDF file correctly. This could be due to a variety of reasons such as the PDF being encrypted, corrupted, or not being fully downloaded.
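A rough way to test for these cases before involving any PDF library is to look at the raw bytes. The sketch below is a heuristic, not a real validator: `quick_pdf_check` is a hypothetical helper, and the 1024-byte windows and the plain search for `/Encrypt` are assumptions that can produce false positives or miss unusual files.

```python
def quick_pdf_check(pdf_path):
    """Return a list of likely problems found by a cheap byte-level inspection."""
    with open(pdf_path, "rb") as f:
        data = f.read()

    if not data:
        return ["file is empty"]

    problems = []
    # A PDF normally starts with a "%PDF-" header; an HTML error page will not.
    if b"%PDF-" not in data[:1024]:
        problems.append("missing %PDF- header (not a PDF, or a failed download)")
    # A complete PDF normally ends with an "%%EOF" marker.
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF marker near the end (possibly truncated)")
    # Encrypted PDFs reference an /Encrypt dictionary (heuristic only).
    if b"/Encrypt" in data:
        problems.append("contains /Encrypt (possibly password-protected)")
    return problems


# Example usage (the path is a placeholder):
# print(quick_pdf_check("/path/to/your/pdf"))
```

An empty list only means none of these cheap checks fired; the file can still be damaged in ways this does not detect.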
As an alternative solution, you can use the PyPDF2 library to read the PDF file. Here is an example of how you can use it:
!pip install PyPDF2
import PyPDF2

def get_num_pages(pdf_path):
    # PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0
    with open(pdf_path, "rb") as file:
        pdf = PyPDF2.PdfReader(file)
        return len(pdf.pages)

pdf_path = "/path/to/your/pdf"
print(get_num_pages(pdf_path))
This prints the number of pages in the PDF file. If PyPDF2 cannot read the file either, the issue is likely with the PDF file itself and not with the library you are using to read it.
As for the specific role of poppler in the LangChain codebase, particularly in the multi-modal RAG example, I wasn't able to find an answer within the repository. It's possible that poppler is being used to process PDF files in some way, but without more information, it's hard to say for sure. If you could provide more details about the error message you're seeing, it might help to narrow down the problem.
I had the same problem with XFA PDF
After installing poppler-utils I got a new error:
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7bb25f8173d0>
I am getting the same error too. Any updates on this?
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x452315ee0>
System Info
I'm running it on Google Colab.
Who can help?
I'm trying the multi-modal RAG example. I have tried everything, but no matter what I do I still get this error. Please tell me if there is an alternative way, or how we can install it. @bas
Reproduction
Just run the notebook on Colab; the partition step does not produce any output.
https://github.com/langchain-ai/langchain/blob/master/cookbook/Multi_modal_RAG.ipynb
Expected behavior
It should work normally, without errors, on Colab.