run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.59k stars 5.23k forks

[Bug]: NameError: name 'BytesIO' is not defined. Did you mean: 'bytes'? #16082

Closed SwAt1563 closed 1 month ago

SwAt1563 commented 1 month ago

Bug Description

While using SimpleDirectoryReader to read images from an AWS S3 bucket, I encountered a NameError for BytesIO. The error occurs because BytesIO is never imported in llama_index/readers/file/image/base.py.

Code:

reader = SimpleDirectoryReader(
    input_dir=f'{bucket_name}/tests',
    fs=s3_fs,
    recursive=False,
    raise_on_error=True,
)

Error Traceback:

Traceback (most recent call last):
  File "/home/qutaiba/.local/lib/python3.12/site-packages/llama_index/core/readers/file/base.py", line 540, in load_file
    docs = reader.load_data(input_file, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/qutaiba/.local/lib/python3.12/site-packages/llama_index/readers/file/image/base.py", line 85, in load_data
    image = Image.open(BytesIO(f.read()))
                       ^^^^^^^
NameError: name 'BytesIO' is not defined. Did you mean: 'bytes'?

Cause of the Issue:

The error occurs because BytesIO is not imported in the file llama_index/readers/file/image/base.py. This prevents the library from properly handling image data read from S3.
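Python resolves global names when a line runs, not when the module is imported, which may explain why the missing import only surfaces once a filesystem is passed in. The sketch below illustrates this with a hypothetical load_bytes function that has a local and a remote branch; it is a stand-in for illustration, not the library's actual code.

```python
import io  # note: this does NOT bind the bare name `BytesIO`


def load_bytes(file_obj, remote: bool):
    """Hypothetical stand-in for a reader with a local and a remote branch."""
    if remote:
        # `BytesIO` was never imported by name in this module, so this
        # line raises NameError, but only when this branch executes.
        return BytesIO(file_obj.read())
    return file_obj


source = io.BytesIO(b"fake image bytes")

# Local branch: the undefined name is never evaluated, so nothing fails.
local_result = load_bytes(source, remote=False)

# Remote branch: the name lookup fails at call time.
try:
    load_bytes(source, remote=True)
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)
```

Under this reading, any code path that never reaches the BytesIO line works normally, so the bug can go unnoticed until a reader is used with a remote filesystem.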

Suggested Fix:

To resolve this issue, the following import statement needs to be added to the file llama_index/readers/file/image/base.py:

from io import BytesIO
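Until a release contains that import, one possible workaround is to inject the missing name into the affected module's namespace from user code. The sketch below shows the technique on a synthetic module (broken_reader and load are hypothetical names) so that it runs without any third-party packages. The commented lines show how the same idea might look for the real module path taken from the traceback, which is an untested assumption.

```python
import types
from io import BytesIO

# Hypothetical stand-in for a module that forgot `from io import BytesIO`.
broken = types.ModuleType("broken_reader")
exec(
    "def load(file_obj):\n"
    "    return BytesIO(file_obj.read())\n",
    broken.__dict__,
)

# Workaround: bind the missing name in the module's global namespace.
# Functions defined in that module look up globals there at call time.
broken.BytesIO = BytesIO

buffer = broken.load(BytesIO(b"abc"))
payload = buffer.read()

# For the real package, the analogous lines might be (assumption, untested):
#   import llama_index.readers.file.image.base as image_base
#   image_base.BytesIO = BytesIO
```

The patch has to run before the reader is called, and it should be removed once the installed version includes the import.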

Version

0.11.9

Steps to Reproduce

The S3 bucket should contain some images under the tests/ prefix.

from llama_index.core import SimpleDirectoryReader
from s3fs import S3FileSystem  # assumed source of S3FileSystem

# bucket_name and region_name are assumed to be defined elsewhere
s3_fs = S3FileSystem(
    anon=False,  # False: required to use IAM
    client_kwargs={
        "region_name": region_name,
    },
)

# Reads every file under input_dir (here, the images that trigger the error)
reader = SimpleDirectoryReader(
    # filename_as_id=True,  # If True, use the filename as the document ID
    input_dir=f'{bucket_name}/tests',
    # input_files=[f'{bucket_name}/tests/ticket_to_ride.pdf'],  # If None, load all files in the directory
    fs=s3_fs,
    recursive=False,  # If True, recursively search for files in subdirectories
    raise_on_error=True,  # If True, raise an error if a file cannot be loaded
)

documents = reader.load_data()

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

Hello, @SwAt1563! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

To resolve the NameError related to BytesIO in the llama_index/readers/file/image/base.py file, you need to add the following import statement:

from io import BytesIO

This will allow the BytesIO class to be recognized and used for handling image data read from S3. Here is the updated import section for the file:

import re
from pathlib import Path
from typing import Dict, List, Optional, cast, Any
from fsspec import AbstractFileSystem
from io import BytesIO  # Add this line

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document, ImageDocument
from llama_index.core.utils import infer_torch_device

This should fix the NameError and allow the SimpleDirectoryReader to properly handle images from an AWS S3 bucket [1].
