Open MLai0519 opened 2 months ago
Hey @MLai0519! I'm here to assist you with any bugs, questions, or contributions. Let's work together to resolve this issue.
To resolve the issue of failing to load a file using `S3Reader` and `SimpleDirectoryReader` with AWS S3, despite successfully testing the connection with `boto3` and `s3fs`, you should ensure that the `SimpleDirectoryReader` is correctly configured with the necessary parameters. Here is a sample configuration:
```python
from llama_index.core.readers import SimpleDirectoryReader
from s3fs import S3FileSystem

# Create the filesystem using s3fs
s3_fs = S3FileSystem(
    anon=False,
    key=getawsaccess('uat')[0],
    secret=getawsaccess('uat')[1],
    cache_regions=False,
)

# Initialize the SimpleDirectoryReader with the required parameters
reader = SimpleDirectoryReader(
    input_dir='your-bucket-name/your-folder',
    fs=s3_fs,
    recursive=True,  # Recursively searches all subdirectories
    filename_as_id=True,
)

# Load the files as documents
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")

# If you need to load files asynchronously:
# import nest_asyncio
# nest_asyncio.apply()
# docs = await reader.aload_data()
# print(f"Loaded {len(docs)} docs")
```
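One common pitfall with this setup is passing an `s3://`-prefixed URL as `input_dir`; when a custom `fs` is supplied, the path is typically given in plain `bucket/folder` form. A small sketch of a normalization helper (the `normalize_s3_input_dir` function is mine for illustration, not part of LlamaIndex or s3fs):

```python
def normalize_s3_input_dir(path: str) -> str:
    """Strip an optional s3:// scheme and any trailing slash so the
    path is in the plain 'bucket/folder' form used alongside a custom fs."""
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    return path.rstrip("/")

# Both spellings normalize to the same reader-friendly path:
print(normalize_s3_input_dir("s3://your-bucket-name/your-folder/"))
# -> your-bucket-name/your-folder
```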
Ensure that:

- The `input_dir` parameter is correctly set to the name of your S3 bucket and folder.
- The `fs` parameter is set to an instance of `S3FileSystem` with the correct credentials.
- The `recursive` parameter is set to `True` if you want to search all subdirectories.

Alternatively, you can use the `BotoMinioReader` class from the LlamaIndex library, which is specifically designed to handle S3 file and directory reading using `boto3`. Here is an example of how to use it:
```python
from llama_index.readers.minio.boto3_client.base import BotoMinioReader

# Initialize the BotoMinioReader with the required parameters
reader = BotoMinioReader(
    bucket='your-bucket-name',
    prefix='your-folder',  # Optional: filter objects by prefix
    aws_access_id=getawsaccess('uat')[0],
    aws_access_secret=getawsaccess('uat')[1],
    s3_endpoint_url='your-s3-endpoint-url',
)

# Load the files as documents
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
```
Ensure that:

- The `bucket` parameter is correctly set to the name of your S3 bucket.
- The `prefix` parameter is set if you want to filter by a specific prefix.
- The credentials (`aws_access_id` and `aws_access_secret`) are correctly provided.
- The `s3_endpoint_url` is correctly set to your S3 endpoint URL.

This configuration should help in resolving the issue of failing to load the file [1][2][3].
To continue talking to Dosu, mention @dosu.
Bug Description
Failed to load files with the readers. Tested the connection with `boto3` and also `s3fs` alone; both can reach the S3 bucket.
Version
0.10.54
Steps to Reproduce
I am using the code below, with llama-index-readers-s3 = 0.1.10 and s3fs = 2024.6.1. `bucket` is the target bucket and `folder` is just the subdirectory. With the same key and secret, I can use `boto3` to upload to and download from the bucket via the S3 client.
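The kind of `boto3` check mentioned above can be sketched as follows (the `list_bucket_keys` helper is illustrative, not part of any library; the client is passed in so the logic itself can be exercised without live credentials):

```python
def list_bucket_keys(s3_client, bucket, prefix=""):
    """Return the object keys under `prefix`, confirming that the
    credentials can enumerate the bucket the reader is pointed at."""
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]

# Usage with real credentials:
# import boto3
# client = boto3.client("s3", aws_access_key_id=key, aws_secret_access_key=secret)
# print(list_bucket_keys(client, "your-bucket-name", "your-folder/"))
```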