langchain-ai / langchain-aws

Build LangChain Applications on AWS
MIT License

Implementation of BlobLoader with S3 #18

Open · eyurtsev opened this issue 2 months ago


An S3 implementation of BlobLoader would be fantastic. It would let users easily connect their S3 data to LangChain parsers and indexing code.

Here's an explanation of the blob loader abstractions: https://python.langchain.com/docs/modules/data_connection/document_loaders/custom/#blob-loaders
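A rough sketch of the listing side of such a loader follows. It is an assumption about how this could be built, not an existing langchain-aws API. The hypothetical helper `iter_s3_objects` pages through a bucket with the S3 `list_objects_v2` call, passes `Prefix` so S3 does the prefix filtering server-side, and follows `NextContinuationToken` until `IsTruncated` is false. `client` is duck-typed, so a boto3 S3 client (`boto3.client("s3")`) or a test double can be passed in. In a real loader this generator would presumably sit inside the `yield_blobs()` method of a `BlobLoader` subclass, fetching each body with `get_object` and wrapping it in a `Blob`. That part is left out so the sketch runs without boto3 or langchain-core installed.

```python
from typing import Any, Dict, Iterator, Optional


def iter_s3_objects(client: Any, bucket: str, prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield every listing entry under `prefix`.

    `client` is assumed to behave like a boto3 S3 client, i.e. to expose
    `list_objects_v2(Bucket=..., Prefix=..., ContinuationToken=...)`.
    """
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
        if token is not None:
            # Only send the token on follow-up requests; boto3 rejects None values.
            kwargs["ContinuationToken"] = token
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page (or the whole prefix) is empty.
        yield from response.get("Contents", [])
        if not response.get("IsTruncated"):
            break
        token = response["NextContinuationToken"]
```

boto3 also provides a built-in paginator (`client.get_paginator("list_objects_v2")`) that handles the token loop; the explicit loop here just makes the pagination visible.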

The minimal feature set for this to be useful:

  1. filter by keys matching a given prefix
  2. filter by last_modified_date
  3. filter by max file size
  4. filter by file extension
  5. filter by min file size
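One way the minimal filters above could be expressed is sketched below. It assumes each object is an entry of the `Contents` list returned by `list_objects_v2`, which carries `Key`, `Size` (in bytes), and a timezone-aware `LastModified` datetime. `matches_filters` and its parameter names are hypothetical. Treating the size bounds as inclusive and the date filter as "strictly after" are design choices made for this sketch, not something the issue specifies.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Sequence


def matches_filters(
    obj: Dict[str, Any],
    prefix: str = "",
    modified_after: Optional[datetime] = None,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
    extensions: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical helper: True if an S3 listing entry passes every filter.

    `obj` is assumed to look like {"Key": str, "Size": int, "LastModified": datetime}.
    """
    key: str = obj["Key"]
    # Key prefix (redundant if the listing already used the Prefix parameter).
    if not key.startswith(prefix):
        return False
    # Last-modified date: keep only objects modified strictly after the cutoff.
    if modified_after is not None and obj["LastModified"] <= modified_after:
        return False
    # Size bounds in bytes, both inclusive.
    size: int = obj["Size"]
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    # File extension, compared case-insensitively; "pdf" and ".pdf" are equivalent.
    if extensions is not None:
        wanted = tuple(e.lower() if e.startswith(".") else "." + e.lower() for e in extensions)
        if not key.lower().endswith(wanted):
            return False
    return True
```

Because boto3's `LastModified` values are timezone-aware, comparing them with a naive `modified_after` would raise `TypeError`. Callers should pass an aware datetime, for example one built with `tzinfo=timezone.utc`.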

Additional features:

  1. filter by file MIME type, if available
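The content type is not part of the `list_objects_v2` listing. S3 returns it as `ContentType` from `head_object` or `get_object`, which costs an extra request per object. The minimal sketch below uses a hypothetical helper, `matches_mimetype`. It prefers a stored content type when the caller already has one and otherwise falls back to guessing from the key's extension with Python's standard `mimetypes` module.

```python
import mimetypes
from typing import Any, Dict, Optional, Sequence


def matches_mimetype(
    obj: Dict[str, Any],
    allowed: Sequence[str],
    content_type: Optional[str] = None,
) -> bool:
    """Hypothetical helper: True if the object's MIME type is in `allowed`.

    `content_type` is the object's stored Content-Type (e.g. from a HEAD
    request); when it is None, the type is guessed from the key's extension.
    Objects whose type cannot be determined are rejected.
    """
    mime = content_type
    if mime is None:
        mime, _encoding = mimetypes.guess_type(obj["Key"])
    if mime is None:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    return mime.split(";", 1)[0].strip().lower() in {m.lower() for m in allowed}
```

Guessing from the extension is cheap but only as reliable as the file naming. Rejecting objects that have neither a recognizable extension nor a stored type is one possible policy.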