langchain-ai / langchain-google

MIT License

Multithreaded GCSDocumentStorage uploads and downloads #318

Open atognolag opened 1 week ago

atognolag commented 1 week ago

The GCSDocumentStorage now leverages:

  1. The google.cloud.storage.transfer_manager module for multithreaded uploads and downloads of blobs
  2. The google.cloud.storage.client.batch context manager to batch delete requests

Several tests were added:

  1. Validation of user-provided threading configurations
  2. Separate coverage of threaded and unthreaded operations
  3. Test documents now include characters from many different languages
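
A rough sketch of what the configuration-validation and multilingual tests could look like. The PR does not show the actual parameter names, so `ThreadingConfig`, its `threaded` and `max_workers` fields, and the sample strings are all hypothetical; the round trip uses an in-memory buffer as a stand-in for a GCS upload and download.

```python
import io
from dataclasses import dataclass

# Hypothetical sample strings; the PR's actual test fixtures are not shown.
MULTILINGUAL_SAMPLES = [
    "English",
    "Español: ¿qué tal?",
    "Русский текст",
    "日本語のテキスト",
    "中文文本",
    "العربية",
    "हिन्दी",
    "Emoji 🙂",
]


@dataclass(frozen=True)
class ThreadingConfig:
    """Hypothetical threading options; field names are illustrative only."""

    threaded: bool = False
    max_workers: int = 8

    def __post_init__(self) -> None:
        if not isinstance(self.threaded, bool):
            raise TypeError("threaded must be a bool")
        # bool is a subclass of int, so reject True/False explicitly.
        if isinstance(self.max_workers, bool) or not isinstance(self.max_workers, int):
            raise TypeError("max_workers must be an int")
        if self.max_workers < 1:
            raise ValueError("max_workers must be at least 1")


def test_rejects_invalid_threading_configs() -> None:
    invalid = [
        {"max_workers": 0},
        {"max_workers": -2},
        {"max_workers": "4"},
        {"max_workers": True},
        {"threaded": "yes"},
    ]
    for kwargs in invalid:
        try:
            ThreadingConfig(**kwargs)
        except (TypeError, ValueError):
            continue
        raise AssertionError(f"invalid config accepted: {kwargs}")


def test_multilingual_text_survives_byte_round_trip() -> None:
    # Stand-in for an upload/download: encode to bytes, then decode back.
    for text in MULTILINGUAL_SAMPLES:
        buffer = io.BytesIO(text.encode("utf-8"))
        assert buffer.getvalue().decode("utf-8") == text
```

Validating eagerly in `__post_init__` means a bad configuration fails at construction time rather than partway through a threaded transfer.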

Known limitations:

  1. Unthreaded uploads create plain-text objects in GCS, while multithreaded uploads also write plain-text content but GCS reports its content type as application/octet-stream

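
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: `Blob.upload_from_string` defaults `content_type` to `"text/plain"`, whereas `upload_from_file` on an unnamed in-memory buffer has no type or filename to infer one from and falls back to `application/octet-stream`. If so, passing an explicit content type through `upload_many`'s `upload_kwargs` might remove the difference. The helper names and the `charset=utf-8` suffix below are illustrative choices.

```python
import io
from typing import Dict, List, Tuple

# Illustrative choice; any valid text MIME type could be used here.
TEXT_CONTENT_TYPE = "text/plain; charset=utf-8"


def build_upload_pairs(bucket, named_texts: Dict[str, str]) -> List[Tuple[io.BytesIO, object]]:
    """Hypothetical helper: pair a UTF-8 buffer with its target blob."""
    return [
        (io.BytesIO(text.encode("utf-8")), bucket.blob(name))
        for name, text in named_texts.items()
    ]


def upload_texts_as_plain_text(bucket, named_texts: Dict[str, str], max_workers: int = 8) -> List:
    """Hypothetical helper: threaded upload with an explicit content type."""
    from google.cloud.storage import transfer_manager

    return transfer_manager.upload_many(
        build_upload_pairs(bucket, named_texts),
        # Forwarded to each blob's upload call, overriding the octet-stream default.
        upload_kwargs={"content_type": TEXT_CONTENT_TYPE},
        worker_type=transfer_manager.THREAD,
        max_workers=max_workers,
        raise_exception=True,
    )
```
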
lkuligin commented 1 week ago

/gcbrun

lkuligin commented 1 week ago

/gcbrun