justindujardin / pathy

simple, flexible, offline capable, cloud storage with a Python path-like interface
Apache License 2.0

Pass an instantiated client #66

Closed koshy1123 closed 8 months ago

koshy1123 commented 3 years ago

Hi there! I came across this package while working on a project, and I love the idea. Have you considered a feature enhancement that would let me pass an instantiated client for S3 actions? Right now, AFAICT, Pathy creates a new client each time an action is called. What I have in mind is similar to what smart_open describes in this section of its docs:

Under the covers, smart_open uses the boto3 client API to read from S3. By default, calling smart_open.open with an S3 URL will create its own boto3 client. These are expensive operations: they require both CPU time to construct the objects from a low-level API definition, and memory to store the objects once they have been created. It is possible to save both CPU time and memory by sharing the same resource across multiple smart_open.open calls

My use case: I'm trying to adapt an existing script to use multithreading, but I run into issues when each thread tries to create its own client.
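
A minimal sketch of the requested pattern (build one client, then hand it to every thread), using only the standard library. `StubClient`, `fetch`, and `fetch_all` are hypothetical names for illustration, not Pathy or boto3 APIs; in a real script the shared object would be something like an S3 client constructed once up front.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubClient:
    """Hypothetical stand-in for an expensive-to-build storage client."""

    instances = 0  # counts how many clients have been constructed
    _lock = threading.Lock()

    def __init__(self):
        with StubClient._lock:
            StubClient.instances += 1

    def get(self, key):
        # A real client would perform a network request here.
        return f"contents of {key}"


def fetch(client, key):
    """Worker that reuses the client it is handed instead of building one."""
    return client.get(key)


def fetch_all(keys, max_workers=4):
    client = StubClient()  # built once, before any thread starts
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Every task receives the same client instance.
        return list(pool.map(lambda key: fetch(client, key), keys))
```

This only helps if the shared client is safe to call from several threads at once. The boto3 documentation describes its low-level clients as generally thread safe, unlike sessions and resources.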

In general, is this direction a good idea? Does it make sense? Have I missed something obvious?

justindujardin commented 1 year ago

@koshy1123 are you still experiencing trouble with multiprocessing? Pathy caches the underlying "client" objects when they're created, so you should only have a single one regardless of the number of paths you interact with. It's probably true that you'd get an additional client per multiprocessing child because the cache is stored in private global variables.
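
A rough sketch of the caching behavior described above, assuming a module-level dictionary guarded by a lock. This is not Pathy's actual code: `_client_cache`, `get_client`, and `StubClient` are hypothetical names chosen for illustration.

```python
import threading

# Hypothetical module-private cache: one client per storage scheme.
_client_cache = {}
_cache_lock = threading.Lock()


class StubClient:
    """Hypothetical stand-in for a storage client (e.g. one per scheme)."""

    def __init__(self, scheme):
        self.scheme = scheme


def get_client(scheme):
    """Return the cached client for a scheme, creating it on first use."""
    with _cache_lock:
        client = _client_cache.get(scheme)
        if client is None:
            client = StubClient(scheme)
            _client_cache[scheme] = client
        return client
```

Because the cache lives in module globals, all threads in one process see the same dictionary and therefore the same client. Each multiprocessing child has its own copy of those globals, so it ends up with its own client object.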

I could consider adding support for sharing clients across multiple processes if you provide a snippet that captures your use-case to work from.

justindujardin commented 8 months ago

Going to close this as stale.