tweedegolf / storage-abstraction

Provides an abstraction layer for interacting with a storage; the storage can be local or in the cloud.
MIT License

API 2.1 #46

Closed abudaan closed 7 months ago

abudaan commented 9 months ago

I have drafted a new API and I am open to suggestions from the community.

You can find the draft in the change.log

Thanks!

headlessme commented 9 months ago

Would you consider splitting this module into a number of separate packages, so that consumers only pull in the dependencies they actually need (e.g. for us that would be GCP and AWS buckets)? We'd like to use a module like this, but it currently depends on outdated / unmaintained libraries with active security vulnerabilities (e.g. https://github.com/yakovkhalinsky/backblaze-b2, https://github.com/yakovkhalinsky/backblaze-b2/pull/132), so it's going to be hard to adopt. We have no requirement to support b2 or minio, so ideally we wouldn't install those modules at all.

We could then import the "core" module:

```ts
import 'storage-abstraction'
```

and any providers we want to support:

```ts
import 'storage-abstraction-gcp'
import 'storage-abstraction-s3'
```

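Something along these lines; the package and class names here are just for illustration, not an existing API:

```ts
// Hypothetical split: the core package exposes the Storage facade and the
// adapter interface, and each provider ships as its own adapter package.
import { Storage } from "storage-abstraction";
import { GcsAdapter } from "storage-abstraction-gcp";

// Only the GCP SDK ends up in node_modules, because only the GCP adapter
// package is installed and imported.
const storage = new Storage(new GcsAdapter({ bucketName: "my-bucket" }));
```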
If that's not on the roadmap, then maybe there's a simpler solution via optional dependencies (https://docs.npmjs.com/cli/v10/configuring-npm/package-json#optionaldependencies) that would avoid a hard dependency on the storage provider libraries we don't need.
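The idea would be to move the provider SDKs into optionalDependencies and load them lazily, so a missing install only fails when that provider is actually configured. A rough sketch (the helper function is hypothetical; backblaze-b2 is the module mentioned above):

```ts
// backblaze-b2 sits in optionalDependencies and is required lazily, so
// users who never configure the b2 provider don't need it installed.
function loadB2Client(): unknown {
  try {
    return require("backblaze-b2");
  } catch {
    throw new Error(
      "The 'backblaze-b2' package is not installed; " +
        "install it to use the b2 storage provider."
    );
  }
}
```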

jracabado commented 8 months ago

+1 to @headlessme's suggestion; this would greatly improve maintainability for this module's consumers and make life simpler for its maintainers.

While checking the Axios CVE I also stumbled upon this similar suggestion: https://github.com/tweedegolf/storage-abstraction/issues/50

headlessme commented 8 months ago

I have just read through the migration document and have a question about some of the design decisions for the new API, particularly around how the bucket name is used. It looks as though I can still specify the bucket name in the storage URL (which is great; that's how we want to use it, since it keeps our config down to a single URL), but then, strangely, it looks like I also have to re-specify it in every call on the storage adapter. That seems a bit unfriendly, as I've already specified it. Maybe I've misunderstood the change?

I'd expect that if I've specified the bucket in the URL of the storage setup, then that's the ONLY bucket I can access through that particular storage instance (credentials are likely tied to that bucket anyway). At the very least, it should be used as the default when I omit the bucket name in later storage API calls. If I don't specify a bucket in the storage constructor, then it makes sense to require it on every later API call.
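To make that concrete, roughly the behaviour I'd expect (the method signatures here are just for illustration, not necessarily the draft API):

```ts
// Bucket named in the configuration URL: it becomes the default
// (or the only) bucket for this storage instance.
const storage = new Storage("s3://my-bucket?region=eu-west-1");
await storage.getFileAsStream("path/to/object"); // no bucket repeated

// No bucket in the configuration URL: every call must name one.
const storage2 = new Storage("s3://?region=eu-west-1");
await storage2.getFileAsStream("my-bucket", "path/to/object");
```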

Secondly, on the changes to the URL format: was there a reason for diverging from the better-known s3://bucketname/path/to/object style URIs that the AWS CLI supports?

headlessme commented 8 months ago

Both GCP and AWS seem to use a similar URI format for bucket identification:

GCP: gs://bucketname/path/to/object https://cloud.google.com/storage/docs/gsutil

AWS: s3://bucketname/path/to/object https://docs.aws.amazon.com/cli/latest/userguide/cli-services-s3-commands.html#using-s3-commands-managing-objects-param

I understand that extra params such as region, credentials, etc. are required, but the v1 formats handled that nicely.
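Since these are plain URIs, the extra params could ride along in the query string; a quick sketch using Node's standard URL parser (the parameter names are only examples, not an existing format):

```ts
// An s3:// style URI parses cleanly with the WHATWG URL class, leaving
// room for region, credentials, etc. as query parameters.
const uri = new URL("s3://my-bucket/path/to/object?region=eu-west-1");

console.log(uri.protocol);                   // "s3:"
console.log(uri.hostname);                   // "my-bucket"
console.log(uri.pathname);                   // "/path/to/object"
console.log(uri.searchParams.get("region")); // "eu-west-1"
```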