tweedegolf / storage-abstraction

Provides an abstraction layer for interacting with a storage; the storage can be local or in the cloud.
MIT License
109 stars 18 forks source link

Revert format of configuration URLs #54

Closed abudaan closed 9 months ago

abudaan commented 10 months ago
          Both GCP and AWS seems to use a similar URI format for bucket identification:

GCP: gs://bucketname/path/to/object https://cloud.google.com/storage/docs/gsutil

AWS: s3://buckname/path/to/object https://docs.aws.amazon.com/cli/latest/userguide/cli-services-s3-commands.html#using-s3-commands-managing-objects-param

I understand extra params are required such as region, credentials etc... but the v1 formats handled that nicely.

Originally posted by @headlessme in https://github.com/tweedegolf/storage-abstraction/issues/46#issuecomment-1900366785

abudaan commented 10 months ago

Okay, I will revert to the old URL format and parser. I might change the format a bit. Do you think we should use @ for the bucket?

proto://credential1:credential2@bucket?region=us-west-1&param2=value2

abudaan commented 10 months ago

Proposal for config URLs, read here

headlessme commented 10 months ago

Mapping these config strings to URI components:

scheme = The storage type (s3, gs, local, azure etc...). I've used gs here to match the native Google format but gcs could be aliased to provide backwards compatibility with v1 of this library.

The authority breaks down to:

The path maps directly to the object path within the bucket

query stores any provider specific configuration (e.g. ?region=us-east-1)

fragment - unused

I think this matches pretty closely your existing implementation from v1 (apart from AWS region in the host, which I think is a bit inconsistent with the others)? I think this is also a superset of at least the basic GCP and AWS bucket URI formats that I know about, which is a nice property. I'm not that familiar with the other providers so I don't know if it would accommodate their existing formats too.

I'll look through the proposal doc in more detail to see if it matches this generic structure.

abudaan commented 10 months ago

Thanks for your input! I do agree that @region/bucketName is the odd one out here so I will implement what you suggested:

s3://access_key_id:secret_access_key@bucket_name?region=us-west-1&extra_option2=value2...

This way the configuration URLs can be consistent across all currently supported cloud storage services. And besides it simplifies the parser function :-)

headlessme commented 10 months ago

I'd expect these strings to be parseable with https://nodejs.org/api/url.html#new-urlinput-base as they'll be a standard format.

A quick test in node:

import { URL } from 'url'

const parsed = new URL('s3://id:key@mybucket/some/path/to/an/object?region=us-east-1')

console.log(parsed)

This outputs

URL {
  href: 's3://id:key@mybucket/some/path/to/an/object?region=us-east-1',
  origin: 'null',
  protocol: 's3:',
  username: 'id',
  password: 'key',
  host: 'mybucket',
  hostname: 'mybucket',
  port: '',
  pathname: '/some/path/to/an/object',
  search: '?region=us-east-1',
  searchParams: URLSearchParams { 'region' => 'us-east-1' },
  hash: ''
}
abudaan commented 10 months ago

I have implemented the standard URL parser but unfortunately this can't be used with the config URL of the local adapter and the GS adapter:

const u = "local://path/to/bucket@bucket_name?mode=511&extra_option2=value2"
{
  value: {
    type: 'local:',
    part1: null, // should be: path/to/bucket
    part2: null,
    bucketName: 'path',
    extraOptions: { mode: '511', extra_option2: 'value2' }
  },
  error: null
}

const u ="gcs://path/to/key_file.json@bucket_name?extra_option1=value1&extra_option2=value2";
{
  value: {
    type: 'gcs:',
    part1: null, // should be: path/to/key_file.json
    part2: null,
    bucketName: 'path',
    extraOptions: { extra_option1: 'value1', extra_option2: 'value2' }
  },
  error: null
}

Also the config URL doesn't (yet) use the path to an object in the storage; you need to call a getFile method after you've created a Storage instance.

It might be worth considering whether or not we should add support for direct URL's to objects in the storage. This would imply that if a path to an object is detected in the config URL, the Storage automatically calls a getFile method after instantiation.

headlessme commented 10 months ago

The path could be treated as a default object prefix for any later calls to any of the getFile methods?

abudaan commented 9 months ago

fixed in 2.1.0