sdsc-ordes / modos-api

Python API to manage multi-omics digital objects
https://sdsc-ordes.github.io/modos-api
Apache License 2.0

feat: streamline remote options #90

Closed: cmdoret closed this 3 months ago

cmdoret commented 4 months ago

Objective:

Allow users to always specify the same modos server endpoint. The underlying htsget and S3 endpoints are auto-detected internally by querying the modos server.

For testing or advanced usage, a services option is still exposed in the API to bypass auto-detection and explicitly provide the service endpoints.
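A minimal sketch of the endpoint resolution described above, assuming the modos server answers a GET on its root URL with a JSON object mapping service names to URLs (e.g. {"s3": ..., "htsget": ...}). The route, the response shape, and the resolve_services helper are hypothetical illustrations, not the modos-api implementation; the fetch parameter only exists so the network lookup can be swapped out in tests.

```python
import json
import urllib.request
from typing import Callable, Mapping, Optional


def fetch_json(url: str) -> dict:
    """Fetch and decode a JSON document (hypothetical modos server route)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def resolve_services(
    endpoint: Optional[str] = None,
    services: Optional[Mapping[str, str]] = None,
    fetch: Callable[[str], dict] = fetch_json,
) -> dict:
    """Return a {service_name: url} mapping.

    - Explicit `services` bypass auto-detection (testing / advanced usage).
    - Otherwise the modos server is queried once for its service URLs.
    - With neither, no remote services are available.
    """
    if services is not None:
        return dict(services)
    if endpoint is None:
        return {}
    # Assumption: service URLs are listed at the server root.
    return dict(fetch(endpoint.rstrip("/") + "/"))


# Usage with a stand-in for the modos server (no network access needed).
fake_server = lambda url: {"s3": "http://localhost/s3", "htsget": "http://localhost/htsget"}
print(resolve_services("http://localhost", fetch=fake_server)["htsget"])  # http://localhost/htsget
```

Passing services short-circuits the lookup entirely, which mirrors the "bypass" behaviour described for testing and advanced usage.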

Notes

Limitations

Questions: Should we use a more explicit mechanism to distinguish remote and local paths and avoid confusion (e.g. a --local flag, or explicitly requiring s3:// paths for remote objects)? Should this go in this PR or a separate one?

Examples

CLI:

$ modos --version
modos 0.1.0

# create remote object on the S3 storage provided by the modos server at http://localhost
$ modos --endpoint http://localhost create bucket/object

# read endpoint from env variable to avoid repetition
$ export MODOS_ENDPOINT='http://localhost'
$ modos create bucket/object2
$ modos show bucket/object2
$ modos delete bucket/object2
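The --endpoint option and the MODOS_ENDPOINT variable amount to a simple precedence rule: an explicit flag wins, otherwise the environment variable is used. The sketch below expresses that rule with argparse purely for illustration; the actual modos CLI may be built on a different framework, and build_parser is a hypothetical name.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Build a CLI parser whose --endpoint defaults to $MODOS_ENDPOINT."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="modos")
    parser.add_argument(
        "--endpoint",
        default=env.get("MODOS_ENDPOINT"),  # None when the variable is unset
        help="URL of the modos server (default: $MODOS_ENDPOINT)",
    )
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("create", "show", "delete"):
        sub.add_parser(name).add_argument("object_path")
    return parser


# Equivalent of: export MODOS_ENDPOINT='http://localhost'; modos create bucket/object2
args = build_parser({"MODOS_ENDPOINT": "http://localhost"}).parse_args(["create", "bucket/object2"])
print(args.endpoint, args.command, args.object_path)  # http://localhost create bucket/object2
```

Taking the environment as a parameter keeps the precedence rule testable without mutating os.environ.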

API:

>>> from modos.api import MODO
>>> MODO('bucket/object2', endpoint='http://localhost')

# Also possible, e.g. when the user has an S3 server but no modos server
>>> nostream = MODO('bucket/ex-nostream', services={'s3': 'http://s3.example.org'})
>>> nostream.stream_genomics('demo1.cram')
ValueError: No htsget endpoint provided
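The ValueError above follows from the services mapping containing only an s3 entry. A minimal sketch of that check, assuming streaming simply looks up the htsget entry before doing anything else; require_service is a hypothetical helper, and only the error message is taken from the example.

```python
from typing import Mapping


def require_service(services: Mapping[str, str], name: str) -> str:
    """Return the URL of a configured service, or fail if it is missing."""
    url = services.get(name)
    if not url:
        raise ValueError(f"No {name} endpoint provided")
    return url


services = {"s3": "http://s3.example.org"}  # S3 only, no modos server
try:
    require_service(services, "htsget")
except ValueError as err:
    print(f"ValueError: {err}")  # ValueError: No htsget endpoint provided
```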
cmdoret commented 4 months ago

TODO:

cmdoret commented 4 months ago

Added this based on our discussion:

With these changes, the examples above become:

CLI:

$ modos --version
modos 0.1.0

# create remote object on the S3 storage provided by the modos server at http://localhost
$ modos --endpoint http://localhost create s3://bucket/object

# read endpoint from env variable to avoid repetition
$ export MODOS_ENDPOINT='http://localhost'
$ modos create s3://bucket/object2
$ modos show   s3://bucket/object2
$ modos delete s3://bucket/object2

# paths without s3:// still refer to local objects, even when MODOS_ENDPOINT is set
$ modos create data/object2
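With the s3:// convention, whether an object is remote no longer depends on MODOS_ENDPOINT being set: the path alone decides. A minimal sketch of that rule, assuming a remote path always has the form s3://<bucket>/<object>; classify_path and its return shape are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def classify_path(path: str) -> Tuple[str, str, str]:
    """Return ("remote", bucket, key) for s3:// paths, ("local", "", path) otherwise.

    The decision uses only the path scheme; the endpoint plays no role here.
    """
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        return ("local", "", path)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"Expected s3://<bucket>/<object>, got {path!r}")
    return ("remote", bucket, key)


print(classify_path("s3://bucket/object2"))  # ('remote', 'bucket', 'object2')
print(classify_path("data/object2"))         # ('local', '', 'data/object2')
```

Keying the decision on the scheme rather than on the presence of an endpoint is what lets the same shell session create both remote and local objects.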

API:

>>> from modos.api import MODO
>>> MODO('s3://bucket/object2', endpoint='http://localhost')

# Also possible, e.g. when the user has an S3 server but no modos server
>>> nostream = MODO('s3://bucket/ex-nostream', services={'s3': 'http://s3.example.org'})
>>> nostream.stream_genomics('demo1.cram')
ValueError: No htsget endpoint provided