google / tensorstore

Library for reading and writing large multi-dimensional arrays.
https://google.github.io/tensorstore/

Auto-detect driver #116

Open normanrz opened 1 year ago

normanrz commented 1 year ago

Hi! Is there a way to auto-detect the driver (and maybe other parts of the spec) of an already existing dataset? I am thinking of something like:

>>> dataset = ts.open({ 'kvstore': 'gs://neuroglancer-janelia-flyem-hemibrain/v1.1/segmentation/' }).result()
TensorStore({
  ...
  'driver': 'neuroglancer_precomputed',
  'dtype': 'uint64',
  'kvstore': {
    'bucket': 'neuroglancer-janelia-flyem-hemibrain',
    'driver': 'gcs',
    'path': 'v1.1/segmentation/',
  },
  ...
})

jbms commented 1 year ago

There isn't currently any format auto-detection logic, but it is something we have discussed previously, and I was inclined to implement it in conjunction with support for the URL syntax I proposed.

The syntax would probably be:

ts.open('gs://..') or ts.open({'driver': 'auto', 'kvstore': 'gs://...'})

If we have, e.g., a zarr array at the root of a zip file (or, similarly, of an OCDBT database), then potentially that could also be auto-detected, e.g. gs://bucket/zipfile.zip would be auto-detected as gs://bucket/zipfile.zip|zip:|zarr3:. I'm not sure whether auto-detection of kvstore adapter formats like that would also be supported, or only auto-detection of the array formats.

In general, the auto-detection would probably work by having each candidate driver specify a set of relative paths to check, plus the number of header/trailer bytes it needs to inspect when the target is a file rather than a directory.
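
To make the idea above concrete, here is a minimal sketch in plain Python. It is not TensorStore code and does not reflect any actual implementation: `Probe`, `PROBES`, and `detect_driver` are hypothetical names, the key-value store is stood in for by an in-memory dict, and the probe table is an assumption based only on each format's well-known metadata file name (`zarr.json`, `.zarray`, `attributes.json`, `info`) and the ZIP local-file-header signature. Trailer checks are left out for brevity.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Probe:
    """What one (hypothetical) driver asks the detector to check."""
    driver: str
    relative_paths: Tuple[str, ...] = ()  # marker keys under a directory prefix
    header_magic: bytes = b""             # leading bytes expected for a single file


# Assumed probe table, for illustration only. Checking that a key exists is
# a weak signal; a real detector would probably also validate its content.
PROBES = (
    Probe("zarr3", relative_paths=("zarr.json",)),
    Probe("zarr", relative_paths=(".zarray",)),
    Probe("n5", relative_paths=("attributes.json",)),
    Probe("neuroglancer_precomputed", relative_paths=("info",)),
    Probe("zip", header_magic=b"PK\x03\x04"),
)


def detect_driver(store: Mapping[str, bytes], path: str) -> Optional[str]:
    """Return the first driver whose probe matches `path`, else None."""
    if path in store:
        # `path` names a single file: compare its leading bytes.
        # (A real store would fetch only the requested byte range.)
        data = store[path]
        for probe in PROBES:
            if probe.header_magic and data.startswith(probe.header_magic):
                return probe.driver
        return None

    # Otherwise treat `path` as a directory prefix and look for marker keys.
    prefix = path if (not path or path.endswith("/")) else path + "/"
    for probe in PROBES:
        if any(prefix + rel in store for rel in probe.relative_paths):
            return probe.driver
    return None
```

With a toy store such as `{"v1.1/segmentation/info": b"{}"}`, calling `detect_driver(store, "v1.1/segmentation/")` selects `neuroglancer_precomputed`. Probe order matters in this sketch: the first match wins, so more specific formats would need to be listed first.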