manzt / zarrita.js

A JavaScript toolkit for working with chunked, compressed, n-dimensional arrays
https://zarrita.dev
MIT License

Unifying consolidated and non-consolidated interfaces #117

Closed manzt closed 7 months ago

manzt commented 11 months ago

Right now there are two different worlds for consolidated vs non-consolidated metadata:

consolidated

import { openConsolidated, FetchStore } from "zarrita";

const store = new FetchStore("http://localhost:8080/data.zarr");
const { open, root, contents } = await openConsolidated(store);
const grp = root();

const knownKey = contents.keys().next().value;
const node = open(knownKey, { kind: "array" });
// open() here is just an alias for contents.get(knownKey), but it is type safe and allows relative paths

non-consolidated

import { open, root, FetchStore } from "zarrita";

const store = new FetchStore("http://localhost:8080/data.zarr");
const grp = await open(store);
const node = await open(grp.resolve("foo"));

But then the end users have to deal with switching between these worlds.

I can see a use case where we don't know whether the metadata is consolidated, but if it is, it would be nice to avoid extra network requests for that metadata (e.g., when we have something like AnnData).

In #109, @keller-mark had the idea to start tracking information about the stores (which we can do with WeakMaps). I wonder if we could extend this idea to consolidated metadata and keep track of the contents we've opened so far for a store:

import { openConsolidated, open, FetchStore } from "zarrita";

const store = new FetchStore("http://localhost:8080/data.zarr");
const contents = await openConsolidated(store);
// Map<AbsolutePath, Array<DataType, Store> | Group<Store>>

const grp = await open(store, { kind: "group" }); // uses the consolidated metadata for the store
const node = await open(grp.resolve("foo"), { kind: "array" });

This would mean that if you know you have consolidated metadata, you could just use contents directly. But if you don't know if it's consolidated or not, you could "try" to open consolidated for a performance boost:

await openConsolidated(store, { returnContents: false }); // creates a tracker of the consolidated metadata
const grp = await open(store, { kind: "group" }); // uses the consolidated metadata for the store
const node = await open(grp.resolve("foo"), { kind: "array" });
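
The WeakMap-based tracking described in this comment could look roughly like the sketch below. It is only an illustration of the idea: `ReadableStore`, `trackConsolidated`, and `lookupKind` are hypothetical names, not zarrita APIs, and the registry stores only node kinds rather than full metadata.

```typescript
// Hypothetical sketch: none of these names are part of zarrita.
type AbsolutePath = `/${string}`;
type NodeKind = "array" | "group";

interface ReadableStore {
  get(key: AbsolutePath): Promise<Uint8Array | undefined>;
}

// Keyed by store identity, so an entry can be garbage collected with its store.
const consolidatedByStore = new WeakMap<ReadableStore, Map<AbsolutePath, NodeKind>>();

// Remember what the consolidated metadata said about a store.
function trackConsolidated(
  store: ReadableStore,
  contents: Map<AbsolutePath, NodeKind>,
): void {
  consolidatedByStore.set(store, contents);
}

// Returns the known kind for a path, or undefined if the store is untracked
// or the path is not listed (a caller would then fall back to a network read).
function lookupKind(store: ReadableStore, path: AbsolutePath): NodeKind | undefined {
  return consolidatedByStore.get(store)?.get(path);
}

// Two dummy stores: one tracked, one not.
const trackedStore: ReadableStore = { async get() { return undefined; } };
const untrackedStore: ReadableStore = { async get() { return undefined; } };
trackConsolidated(
  trackedStore,
  new Map<AbsolutePath, NodeKind>([["/", "group"], ["/foo", "array"]]),
);
```

Because the registry is keyed by the store object itself, a generic open function could consult it without the caller passing anything extra, which is what makes the "try consolidated first" pattern possible.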

manzt commented 11 months ago

Another idea is that this could just be a special store (wrapper):

interface Listable {
  contents(): Array<{ path: AbsolutePath, kind: "array" | "group" }>;
}

async function withConsolidated<Store extends Readable>(
  store: Store,
): Promise<Pick<Store, "get"> & Listable> {
  // try_consolidated, is_meta_key, and list_nodes are helpers not shown here
  const known_metadata = await try_consolidated(store);
  return {
    async get(...args: Parameters<Store["get"]>) {
      let [key, opts] = args;
      // serve metadata we already know about without touching the store
      if (key in known_metadata) return known_metadata[key];
      let maybe_bytes = await store.get(key, opts);
      if (is_meta_key(key) && maybe_bytes) {
        known_metadata[key] = maybe_bytes; // add to known_metadata
      }
      return maybe_bytes;
    },
    contents() {
      return list_nodes(known_metadata);
    },
  };
}

I chose the name contents over keys because keys on a Map-like interface would imply listing everything in the store (including chunks).
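
To make the distinction concrete, here is a minimal sketch of how a helper like list_nodes might derive node entries from metadata keys while skipping everything else. It assumes Zarr v2 key names (`.zarray`, `.zgroup`) and absolute keys with a leading slash; `listNodes` and `NodeEntry` are hypothetical names and this is not zarrita's implementation.

```typescript
// Hypothetical sketch: assumes v2-style metadata keys with a leading "/".
type AbsolutePath = `/${string}`;
type NodeEntry = { path: AbsolutePath; kind: "array" | "group" };

// Metadata file names that identify a node, mapped to the node kind.
const KIND_BY_SUFFIX = new Map<string, "array" | "group">([
  [".zarray", "array"],
  [".zgroup", "group"],
]);

function listNodes(knownMetadata: Record<string, unknown>): NodeEntry[] {
  const nodes: NodeEntry[] = [];
  for (const key of Object.keys(knownMetadata)) {
    const slash = key.lastIndexOf("/");
    const kind = KIND_BY_SUFFIX.get(key.slice(slash + 1));
    if (kind === undefined) continue; // attributes, chunks, and other keys are not nodes
    const parent = key.slice(0, slash);
    nodes.push({ path: (parent === "" ? "/" : parent) as AbsolutePath, kind });
  }
  return nodes;
}

const exampleNodes = listNodes({
  "/.zgroup": {},
  "/foo/.zarray": {},
  "/foo/.zattrs": {},
  "/foo/0.0": new Uint8Array([1, 2, 3]), // a chunk key, ignored
});
// exampleNodes: [{ path: "/", kind: "group" }, { path: "/foo", kind: "array" }]
```

A `keys()` method would have to return all four keys above, whereas `contents()` reports only the two nodes.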

Then openConsolidated could wrap withConsolidated. I wonder what you think, @keller-mark.


import { withConsolidated, FetchStore, open, root } from "zarrita";

let store = await withConsolidated(new FetchStore("http://localhost:8080"));
let contents = store.contents(); // [{ path: "/", kind: "group" }, { path: "/foo", kind: "array" }, ...]
// open() needs a location within the store, so resolve the listed path from the root
let foo = await open(root(store).resolve(contents[1].path), { kind: "array" });
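
The caching behavior of the wrapper idea can be tried without a network using an in-memory stand-in. The sketch below is an assumption-laden illustration, not zarrita code: `countingStore`, `withMetadataCache`, and `isMetaKey` are hypothetical names, and the set of metadata key names is chosen for the example.

```typescript
// Hypothetical sketch: a store wrapper that caches metadata reads only.
type AbsolutePath = `/${string}`;

interface ReadableStore {
  get(key: AbsolutePath): Promise<Uint8Array | undefined>;
}

// In-memory stand-in for a remote store that counts how often it is read.
function countingStore(entries: Map<AbsolutePath, Uint8Array>) {
  const stats = { reads: 0 };
  const store: ReadableStore = {
    async get(key) {
      stats.reads += 1;
      return entries.get(key);
    },
  };
  return { store, stats };
}

// Assumed metadata key names (Zarr v2 files plus v3's zarr.json).
function isMetaKey(key: string): boolean {
  return /\/(\.zarray|\.zgroup|\.zattrs|zarr\.json)$/.test(key);
}

// Wraps a store so known metadata is served from memory; chunks always pass through.
function withMetadataCache(
  store: ReadableStore,
  known: Map<AbsolutePath, Uint8Array> = new Map(),
): ReadableStore {
  return {
    async get(key) {
      const cached = known.get(key);
      if (cached !== undefined) return cached;
      const bytes = await store.get(key);
      if (bytes !== undefined && isMetaKey(key)) known.set(key, bytes);
      return bytes;
    },
  };
}
```

Seeding `known` from consolidated metadata up front would give the consolidated fast path, while an empty map degrades gracefully to ordinary reads with metadata memoized as it is encountered.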

manzt commented 11 months ago

I am experimenting with the second option.

manzt commented 7 months ago

Added in #119