gzuidhof / zarr.js

Javascript implementation of Zarr
https://guido.io/zarr.js
Apache License 2.0
133 stars 23 forks source link

[Feature]: Support for S3 with AccessKey and SecretKey #123

Open jhnnsrs opened 2 years ago

jhnnsrs commented 2 years ago

Hi!

First thanks for the great work, especially for the effort in types! As a lot of the community uses S3 for storage, I was wondering if you would be interested in supporting a dedicated S3 Store (building on the HTTP Store but with support for accessKeys and secretKeys? I could spin up a PR, but here is already an implementation using "aws4fetch".

import {
  HTTPStore,
  openGroup,
  openArray,
  ZarrArray,
  Group as ZarrGroup,
} from "zarr";
import { AwsClient } from "aws4fetch";

enum HTTPMethod {
  Get = "GET",
  Head = "HEAD",
  Put = "PUT",
}

export class S3HttpError extends Error {
  __zarr__: string;
  constructor(code: any) {
    super(code);
    this.__zarr__ = "HTTPError";
    Object.setPrototypeOf(this, HTTPError.prototype);
  }
}

export class S3KeyError extends Error {
  __zarr__: string;

  constructor(key: any) {
    super(`key ${key} not present`);
    this.__zarr__ = "KeyError";
    Object.setPrototypeOf(this, KeyError.prototype);
  }
}

export function joinUrlParts(...args: string[]) {
  return args
    .map((part, i) => {
      if (i === 0) {
        return part.trim().replace(/[\/]*$/g, "");
      } else {
        return part.trim().replace(/(^[\/]*|[\/]*$)/g, "");
      }
    })
    .filter((x) => x.length)
    .join("/");
}

class S3Store extends HTTPStore {
  aws: AwsClient;

  constructor(url: string, aws: AwsClient, options: any = {}) {
    super(url, options);
    this.aws = aws;
  }

  async getItem(item: any, opts: any) {
    const url = joinUrlParts(this.url, item);
    let value: any;
    try {
      value = await this.aws.fetch(url, { ...this.fetchOptions, ...opts });
    } catch (e) {
      throw new S3HTTPError("present");
    }
    if (value.status === 404) {
      // Item is not found
      throw new S3KeyError(item);
    } else if (value.status !== 200) {
      throw new S3HttpError(String(value.status));
    }

    return value.arrayBuffer(); // Browser
    // only decode if 200
  }
  async setItem(item: any, value: any) {
    const url = joinUrlParts(this.url, item);
    if (typeof value === "string") {
      value = new TextEncoder().encode(value).buffer;
    }
    const set = await this.aws.fetch(url, {
      ...this.fetchOptions,
      method: HTTPMethod.Put,
      body: value,
    });
    return set.status.toString()[0] === "2";
  }

  async containsItem(item: any) {
    const url = joinUrlParts(this.url, item);
    try {
      const value = await this.aws.fetch(url, {
        ...this.fetchOptions,
      });

      return value.status === 200;
    } catch (e) {
      return false;
    }
  }
}

Hope that might help some!

gzuidhof commented 2 years ago

Thank you for sharing :)

I would be hesitant to put it into the codebase as it would introduce a dependency on aws4fetch even for users that don't interact with s3(-compatible) stores.

But I would love to have this as an example in the docs! If you would be open to making a PR with that I'd happily merge that. I think it could deserve its own markdown file here, and then you would link in it in the sidebar here.

One improvement to your code: you could use a HEAD request for containsItem (docs), it saves downloading the entire chunk just to see if it exists :)

jhnnsrs commented 2 years ago

Agreed! Better no dependency hell.

Will write a few lines about it, and make a PR. Originally I had the HEAD request on containsItem, but HEAD is not always an allowed method on S3 endpoints (some minio deployments).

manzt commented 2 years ago

One thing we had also discussed previously would be to have the HTTPStore a more general fetch-based store where you can provide your own fetch implementation.

E.g.,

let store = new FetchStore(url, { fetch: globalThis.fetch }); // default, essentially current `HTTPStore`

let store = new FetchStore(url, {
  fetch: new AwsClient({ /* ... */ }).fetch,
  supportedMethods: ["GET"], // default ["GET", "PUT", "OPTIONS"],
});

Regardless, I think this example would be really great to have in our documentation!

jhnnsrs commented 2 years ago

Sounds great to me. Maybe I would still stick with the HTTPStore naming though. But I love the more compositional approach.