visgl / deck.gl

WebGL2 powered visualization framework
https://deck.gl
MIT License

[Feat] Support Reading MVTiles from S3 buckets #8590

Closed: james-willis closed this issue 7 months ago

james-willis commented 7 months ago

Target Use Case

Currently, only publicly accessible objects stored in S3 can be retrieved as tiles by MVTLayer. This feature would let users read non-public tiles from S3. I already have a local implementation.

Proposal

An example tile layer definition might look like:

  const tileLayer = new MVTLayer({
    data: 's3://myBucket/my/prefix/{z}/{x}/{y}.mvt',
    s3ClientConfig: {
      region: 'us-west-2',
      credentials: {
        accessKeyId: myAccessKeyId,
        secretAccessKey: mySecretAccessKey,
        sessionToken: mySessionToken
      }
    }
  });

The credentials would be used to generate a presigned URL for each requested tile, which would then be retrieved with the existing fetch function.
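Under this proposal, each requested tile resolves to its own S3 object, and that object is what gets presigned. A minimal sketch of the resolution step, assuming the `s3://bucket/key` template syntax from the example; `resolveS3Tile`, `TileIndex`, and `S3ObjectRef` are hypothetical names for illustration, not deck.gl API.

```typescript
// Hypothetical helper (not part of deck.gl): map one tile index onto the S3 object
// named by an `s3://bucket/key-template` URL, as in the proposed `data` prop.
interface TileIndex {
  x: number;
  y: number;
  z: number;
}

interface S3ObjectRef {
  bucket: string;
  key: string;
}

function resolveS3Tile(template: string, {x, y, z}: TileIndex): S3ObjectRef {
  // Everything before the first "/" after the scheme is the bucket, the rest is the key template.
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(template);
  if (!match) throw new Error(`Not an s3:// URL template: ${template}`);
  const [, bucket, keyTemplate] = match;
  // Substitute the {x}/{y}/{z} placeholders with this tile's coordinates.
  const key = keyTemplate
    .replace(/\{x\}/g, String(x))
    .replace(/\{y\}/g, String(y))
    .replace(/\{z\}/g, String(z));
  return {bucket, key};
}
```

Because the resolved key differs for every tile, the presigned URL generated from it differs too.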

Pessimistress commented 7 months ago

Is this not covered by https://deck.gl/docs/developer-guide/loading-data#example-fetch-data-with-credentials ?

james-willis commented 7 months ago

I don't think so. Each tile needs a distinct signature, whether it is sent as an auth header or embedded in a presigned URL. I'm not aware of any auth mechanism exposed by S3 that allows the same static headers to be reused across requests for different tiles.

jamesscottbrown commented 7 months ago

I think you could write a Lambda function that receives HTTP requests, performs its own checks based on the Authorization header, and either returns the appropriate tile or a 401/403 error.

(Alternatively, you could split this into a Lambda function that serves tiles and a separate Lambda authorizer, with API Gateway configured to use the first function as the integration and the second as the authorizer. I've previously done a POC of this, with the first Lambda function extracting MVT tiles from a PMTiles archive.)
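A minimal sketch of the single-Lambda approach, assuming an API Gateway proxy integration with `{z}`, `{x}`, and `{y}` path parameters. The event and result shapes are trimmed to the fields used here, and `makeTileHandler`, `isAuthorized`, and `readTile` are hypothetical (for example, a token check and a read from S3 or a PMTiles archive); none of this is code from the thread.

```typescript
// Trimmed API Gateway proxy-style shapes: only the fields this sketch uses.
interface ProxyEvent {
  headers: Record<string, string | undefined>;
  pathParameters?: Record<string, string | undefined>;
}

interface ProxyResult {
  statusCode: number;
  headers?: Record<string, string>;
  body: string;
  isBase64Encoded?: boolean;
}

// Hypothetical injected dependencies.
type AuthCheck = (authorization: string) => Promise<boolean>;
type TileReader = (z: number, x: number, y: number) => Promise<Uint8Array | null>;

function makeTileHandler(isAuthorized: AuthCheck, readTile: TileReader) {
  return async (event: ProxyEvent): Promise<ProxyResult> => {
    // Header casing depends on the API Gateway flavor, so check both spellings.
    const auth = event.headers['authorization'] ?? event.headers['Authorization'];
    if (!auth) return {statusCode: 401, body: 'Missing Authorization header'};
    if (!(await isAuthorized(auth))) return {statusCode: 403, body: 'Forbidden'};

    // Missing or non-numeric path parameters become NaN and are rejected.
    const p = event.pathParameters ?? {};
    const [z, x, y] = [p.z, p.x, p.y].map(Number);
    if (![z, x, y].every(Number.isInteger)) return {statusCode: 400, body: 'Bad tile index'};

    const tile = await readTile(z, x, y);
    if (!tile) return {statusCode: 404, body: 'Tile not found'};
    // Binary bodies go back to API Gateway base64-encoded.
    return {
      statusCode: 200,
      headers: {'Content-Type': 'application/vnd.mapbox-vector-tile'},
      body: Buffer.from(tile).toString('base64'),
      isBase64Encoded: true,
    };
  };
}
```

Injecting the auth check keeps the tile-serving logic the same whether the check runs inline or in a separate Lambda authorizer. Returning binary tiles through API Gateway may also require configuring binary media types, depending on the API type.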

james-willis commented 7 months ago

I agree that this would work, but I feel it adds unnecessary complexity to the story of serving tiles that contain private data. As a result, some users would effectively be excluded from using deck.gl for this kind of use case.

I imagine the users of this feature being people like data scientists, who are comfortable writing code but may have less experience working with cloud provider infrastructure. Here is a user story:

  1. A data scientist generates inferences of new roads from confidential customer data.
  2. The scientist uses the existing road dataset, the new inferences, and the source GPS traces to generate map tiles so they can share the results of this model iteration with their colleagues. They put these tiles in a private bucket used by their team.
  3. A coworker visualizes the tiles as a layer in pydeck.

In my experience, a scientist might use something like GeoJSON + QGIS, but that is slow even with moderately sized data, and the reader has to do a lot of work to set up their QGIS project with the distinct layers (new roads, old roads, and source data in the example).

Pessimistress commented 7 months ago

I believe you can also pass a callback function to loadOptions.fetch. It won't work for pydeck, though.
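A minimal sketch of the callback approach, assuming loaders.gl calls the `loadOptions.fetch` callback with the tile URL as its first argument and uses whatever it resolves to as the response. `withPresigning` and the `presign` function it takes are hypothetical.

```typescript
// Hypothetical wrapper: presign each tile URL just before delegating to a real fetch.
type Presign = (url: string) => Promise<string>;
type Fetcher<Init, Result> = (url: string, init?: Init) => Promise<Result>;

function withPresigning<Init, Result>(
  presign: Presign,
  baseFetch: Fetcher<Init, Result>,
): Fetcher<Init, Result> {
  // Request options (headers, abort signal, ...) are passed through untouched.
  return async (url, init) => baseFetch(await presign(url), init);
}
```

With deck.gl this might be wired up as `loadOptions: {fetch: withPresigning(presign, fetch)}` on the MVTLayer, with `data` set to an `https://` tile URL template for the bucket. `presign` could wrap `getSignedUrl` from `@aws-sdk/s3-request-presigner` or a hand-written SigV4 signer. As noted, this only helps JavaScript users, not pydeck.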

james-willis commented 7 months ago

Here is a draft PR I have in a fork: https://github.com/wherobots/deck.gl/pull/1

jamesscottbrown commented 7 months ago

This PR adds two dependencies, one of which is quite large - @aws-sdk/client-s3 has an unpacked size of 2.99MB. Given the signature calculation is documented, it might be better to implement it directly.

SHA-256 and HMAC-SHA256 are both provided by the digest() and sign() methods of the SubtleCrypto interface of the Web Crypto API, but in some (?) browsers this interface is only available to sites served over HTTPS.

(For example, here are a couple of similar reimplementations of the signature calculation, though they import SHA-256/HMAC-SHA-256 functions from other libraries; Amazon's implementation is here.)
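As a rough illustration of implementing the signature calculation directly, here is a minimal SigV4 query-string presigner for S3 GET requests that uses only the Web Crypto API. It assumes virtual-hosted-style regional endpoints, signs only the `host` header, and uses `UNSIGNED-PAYLOAD`; `presignS3GetUrl` and its helpers are hypothetical names, and the sketch has not been checked against AWS's published test vectors, so verify it before relying on it.

```typescript
// Minimal SigV4 query-string presigning for S3 GET, using only the Web Crypto API.
interface S3Credentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
}

const encoder = new TextEncoder();

const toHex = (buf: ArrayBuffer): string =>
  Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, '0')).join('');

async function sha256Hex(data: string): Promise<string> {
  return toHex(await crypto.subtle.digest('SHA-256', encoder.encode(data)));
}

async function hmacSha256(key: ArrayBuffer | Uint8Array, data: string): Promise<ArrayBuffer> {
  const cryptoKey = await crypto.subtle.importKey(
    'raw', key, {name: 'HMAC', hash: 'SHA-256'}, false, ['sign']);
  return crypto.subtle.sign('HMAC', cryptoKey, encoder.encode(data));
}

// RFC 3986 encoding required by SigV4; encodeURIComponent leaves !'()* unescaped.
const uriEncode = (s: string): string =>
  encodeURIComponent(s).replace(/[!'()*]/g, (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());

async function presignS3GetUrl(
  bucket: string,
  key: string,
  region: string,
  creds: S3Credentials,
  expiresInSeconds = 900,
  now: Date = new Date(),
): Promise<string> {
  const host = `${bucket}.s3.${region}.amazonaws.com`;
  const amzDate = now.toISOString().replace(/[:-]|\.\d{3}/g, ''); // e.g. 20240101T000000Z
  const dateStamp = amzDate.slice(0, 8);
  const scope = `${dateStamp}/${region}/s3/aws4_request`;
  // The object key is part of the signed request, so every tile gets a different signature.
  const canonicalUri = '/' + key.split('/').map(uriEncode).join('/');

  const params: Record<string, string> = {
    'X-Amz-Algorithm': 'AWS4-HMAC-SHA256',
    'X-Amz-Credential': `${creds.accessKeyId}/${scope}`,
    'X-Amz-Date': amzDate,
    'X-Amz-Expires': String(expiresInSeconds),
    'X-Amz-SignedHeaders': 'host',
  };
  if (creds.sessionToken) params['X-Amz-Security-Token'] = creds.sessionToken;
  // Canonical query string: parameters sorted by name, names and values encoded.
  const canonicalQuery = Object.keys(params)
    .sort()
    .map((k) => `${uriEncode(k)}=${uriEncode(params[k])}`)
    .join('&');

  // Canonical headers end with "\n", which yields the blank line SigV4 expects.
  const canonicalRequest = [
    'GET', canonicalUri, canonicalQuery, `host:${host}\n`, 'host', 'UNSIGNED-PAYLOAD',
  ].join('\n');
  const stringToSign = ['AWS4-HMAC-SHA256', amzDate, scope, await sha256Hex(canonicalRequest)].join('\n');

  // Derive the signing key: HMAC chain over date, region, service, and terminator.
  let signingKey = await hmacSha256(encoder.encode('AWS4' + creds.secretAccessKey), dateStamp);
  for (const part of [region, 's3', 'aws4_request']) signingKey = await hmacSha256(signingKey, part);
  const signature = toHex(await hmacSha256(signingKey, stringToSign));

  return `https://${host}${canonicalUri}?${canonicalQuery}&X-Amz-Signature=${signature}`;
}
```

In browsers, `crypto.subtle` is only exposed in secure contexts, which is the HTTPS caveat mentioned above; recent Node.js versions expose the same API as `globalThis.crypto`.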

Pessimistress commented 7 months ago

I am not a fan of adding this functionality to MVTLayer itself, because:

This layer uses MVTLoader to load tiles by default. We already have a mechanism to replace the default loader. I have not tried this but theoretically you should be able to do:

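import {MVTLayer} from '@deck.gl/geo-layers';
import {MVTWorkerLoader} from '@loaders.gl/mvt';

// Copy of the default MVT worker loader whose default options include a custom fetch
const S3MVTWorkerLoader = {
  ...MVTWorkerLoader,
  options: {
    ...MVTWorkerLoader.options,
    // Custom per-tile request logic (left elided)
    fetch: (url: string, init: RequestInit) => ...
  }
};

new MVTLayer({
  ...
  loaders: [S3MVTWorkerLoader]
})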

Again, I don't think there is an easy way to plug this into pydeck or the declarative API. @ibgreen, do you have any insight?

james-willis commented 7 months ago

Thanks for all the input. I've opened a feature request in loaders.gl.