visgl / deck.gl

WebGL2 powered visualization framework
https://deck.gl
MIT License

[Feat] Vector tiles in Cartesian coordinates #9255

Open s-n-i opened 3 days ago

s-n-i commented 3 days ago

Target Use Case

Deck.gl has MVTLayer, which allows displaying vector tiles in geographic (latitude and longitude) coordinates. Microscopy and possibly other applications need to be able to display vector tiles in Cartesian coordinates.

Proposal

I have written an implementation that instantiates TileLayer, parses tiles using MVTLoader.parse, and displays them using GeoJsonLayer. These tiles use Cartesian coordinates to specify polygon vertex positions. I was wondering if it would be useful to open a PR with this functionality. I think this could evolve into either:

  1. A coordinateSystem property on MVTLayer, which can be either cartesian or geospatial
  2. A new layer, which can be called CartesianTileLayer
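
For illustration only, here is a minimal sketch of the coordinate step such an implementation needs between parsing and rendering. It is not the code from the proposal. It assumes the parsed features carry tile-local coordinates in the range [0, 1] with the origin at the tile's top-left corner, and that the tile's Cartesian extent is available as left/top/right/bottom. The names `XY`, `TileBounds`, `PolygonFeature`, `localToCartesian`, and `polygonToCartesian` are hypothetical and are not deck.gl or loaders.gl exports.

```typescript
// Hypothetical types for this sketch; not deck.gl or loaders.gl exports.
type XY = [number, number];

interface TileBounds {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface PolygonFeature {
  type: "Feature";
  geometry: { type: "Polygon"; coordinates: XY[][] };
  properties: Record<string, unknown>;
}

// Map one tile-local position (assumed to lie in [0, 1] on both axes, with
// the origin at the tile's top-left corner) into Cartesian world space.
function localToCartesian([u, v]: XY, bounds: TileBounds): XY {
  return [
    bounds.left + u * (bounds.right - bounds.left),
    bounds.top + v * (bounds.bottom - bounds.top),
  ];
}

// Return a copy of the feature with every ring vertex moved into world space.
function polygonToCartesian(feature: PolygonFeature, bounds: TileBounds): PolygonFeature {
  const coordinates = feature.geometry.coordinates.map((ring) =>
    ring.map((position) => localToCartesian(position, bounds))
  );
  return { ...feature, geometry: { type: "Polygon", coordinates } };
}
```

In a tile layer, a function like this would run once per tile after parsing, and the resulting features would be handed to the rendering sublayer with a Cartesian coordinate system selected.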

When researching this topic, I think I saw a post from @cornhundred related to this issue a while ago.

ibgreen commented 3 days ago

Better support for non-geospatial use cases is definitely of interest.

My first reaction is that MVTLayer is focused on supporting data stored in mapbox vector tiles (MVT) format - which I currently (probably mostly by habit) tend to think of as a very geospatial format.

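Some questions:

  1. Are mapbox vector tiles already being used to store microscopy results, or is this more posited as an ambition/exploration?
  2. What kind of microscopy results are suitable for vector tile storage? The output of some kind of feature detection process on the raster images?
  3. Are there tools to generate such data that are accessible to the community?
  4. Is there any writeup on conventions around storing such data in MVT? How to interpret the coordinate space, how to generate Z/X/Y indexes, what is the top-level bounding box etc.

The answers may be almost self-evident, but some kind of definition would be useful.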

ilan-gold commented 2 days ago

> Are mapbox vector tiles already being used to store microscopy results, or is this more posited as an ambition/exploration?

No, this is not currently the standard. Zarr-based storage is the norm, and while there are emerging standards such as OME-NGFF for pixel data, that standard has nothing for vector graphics. SpatialData has something for points/polygons, but it is not multiscale, which is the MVT selling point.

> What kind of microscopy results are suitable for vector tile storage? The output of some kind of feature detection process on the raster images?

Yes, and hand-drawn results as well, but also spot-based detection, i.e., pure points marking all the little molecules in something like this: [image]

> Are there tools to generate such data that are accessible to the community?

There is no community-based multiscale vector graphics generator to my knowledge. My guess is that this is driven by two things:

  1. AnnData and SpatialData (outside of the underlying OME-NGFF pixel-based data) do not support multi-scale data in general, and so have not really thought about storage for it.
  2. No on-disk Cartesian standard exists that could be used (that we are aware of).

> Is there any writeup on conventions around storing such data in MVT? How to interpret the coordinate space, how to generate Z/X/Y indexes, what is the top-level bounding box etc.

Not to my knowledge, but maybe @cornhundred can chime in. I would say that it would be ideal if we didn't have to shoehorn geospatial into microscopy. I think that was the advantage of Viv - we didn't do any special casing, and instead just did the thing.

> The answers may be almost self-evident, but some kind of definition would be useful.

There are lots of holes and little knowledge silos in the community, so I tried to give a summary as I see it.

s-n-i commented 1 day ago

> Are mapbox vector tiles already being used to store microscopy results, or is this more posited as an ambition/exploration?

There are microscopy datasets with polygons in GeoJSON format. I have previously generated a dataset in MVT format by converting one of them using tippecanoe (https://github.com/mapbox/tippecanoe). However, the results were not directly suitable for overlaying on the underlying raster image because the raster image was in Cartesian coordinates and both MVTLayer and tippecanoe use latitude and longitude. We are working on a GeoJSON derivative, which we refer to as MicroJSON (https://polusai.github.io/microjson/), to store microscopy results.

> What kind of microscopy results are suitable for vector tile storage? The output of some kind of feature detection process on the raster images?

Yes, some of our results are polygons that are the output of feature detection. We also have text labels. We could support hand-drawn polygons as well. The general use case is metadata annotation of ROIs in large-scale imaging data sets.

> Are there tools to generate such data that are accessible to the community?

The MicroJSON project is open source: https://github.com/PolusAI/microjson/, and we are currently working on such tools and will release them soon in this repo.

> Is there any writeup on conventions around storing such data in MVT? How to interpret the coordinate space, how to generate Z/X/Y indexes, what is the top-level bounding box etc.

Yes, we use TileJSON, similar to what many geographic applications do: https://polusai.github.io/microjson/tiling/
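
As a rough illustration of what such a convention has to pin down, the sketch below maps between Z/X/Y tile indexes and Cartesian bounding boxes. It is not taken from the MicroJSON tiling documentation, which may define things differently. It assumes a quadtree in which zoom level z splits the top-level bounding box into 2^z by 2^z equal tiles, with x increasing to the right and y increasing downward from the minimum corner. The names `Bounds`, `TileIndex`, `tileBounds`, and `tileIndexAt` are hypothetical.

```typescript
// Hypothetical types for this sketch.
interface Bounds {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

interface TileIndex {
  z: number;
  x: number;
  y: number;
}

// Bounding box of tile (z, x, y) inside the top-level bounds, assuming
// 2^z tiles per axis and tile (0, 0) at the (minX, minY) corner.
function tileBounds(index: TileIndex, world: Bounds): Bounds {
  const tilesPerAxis = 2 ** index.z;
  const tileWidth = (world.maxX - world.minX) / tilesPerAxis;
  const tileHeight = (world.maxY - world.minY) / tilesPerAxis;
  const minX = world.minX + index.x * tileWidth;
  const minY = world.minY + index.y * tileHeight;
  return { minX, minY, maxX: minX + tileWidth, maxY: minY + tileHeight };
}

// Index of the tile at zoom z that contains the Cartesian point (px, py).
// Points on or beyond the far edges are clamped into the last tile.
function tileIndexAt(px: number, py: number, z: number, world: Bounds): TileIndex {
  const tilesPerAxis = 2 ** z;
  const clamp = (i: number): number => Math.min(Math.max(i, 0), tilesPerAxis - 1);
  const x = Math.floor(((px - world.minX) / (world.maxX - world.minX)) * tilesPerAxis);
  const y = Math.floor(((py - world.minY) / (world.maxY - world.minY)) * tilesPerAxis);
  return { z, x: clamp(x), y: clamp(y) };
}
```

With a 4096 by 4096 image as the top-level box, zoom level 2 yields a 4 by 4 grid of 1024-unit tiles under these assumptions.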

Although there is quite a bit of background here, I am only proposing a limited change: adding MVTLayer functionality in Cartesian (rather than geographic) coordinates to deck.gl.

ilan-gold commented 1 day ago

Wow, this looks great. @s-n-i I am not sure you're aware of our consortium, but I am from scverse (which includes the developers of SpatialData). It'd be great if you guys could connect.

https://scverse.zulipchat.com/#narrow/channel/315824-spatial/topic/MicroJSON.20.2B.20TileJSON/near/483652297

We have a zulip channel where we could discuss further.

Otherwise, perhaps we could connect over email?

ibgreen commented 1 day ago

This looks like a very exciting development track, and yes, it would make a lot of sense for vis.gl (deck / loaders etc) to support this.

And thanks for the wealth of background information, more than I expected/needed. I was mostly trying to get a sense for whether there are actual tiles to display...

Because the best way to add something like this is usually to provide data (in this case a set of non-geospatial vector tiles) that we can test against / build examples against etc. Just hacking in a non-geospatial code path in the TileLayer to unblock you is of course possible, but without a solid use case and tests such things tend to become brittle and break over time.

While I really appreciate your intent to reduce this to a well-defined and limited feature request, I believe there is a long and potentially fruitful tail of ways biology visualizations can leverage deck.gl etc, some examples:

Eager to see more collaboration here, happy to continue on OpenJS slack or even via a short call etc.

ibgreen commented 1 day ago

Thinking more about this, I am not sure the name MVTLayer is particularly helpful. My take is that MVTLayer doesn't have many hard dependencies on mapbox vector tiles but was rather created to handle a couple of features on top of TileLayer, such as:

As I have been generalizing Tile loading using the new DataSource system in loaders.gl (to support PMTiles, client-side tiling etc) I have actually ended up with a new TileSourceLayer that sometimes renders a TileLayer and sometimes an MVTLayer, also having to deal with the selection of coordinates:

Maybe a better name can be found for the type of layer we are looking for (FeatureTileLayer, TabularTileLayer, ...)

There is also the long-term ambition to potentially remove tile layers completely from deck.gl.

Basically we would have strong Tileset classes that load tiles and present a table composed of the content of all the tiles that can then be rendered by any normal layer. Using Arrow and RecordBatches this combined table can be recomposed after every tile load/unload with zero copy.
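
To make the idea concrete, here is a deliberately simplified sketch of the recompose-on-change pattern using plain arrays. It is not the loaders.gl DataSource API and it does not use Arrow: the combined table here is rebuilt by copying row references into a new array, so it does not demonstrate the zero-copy property described above. The class name `TilesetTable` and its methods are hypothetical.

```typescript
// Hypothetical class for this sketch; a real implementation would more
// likely hold one Arrow RecordBatch per tile instead of a plain array.
class TilesetTable<Row> {
  // Rows of each loaded tile, keyed by a tile id such as "z/x/y".
  private readonly batches = new Map<string, readonly Row[]>();
  // Cached combined table; reset to null whenever the set of tiles changes.
  private combined: readonly Row[] | null = null;

  // Register the rows of a newly loaded tile.
  addTile(tileId: string, rows: readonly Row[]): void {
    this.batches.set(tileId, rows);
    this.combined = null;
  }

  // Drop the rows of an unloaded tile, if it was loaded.
  removeTile(tileId: string): void {
    if (this.batches.delete(tileId)) {
      this.combined = null;
    }
  }

  // Combined table over all loaded tiles, rebuilt lazily after a change.
  getRows(): readonly Row[] {
    if (this.combined === null) {
      const rows: Row[] = [];
      this.batches.forEach((batch) => {
        rows.push(...batch);
      });
      this.combined = rows;
    }
    return this.combined;
  }
}
```

A normal layer would then receive the result of `getRows()` as its data, and would only see a new array after a tile has actually been loaded or unloaded.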

ilan-gold commented 1 day ago

@ibgreen I was considering saying something related to this as well - I think you had it with your first comment.

The first step should probably be checking GeoJSONLayer and beginning to think about how we're going to test, i.e., where are we going to get data from that everyone can re-use. With the TileLayer we pulled from NASA, but I am not sure that's feasible here.

Then we can coordinate tiling (which can be hosted in a separate package anyway probably since it will use GeoJSONLayer sublayers if I understand)

ibgreen commented 1 day ago

> where are we going to get data from that everyone can re-use

> since it will use GeoJSONLayer sublayers if I understand)

Since we are brainstorming... we may even want to rethink the name GeoJSONLayer (gasp...). We may want something more primitive, perhaps called a FeatureLayer - the GeoJSONLayer could be a sublayer that happens to understand GeoJSON specifically and has a geospatial focus.

cornhundred commented 1 day ago

Hi all, thanks for pinging me into this great conversation. Our group at the Spatial Technology Platform at the Broad Institute has been developing an approach using Parquet/GeoParquet files for creating vector tiles of biological data (e.g., transcript, cell segmentation) that we are very close to releasing publicly (alpha release). In short, we're building off the work from @kylebarron (GeoArrow and GeoParquet in deck.gl) to load vectorized data from Parquet files in JavaScript and pass this to deck.gl. So far the results have been very promising and we can let you all know once we have made the repo public.

I had the following concerns with using MVT tiles for biological data: limits on the number of objects in each tile; lack of support for non-geospatial data (https://github.com/mapbox/tippecanoe/issues/907); difficulty making the tiles (e.g., relying on tippecanoe); and the fact that we do not really need multi-scale representations of biological data for our use case (e.g., multi-scale representations of cell segmentation polygons). However, it would be great to see if the format can be extended to better support the needs of biological data.

Nicholas-Schaub commented 1 day ago

I lead the group working on development of the MicroJSON spec. Bengt Ljungquist is the core developer and maintainer. Nikita (OP) is working on integrating the format into our data visualization tools. We are part of the National Center for Advancing Translational Science (NCATS) at NIH. A while ago we requested feedback from the image.sc community, but based on the conversation here I think there's a lot of additional feedback that would be helpful.

We have also looked into GeoParquet and GeoArrow, but have not begun trying to integrate them. The idea of MicroJSON is that (despite its name) it is a vector format to support microscopy data, especially some of the complex data found in medicine and biology. We are not interested in replicating work, and we only developed it because there was a gap in the space.

It looks like there are some good integration points, and some useful things we could learn from you.

Could we set up a meet and greet/coordination meeting? Who would be interested in attending?

ilan-gold commented 1 day ago

@Nicholas-Schaub @cornhundred I shared a zulip channel. It would be really great if we can get the SpatialData developers in this conversation: https://scverse.zulipchat.com/#narrow/channel/315824-spatial/topic/MicroJSON.20.2B.20TileJSON

Otherwise, I am happy to set something up by email: ilan.gold@helmholtz-munich.de

If we could have people saving this by default from Python alongside their imaging data, it would really expand usability and shareability!

It would be great to understand the MicroJSON spec - I have a suspicion it can be used in zarr but it would be great to have a call.

cornhundred commented 1 day ago

@ilan-gold Sure, happy to discuss with the group.

Nicholas-Schaub commented 23 hours ago

@ilan-gold Yes! Part of our vision with MicroJSON is to make it as compatible as possible with existing standards like OME NGFF. One of the first people we reached out to for feedback was Josh Moore. Out of the box, there is an obvious way to embed MicroJSON into OME NGFF metadata. It is not scalable but probably useable for many situations.

Yes, you did share Zulip. I overlooked it. We will see you there. We would still like to do a meet and greet meeting just because I think something is lost in text, but we can discuss there.