ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://ngff.openmicroscopy.org

Mesh specification #33

Open glyg opened 3 years ago

glyg commented 3 years ago

As discussed in the Feb. 2021 NGFF community call, and following this image.sc thread.

The idea is to follow the PLY specification to store meshes in OME-Zarr. A PLY file is organised as a header that declares elements (e.g. vertex, face) and their typed properties, followed by the data for each element.

There is a draft implementation here: https://github.com/centuri-engineering/ply-zarr

Some questions:

imagesc-bot commented 3 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/mesh-data-in-ome-zarr/44653/15

imagesc-bot commented 3 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-feb-23/48386/9

joshmoore commented 3 years ago

cc: @jfkotw @kephale who were also interested during the meeting. I defer on whether or not this issue covers all of "vector-based".

glyg commented 3 years ago

I'd argue for GeoJSON for ROIs, points and the like, and keep meshes in their niche.

joshmoore commented 3 years ago

@glyg, so this block from ply-zarr is the critical bit for discussion?

from datetime import datetime  # needed for the timestamp in the comments list

ply_header = {
    "format": "ascii 1.0",
    "comments": [f"created by ply_zarr v0.0.1, {datetime.now().isoformat()}"],
    "elements": {
        "vertex": {
            "size": 47,
            "properties": [
                ("double", "x"),
                ("double", "y"),
                ("double", "z")
            ]
        },
        "face": {
            "size": 105,
            "properties": [
                ("list", "uint8", "int32", "vertex_indices"),
            ]
        }
    }
}
glyg commented 3 years ago

Yes, this mirrors the specification for the PLY header; it then seems natural to store the faces in separate arrays according to their number of sides.
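The bucketing step could be sketched as follows (pure numpy; the `face_<k>` naming is just an illustration, not a proposed convention). Each bucket is rectangular, so it can become an ordinary fixed-width integer array:

```python
# Illustrative sketch: bucket a mixed-polygon face list by number of sides,
# so each bucket can be stored as its own fixed-width (n, k) array.
import numpy as np

def group_faces_by_arity(faces):
    """faces: list of vertex-index lists with varying lengths."""
    buckets = {}
    for face in faces:
        buckets.setdefault(len(face), []).append(face)
    # Each bucket is rectangular, so it maps to a plain (n, k) integer array.
    return {f"face_{k}": np.asarray(v, dtype="i4") for k, v in buckets.items()}

arrays = group_faces_by_arity([[0, 1, 2], [2, 3, 4, 5], [0, 2, 4]])
# Two triangles land in "face_3", the quad in "face_4".
```

This keeps every stored array fixed-width, at the cost of splitting a mesh's faces across several arrays.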

glyg commented 3 years ago

See a more concrete example of mixing meshes, images and labels here.

I assume the xarray compatibility also applies here, I'll look into that next.

imagesc-bot commented 3 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ngff-status-update-may-2021/52918/1

jburel commented 2 years ago

cc @normanrz

normanrz commented 2 years ago

We recently implemented the mesh format from Neuroglancer in webKnossos: https://github.com/google/neuroglancer/blob/master/src/neuroglancer/datasource/precomputed/meshes.md

It's been great for our purposes:

I think that format would be a great candidate to be adopted by OME-NGFF.

glyg commented 2 years ago

@normanrz thanks for the input, those features indeed sound great (esp. multi-res!).

If I understand correctly though, only triangular meshes are supported? The other consumer / producer of meshes is the modeling community (i.e. physical biology), who would need more generic meshes, for example with polygonal (>3) 2D cells, polyhedral 3D cells, or even quadratic tetrahedra.

Would draco be able to handle that kind of data? How would a zarr implementation work? Is it enough to "just" put the draco-encoded data in the store and add a dependency to be able to read / write it?

Also, maybe storing generic FEM meshes is out of scope for ome-ngff and triangles are enough.

normanrz commented 2 years ago

If I understand correctly though, only triangular meshes are supported? The other consumer / producer of meshes is the modeling community (i.e. physical biology), who would need more generic meshes, for example with polygonal (>3) 2D cells, polyhedral 3D cells, or even quadratic tetrahedra.

Yes, I think draco only supports triangular meshes (and point clouds). We could look into allowing other encodings in addition to draco.

How would a zarr implementation work? Is it enough to "just" put the draco-encoded data in the store and add a dependency to be able to read / write it?

That is a good question that we haven't fully figured out yet. We currently store all the data in a single binary file. The file consists of a) a directory structure (hash map) to locate the meshfile for a given id within b) a long blob of mesh data. In b), each meshfile has a binary metadata header that describes the available chunks and levels of detail. One implementation on top of zarr would be to store each meshfile as one chunk (e.g. in a 2D uint8 array). This would create a lot of chunk files and might cause some issues, because the chunks will have different byte lengths.
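One way to make the different-byte-length issue concrete is a padding scheme: each encoded meshfile becomes a row of a 2D uint8 array, padded to the longest blob, with a side array recording the true lengths. This is a numpy-only sketch of the idea, not webKnossos's actual layout:

```python
# Sketch of the "one meshfile per chunk" idea: pad each encoded mesh blob
# to a common width in a 2D uint8 array; a lengths array recovers originals.
import numpy as np

def pack_blobs(blobs):
    """Pack variable-length byte blobs into one rectangular uint8 array."""
    lengths = np.array([len(b) for b in blobs], dtype="u8")
    width = int(lengths.max())
    packed = np.zeros((len(blobs), width), dtype="u1")
    for i, b in enumerate(blobs):
        packed[i, : len(b)] = np.frombuffer(b, dtype="u1")
    return packed, lengths

def unpack_blob(packed, lengths, i):
    """Recover blob i by trimming the padding."""
    return packed[i, : lengths[i]].tobytes()

packed, lengths = pack_blobs([b"draco-mesh-1", b"longer-draco-mesh-2"])
```

The trade-off is visible immediately: every row is padded to the largest meshfile, so storage is wasted unless blob sizes are similar, which is why variable-length chunks (or a blob stored alongside the arrays) may be preferable.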

EricMoerthVis commented 3 months ago

I would like to get involved in the discussion.

I think it would be great to have a format similar to the Neuroglancer format in OME. 3D data generation is becoming more and more popular in the spatial biology field, and segmentations are a big part of it. Besides the possibility of storing the volumetric (point cloud) data in OME-Zarr, it would be really great to be able to do the same for meshes.

I am wondering if there would be the possibility to discuss the format and specifications in a meeting or similar?

joshmoore commented 3 months ago

I would like to get involved in the discussion.

Consider yourself involved! 🙂

I think it would be great to have a format similar to the Neuroglancer format in OME.

Modulo https://xkcd.com/927/ of course. This is certainly something that I've heard several times recently as well, but it will certainly take one or more champions for it to happen. Also cc @jbms for how he weighs the changes as well as the pros & cons.

I am wondering if there would be the possibility to exchange about the format and specifications in a meeting or such?

Most of the recent meetings have been around the challenge which is pushing forward Zarr v3 support (i.e., RFC-2). It's certainly time for a next general community meeting, or alternatively, a smaller group could start socializing the idea in preparation for an RFC.

kephale commented 3 months ago

I'm still here watching this thread, and would be happy to help get a small group discussing what the best options are for this!

EricMoerthVis commented 3 months ago

I see!

I think there are similarities and differences between storing volumetric (point cloud) data and meshes. One main similarity, as introduced by the standard Neuroglancer uses, is:

Multi-resolution support for meshes! This is really crucial for the vast number of meshes we are going to store and load again.

I think the main difference is that meshes don't adhere to as nice a grid structure as the point clouds do. So I am wondering how we can store them at their multiple resolutions but still know where they are located in XYZ, so we can efficiently load them when needed.

So there might be more metadata needed, such as the bounding box, centroid or other measures, to know whether a mesh is visible in a certain location, so the client can decide if it should be loaded or not.

I would really like to see first mesh support (maybe based on the Neuroglancer format supporting Draco) in Zarr soon.
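The kind of per-mesh metadata that would enable such culling could be sketched like this (the field names are hypothetical, not a proposed spec): a client fetches only these few numbers, tests them against the view frustum, and skips the geometry entirely for off-screen meshes.

```python
# Hypothetical per-mesh metadata for visibility culling.
# Field names ("bounding_box", "centroid") are illustrative, not a spec.
import numpy as np

def mesh_metadata(vertices):
    """Compute an axis-aligned bounding box and centroid from (N, 3) vertices."""
    v = np.asarray(vertices, dtype="f8")
    return {
        "bounding_box": {
            "min": v.min(axis=0).tolist(),
            "max": v.max(axis=0).tolist(),
        },
        "centroid": v.mean(axis=0).tolist(),
    }

meta = mesh_metadata([[0, 0, 0], [2, 0, 0], [0, 2, 0], [0, 0, 2]])
```

Being plain JSON-serialisable, such a record could live in zarr group attributes next to the mesh data.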

d-v-b commented 3 months ago

Would really like to see a first Mesh Support (maybe based on the NeuroGlancer format supporting Draco) soon in Zarr

What would meshes look like in the Zarr data model? Zarr v3 doesn't have support yet for variable length types, so at a minimum we would need to add that, and even then I'm not sure how meshes, expressed as variable-length collections of geometrical objects, would be stored in an N-dimensional array. What would the array indices mean? I suspect people would fall back to 1D arrays, with maybe a second array for storing a spatial index? It could work, but it's not a great fit for Zarr IMO.

On the other hand, the neuroglancer multiresolution mesh format seems perfectly fine on its own, outside of Zarr. So maybe just refining or generalizing that format as needed would be simpler than forcing it into Zarr.

normanrz commented 3 months ago

I agree that the mesh format doesn't need to live in Zarr arrays. We could (mis)use uint8 arrays to store the bytes, but I don't know what value that would bring compared to just storing the blob alongside the Zarr arrays in the hierarchy. In general, I don't think that all pieces of OME-Zarr need to be Zarr.

EricMoerthVis commented 2 months ago

So the idea would be to adopt the NeuroGlancer Format (https://github.com/google/neuroglancer/blob/master/src/datasource/precomputed/meshes.md#multi-resolution-mesh-format) and integrate it into OME-Zarr?

normanrz commented 2 months ago

So the idea would be to adopt the NeuroGlancer Format (https://github.com/google/neuroglancer/blob/master/src/datasource/precomputed/meshes.md#multi-resolution-mesh-format) and integrate it into OME-Zarr?

I think that would be a good way forward. There are a few details in terms of metadata and file layout that need to be figured out. Would be great to hear @jbms feedback on this.

joshmoore commented 2 months ago

A quick heads up that I heard from Jeremy today on a separate matter: he's been on leave. I very much assume when he's caught back up he'll chime in.

EricMoerthVis commented 1 month ago

I just want to get this discussion running again. What would be potential next steps?

normanrz commented 1 month ago

I think a meeting to sketch out an RFC would be a good next step. There should be an accompanying post on image.sc to announce that meeting.

d-v-b commented 1 month ago

@normanrz I'm not sure how crystallized the schedule is for the upcoming OME-NGFF workflows hackathon, but maybe carving out some (EST-timezone-friendly) slots would be convenient?

EricMoerthVis commented 1 month ago

That sounds like a good plan to discuss in that timeframe!

jbms commented 1 month ago

Sorry, was on paternity leave until today.

As others have also stated, while meshes can be potentially thought of as collections of arrays of vertex properties and face properties, I think trying to represent them as zarr arrays directly would add a lot of complexity and not provide significant advantages, given how meshes are actually used in practice.

There is certainly a lot of room for improvement in the Neuroglancer precomputed multiscale mesh format (and the related annotation format) but I think if the existing format serves a decent number of use cases then it may be wise to standardize it as-is initially, and then once there is greater usage experience work on a revised format.

EricMoerthVis commented 1 month ago

No worries!

Yes! I think this sounds like a really good plan! I think there is also a great need for more standardized creation and retrieval pipelines for the format. So I like your suggestion of first taking it up as-is and gradually improving it over time.