vitessce / vitessce-data

Utils for loading HuBMAP data formats
MIT License

generate linnarsson zarr from hdf5 directly #72

Closed manzt closed 4 years ago

manzt commented 4 years ago

We don't need to convert to OME to then convert back to zarr...

I added scripts that convert directly to zarr and then tile the chunked full-resolution array into lower resolutions. Since we haven't yet settled on a standard for pyramids (see https://github.com/hubmapconsortium/vitessce-image-viewer/issues/91), these scripts generate tiles in the format currently supported for zarr in Viv:

.
└── linnarsson.images.zarr 
    └── pyramid 
        ├── 00 # generated by img_hdf5_reader.py
        │   ├── .zarray
        │   ├── 0.0.0
        │   └── ..etc
        ├── 01 # generated by tile_zarr_base.py
        │   ├── .zarray
        │   ├── 0.0.0
        │   └── ..etc
        └── 02 # generated by tile_zarr_base.py
            ├── .zarray
            ├── 0.0.0
            └── ..etc
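
As a rough sketch of the layout above (not the actual img_hdf5_reader.py or tile_zarr_base.py code), the following computes the shape and chunk keys for each pyramid level. It assumes a (channel, y, x) axis order, that each level halves the height and width of the previous one, and a hypothetical 512-pixel tile size; none of these are stated in this PR. The function names are made up for illustration. The dot-separated chunk keys follow the default Zarr v2 naming.

```python
import math


def pyramid_levels(base_shape, tile_size=512, n_levels=3):
    """Describe each pyramid level as (name, shape, chunk grid).

    Assumptions (not taken from the PR): axes are (channel, y, x), each
    level halves y and x (rounding up), and each chunk holds one channel
    of a tile_size x tile_size tile.
    """
    channels, height, width = base_shape
    levels = []
    for level in range(n_levels):
        scale = 2 ** level
        shape = (channels, math.ceil(height / scale), math.ceil(width / scale))
        grid = (
            channels,
            math.ceil(shape[1] / tile_size),
            math.ceil(shape[2] / tile_size),
        )
        # Level directories are zero-padded: "00", "01", "02", ...
        levels.append((f"{level:02d}", shape, grid))
    return levels


def chunk_keys(grid):
    """List Zarr v2 style chunk keys such as '0.0.0' for a chunk grid."""
    return [
        f"{c}.{y}.{x}"
        for c in range(grid[0])
        for y in range(grid[1])
        for x in range(grid[2])
    ]


if __name__ == "__main__":
    # Example base shape is arbitrary, chosen only to exercise the rounding.
    for name, shape, grid in pyramid_levels((2, 4096, 3000)):
        print(name, shape, len(chunk_keys(grid)), "chunks")
```

Each level directory then holds one `.zarray` metadata file plus one file per chunk key.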

The linnarsson.images.json should be very similar to the config in Viv now: the keys here are the "channels", and they all share the same URL since it's one zarr array:

{
  "polyT": {
    "sample": 1,
    "tileSource": "https://s3.amazonaws.com/vitessce-data/0.0.20/master_release/linnarsson/linnarsson.images.zarr/pyramid/"
  },
  "nuclei": {
    "sample": 1,
    "tileSource": "https://s3.amazonaws.com/vitessce-data/0.0.20/master_release/linnarsson/linnarsson.images.zarr/pyramid/"
  }
}
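
A config like the one above could be generated rather than written by hand. This is a minimal sketch, assuming a hypothetical helper named `build_image_config`; it is not part of the scripts in this PR. It only reproduces the two keys shown above (`sample` and `tileSource`).

```python
import json


def build_image_config(base_url, channels, sample=1):
    """Map each channel name to the shared pyramid URL.

    Every channel points at the same tileSource because all channels
    live in one Zarr array.
    """
    return {
        name: {"sample": sample, "tileSource": base_url}
        for name in channels
    }


if __name__ == "__main__":
    url = (
        "https://s3.amazonaws.com/vitessce-data/0.0.20/master_release/"
        "linnarsson/linnarsson.images.zarr/pyramid/"
    )
    config = build_image_config(url, ["polyT", "nuclei"])
    print(json.dumps(config, indent=2))
```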

The only thing missing from the Viv logic is minZoom. I guess I could add that here to the linnarsson.images.json. In the future we shouldn't export a linnarsson.images.json at all; this information should instead live in the .zattrs of the base zarr. But this is what Viv supports now, and we want to update our demo. It would be good to iron out the zarr schema sooner rather than later.
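
To make the .zattrs idea concrete, here is a hedged sketch of what writing that metadata might look like. In a Zarr v2 directory store, `.zattrs` is a plain JSON file, so the standard library is enough. The key names (`channels`, `minZoom`) and the rule that minZoom equals the negative number of downsampled levels are illustrative assumptions, not an agreed schema, and `write_zattrs` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def write_zattrs(zarr_root, channels, n_levels):
    """Write a hypothetical .zattrs file at the root of the Zarr store.

    The key names and the minZoom rule (negative number of downsampled
    levels) are illustrative assumptions, not an agreed schema.
    """
    attrs = {
        "channels": list(channels),
        "minZoom": -(n_levels - 1),
    }
    path = Path(zarr_root) / ".zattrs"
    path.write_text(json.dumps(attrs, indent=2))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "linnarsson.images.zarr"
        root.mkdir()
        path = write_zattrs(root, ["polyT", "nuclei"], n_levels=3)
        print(path.read_text())
```

This sketch does not create the `.zgroup` file that a Zarr v2 group also needs; a real converter would normally go through the zarr library instead of writing files directly.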

manzt commented 4 years ago

We will obviously need to change the S3 URLs, but all the converters are in place.

mccalluc commented 4 years ago

> We don't need to convert to OME to then convert back to zarr...

What I remember from last summer was that we wanted to think of OME-TIFF as our central format: We think we'll have a lot of formats coming in, and we may want to generate a range of output formats as well, and we don't want m*n scripts.

These assumptions might no longer be true, but I'd like us to be clearer about the roadmap. Perhaps loop in Nils?

Sorry not to raise this concern earlier.

manzt commented 4 years ago

I am working on converters for many types of images, OME-TIFF specifically, with Heath in https://github.com/hubmapconsortium/img2zarr. I figured that for the purposes of vitessce, and @ngehlenborg's demo on Wednesday, this should suffice because it's the format the data are provided in.

ilan-gold commented 4 years ago

We should talk tomorrow, but the schema in this PR and the Vanderbilt one should match. I can show you what it should look like and you can also give me feedback since one is Zarr and one is not.

manzt commented 4 years ago

@mccalluc I updated s3_target.txt --> cloud_target.txt and all instances of $S3_TARGET --> $CLOUD_TARGET