constantinpape opened this issue 3 years ago
Should we put the new data in a separate bucket? I can ask Josep to create one.
Yes, why not. Let's call it i2k-2020
Do we keep the same folder structure as for the other mobie projects?
From my point of view we don't need any folder structure, because there will be only three files (see the very first post here: https://github.com/tischi/i2k-2020-s3-zarr-workshop/issues/1). But I don't know whether, in @joshmoore's current vision of the ome.zarr file format, they should somehow be in the same zarr container, because they are part of one dataset. @joshmoore would need to say.
I would suggest not to add the full res raw data, but only the 100nm version.
Yes! Excellent suggestion!
Which data do we add apart from that?
As said above: in terms of files see the very first post here: https://github.com/tischi/i2k-2020-s3-zarr-workshop/issues/1
I am not sure about the table. I don't think @joshmoore has something yet ready to store the table in zarr format?!
And ❤️ for helping!
> From my point of view we don't need any folder structure because there will be only three files (see the very first post here: #1). But I don't know if for @joshmoore's current vision of the ome.zarr file-format they somehow should be in the same zarr container, because they are part of one dataset. @joshmoore would need to say.
Ok, in that case I would just add a single root zarr file with three multiscale datasets:
```
platy.zarr/
    em-raw/
        ...
    em-segmentation-cells/
        ...
    prospr-myosin/
        ...
```
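The single-root layout above can be sketched on disk with plain Python, without a zarr dependency. The group names come from the tree; the three scale levels and the `multiscales` entries are illustrative only (zarr v2 group markers plus a minimal early-draft OME-Zarr multiscale attribute), and the actual scale-level arrays (`s0/`, `s1/`, ...) are omitted:

```python
import json
from pathlib import Path

# Hypothetical local sketch of the proposed single-root container.
root = Path("platy.zarr")
root.mkdir(exist_ok=True)
# A zarr v2 group is marked by a .zgroup file.
(root / ".zgroup").write_text(json.dumps({"zarr_format": 2}))

for name in ["em-raw", "em-segmentation-cells", "prospr-myosin"]:
    grp = root / name
    grp.mkdir(exist_ok=True)
    (grp / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    # Minimal multiscale metadata pointing at scale levels s0, s1, s2.
    multiscales = [{"name": name,
                    "datasets": [{"path": f"s{s}"} for s in range(3)]}]
    (grp / ".zattrs").write_text(json.dumps({"multiscales": multiscales}))

print(sorted(p.as_posix() for p in root.rglob(".zattrs")))
```

Each multiscale dataset is then just a sub-group of the one root container, which is the layout discussed above.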
> I am not sure about the table. I don't think @joshmoore has something yet ready to store the table in zarr format?!
We could just store it as a 2d dataset with column names in the header, but I think there is indeed no NGF format for tables yet.
Anyway, I will start with the volumetric data and let you know once I have something. (I will probably just start with the myosin volume, so @joshmoore can check it out once I have put it on the bucket and after we make sure the format is correct we add the larger files).
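The "2d dataset with column names in the header" idea could look like the following stdlib-only sketch of a single-chunk, uncompressed zarr v2 array, with the column names kept in the array's `.zattrs`. The table path, column names, and values are all made up for illustration:

```python
import json
import struct
from pathlib import Path

# Hypothetical table: one row per label, columns stored in .zattrs.
rows = [[1.0, 250.0], [2.0, 117.0]]   # made-up (label_id, size) values
columns = ["label_id", "size"]

tab = Path("platy.zarr") / "tables" / "em-segmentation-cells"
tab.mkdir(parents=True, exist_ok=True)

shape = [len(rows), len(columns)]
# Minimal zarr v2 array metadata: one uncompressed chunk holding everything.
(tab / ".zarray").write_text(json.dumps({
    "zarr_format": 2, "shape": shape, "chunks": shape,
    "dtype": "<f8", "compressor": None, "fill_value": 0.0,
    "order": "C", "filters": None,
}))
(tab / ".zattrs").write_text(json.dumps({"columns": columns}))
# Chunk "0.0" is the raw little-endian float64 data in C order.
flat = [v for row in rows for v in row]
(tab / "0.0").write_bytes(struct.pack(f"<{len(flat)}d", *flat))
```

This only works for all-numeric tables; mixed-type columns would need a different encoding, which is part of why a proper NGF table spec is still open.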
Related to this: https://github.com/tischi/i2k-2020-s3-zarr-workshop/issues/3
If we want to use the MoBIE infrastructure, the most straightforward approach would be to have an images.json file somewhere (like this one) pointing to three bdv.xml files (like this one) with `<ImageLoader format="bdv.n5.zarr.s3">`. If we did this, we may "only" have to get this done (plus some hopefully small add-ons in MoBIE) in order to have a working example to further iterate on.
> pointing to three bdv.xml files (like this one) with `<ImageLoader format="bdv.n5.zarr.s3">`
If we do this, there are a few questions about the file layout, because we cannot simply use what I suggested here: bdv assumes fixed paths inside the dataset (`setup0/timepoint0`, ...).
I see three options: one is to extend the `bdv.n5.zarr.s3` format so that we allow specifying a custom `pathInFile`, in order to support a single root zarr.

> But I don't know if for @joshmoore's current vision of the ome.zarr file-format they somehow should be in the same zarr container, because they are part of one dataset. @joshmoore would need to say.
I don't think so.
> I don't think @joshmoore has something yet ready to store the table in zarr format?!
There is some work now on an initial format:

* [ome/omero-cli-zarr#50](https://github.com/ome/omero-cli-zarr/pull/50)
* [ome/ome-zarr-py#61](https://github.com/ome/ome-zarr-py/pull/61)
* [ome/ome-zarr-py#63](https://github.com/ome/ome-zarr-py/pull/63)

which briefly looks like this:
```
/opt/data/6001240.zarr $ cat labels/0/.zattrs
{
  "image-label": {
    "properties": [
      {
        "label-value": 1,
        "class": "foo"
      },
      {
        "label-value": 2,
        "class": "bar"
      }
    ],
    "colors": [
      {
        "label-value": 1,
        "rgba": [128, 128, 128, 128]
      },
      ...
    ]
  }
}
```
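The `image-label` metadata in that draft is plain JSON, so it can be generated from Python data structures directly. A minimal sketch, using only the values visible in the snippet above (the truncated second color entry is left out rather than guessed):

```python
import json

# Per-label properties and display colors, as in the draft format above.
properties = [{"label-value": 1, "class": "foo"},
              {"label-value": 2, "class": "bar"}]
colors = [{"label-value": 1, "rgba": [128, 128, 128, 128]}]

zattrs = {"image-label": {"properties": properties, "colors": colors}}
print(json.dumps(zattrs, indent=2))
```

In practice this dict would be written to the label group's `.zattrs`; with one row per label value, the JSON grows linearly with the number of labels.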
> Ok, in that case I would just add a single root zarr file with three multiscale datasets:
Also ok.
> But I don't know if for @joshmoore's current vision of the ome.zarr file-format they somehow should be in the same zarr container, because they are part of one dataset. @joshmoore would need to say.
I don't think so.
Ok, let's discuss the layout tomorrow in the meeting.
> There is some work now on an initial format:
>
> * [ome/omero-cli-zarr#50](https://github.com/ome/omero-cli-zarr/pull/50)
> * [ome/ome-zarr-py#61](https://github.com/ome/ome-zarr-py/pull/61)
> * [ome/ome-zarr-py#63](https://github.com/ome/ome-zarr-py/pull/63)
This will produce large jsons in our case :). But we can give it a try; and in the future we can hopefully switch to storing the table as a zarr array.
> But we can give it a try; and in the future we can hopefully switch to storing the table as a zarr array.
For the testing, you could just write one feature value, like `size`.
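A single feature value per label, as suggested, would keep the test small. A hedged sketch in the draft `properties` style shown earlier (label values and sizes are made up):

```python
import json

# One feature value ("size") per label, in the draft image-label style.
sizes = {1: 250, 2: 117}   # made-up label -> size values
properties = [{"label-value": lv, "size": sz}
              for lv, sz in sorted(sizes.items())]

print(json.dumps({"image-label": {"properties": properties}}, indent=2))
```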
Personally, if I wanted to get something working within one week, before i2k, I would do the following:

```
images.json
a.xml
b.xml
c.xml
a.zarr
b.zarr
c.zarr
```
> This will produce large jsons in our case :)
Yup. Definitely aware. I had tried the zarr array solution but ran into https://github.com/saalfeldlab/n5/pull/73#issuecomment-688731487. We also discussed possibly integrating with Parquet etc. last night on the community call. Open to thoughts.
@tischi your plan sounds good. I can definitely set up 1 :). Will try to do as much as possible there before the meeting tomorrow, and then we can finalize the plan before i2k.
@joshmoore I uploaded one multiscale dataset to our new bucket.
Could you please check that you can access it? Here are the details:

- ServiceEndpoint: https://s3.embl.de
- BucketName: i2k-2020
- PathInBucket: platy.ome.zarr (this is the zarr root)
If you can access it, can you check if the dataset at `prospr-myosin` is compatible with the zarr multiscale format?
Thanks!
Hi @constantinpape,
The `.zattrs` that's in `...ome.zarr/` will need to be in the `prospr-myosin/` directory:
```
$ aws --no-sign-request --endpoint-url=https://s3.embl.de s3 ls --recursive s3://i2k-2020/platy.ome.zarr/ | grep /.z
2020-11-19 14:49:21        400 platy.ome.zarr/.zattrs
2020-11-19 14:49:21         24 platy.ome.zarr/.zgroup
2020-11-19 14:49:21        327 platy.ome.zarr/prospr-myosin/s0/.zarray
2020-11-19 14:49:21        327 platy.ome.zarr/prospr-myosin/s1/.zarray
2020-11-19 14:49:21        327 platy.ome.zarr/prospr-myosin/s2/.zarray
2020-11-19 14:49:21        321 platy.ome.zarr/prospr-myosin/s3/.zarray
```
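The problem visible in that listing (the `multiscales` `.zattrs` sits at the root instead of next to the scale levels) can be checked mechanically. A small hypothetical sketch, with the object keys copied from the output above, that flags multiscale groups missing their own `.zattrs`:

```python
# Object keys copied from the bucket listing above.
listing = """\
platy.ome.zarr/.zattrs
platy.ome.zarr/.zgroup
platy.ome.zarr/prospr-myosin/s0/.zarray
platy.ome.zarr/prospr-myosin/s1/.zarray
platy.ome.zarr/prospr-myosin/s2/.zarray
platy.ome.zarr/prospr-myosin/s3/.zarray
"""
keys = listing.split()

# Every sN/.zarray implies a parent multiscale group two levels up.
groups = {k.rsplit("/", 2)[0] for k in keys if k.endswith("/.zarray")}
for grp in sorted(groups):
    status = "ok" if f"{grp}/.zattrs" in keys else "missing .zattrs"
    print(grp, status)
```

Run on the listing above, this reports `prospr-myosin` as missing its `.zattrs`, which is exactly the fix described here.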
> The `.zattrs` that's in `...ome.zarr/` will need to be in the `prospr-myosin/` directory
Thanks for checking! I fixed it in the code.
I added the data according to what we discussed, see #4
@tischi, I exchanged a couple of mails with @joshmoore today, and as far as I understand, the current plan is the following: we don't ship the data to Josh and instead convert and upload it locally.
I have a converter script and I am pretty sure it does the right thing, but I have a couple of other questions:
P.S. I made a new issue because #1 got a bit crowded.