mobie / mobie.github.io


Add support for different file formats to spec #49

Closed constantinpape closed 3 years ago

constantinpape commented 3 years ago

Given that the ome.zarr format is progressing (https://github.com/ome/ngff/pull/46) and that we may eventually want to support OpenOrganelle data (cc @bogovicj), I think we should add support for data formats that don't have a bdv.xml representation to the spec.

Currently, we store the data location in the spec like this:

"imageDataLocations": {
  "fileSystem": "bdv-file-for-local-data.xml",
  "s3store": "bdv-file-for-s3-data.xml"}

Here, we assume that the values are (relative) file paths that point to a bdv.xml file that in turn contains either the path to a local n5/hdf5 file or the address for the s3 storage.

However, for ome.zarr we don't need an xml file, because all relevant metadata is stored in the zarr attributes. In addition, for data that is stored on s3 (or some other object store), we also don't need to have any corresponding file in the project, but instead need to store the address.

So I would propose to add a new field fileFormat to support this. Initially we would support bdv.xml and we can probably add ome.zarr very soon.

For example:

"fileFormat": "bdv.xml",
"imageDataLocations": {
  "fileSystem": "bdv-file-for-local-data.xml",
  "s3store": "bdv-file-for-s3-data.xml"}

or:

"fileFormat": "ome.zarr",
"imageDataLocations": {
  "fileSystem": "some-relative-path.ome.zarr",
  "s3store": "https://s3.embl.de/some-bucket/some-name.ome.zarr"}

or (eventually) for OpenOrganelle:

"fileFormat": "openOrganelle",
"imageDataLocations": {
  "s3Store": "s3://janelia-cosem/jrc_macrophage-2/jrc_macrophage-2.n5"}
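Under this proposal, the viewer would dispatch on the storage key depending on how the project was opened, and then use "fileFormat" to decide how to interpret the chosen location. A minimal sketch (the class and method names here are invented for illustration, this is not MoBIE code):

```java
import java.util.Map;

public class LocationChooser {
    // Pick the entry matching how the project was opened. "fileSystem" and
    // "s3store" are the keys from the proposal above.
    static String choose( Map<String, String> imageDataLocations, boolean remote ) {
        final String key = remote ? "s3store" : "fileSystem";
        final String location = imageDataLocations.get( key );
        if ( location == null )
            throw new IllegalArgumentException( "No \"" + key + "\" location given." );
        return location;
    }

    public static void main( String[] args ) {
        final Map<String, String> locations = Map.of(
            "fileSystem", "some-relative-path.ome.zarr",
            "s3store", "https://s3.embl.de/some-bucket/some-name.ome.zarr" );
        System.out.println( choose( locations, true ) );
        // → https://s3.embl.de/some-bucket/some-name.ome.zarr
    }
}
```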
tischi commented 3 years ago

Sounds good!

To be more flexible, what about something like this?

"imageDataStorage":
  [ "store1": { "format": "bdv xml/h5", "storage": "fileSystem", "relativePath": "some-relative-path.xml" },
    "store2": { "format": "ome.zarr", "storage": "s3Store", "relativePath": "some-relative-path.ome.zarr" } ],
constantinpape commented 3 years ago

To be more flexible what about something like this?

"imageDataStorage":
  [ "store1": { "format": "bdv xml/h5", "storage": "fileSystem", "relativePath": "some-relative-path.xml" },
    "store2": { "format": "ome.zarr", "storage": "s3Store", "relativePath": "some-relative-path.ome.zarr" } ],

Makes sense, I will update the spec later and create an example project with it.

constantinpape commented 3 years ago

@tischi I had another look at your proposal and it's not valid json. I would suggest the following now:

"imageData": {
  "<NAME>": {"format": "bdv.xml", "storage": "fileSystem", "relativePath": "..xml"},
  "<NAME>": {"format": "ome.zarr", "storage": "s3", "s3Address": "https://s3...."}
}

i.e. imageData is a map with different names that map to the storage description for that name.

The question is a bit how we deal with these names then. I think we should not restrict them, but then we somehow need to list the available storage options in MoBIE. So do we need another field in "project.json" for this, so that you know this when initialising the viewer without going through all sources?
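One way to answer this without another field in project.json would be to collect the store names while parsing. A sketch of the named-stores map and of listing the available options over all sources (all class and method names below are invented for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ImageDataOptions {
    // StorageSpec is an invented name for the per-store description proposed above.
    static final class StorageSpec {
        final String format, storage, location;
        StorageSpec( String format, String storage, String location ) {
            this.format = format; this.storage = storage; this.location = location;
        }
    }

    // With free-form store names, the available options can be listed by
    // collecting the keys over all sources of a dataset (instead of listing
    // them explicitly in project.json):
    static Set<String> availableStores( Map<String, Map<String, StorageSpec>> sources ) {
        final Set<String> names = new TreeSet<>();
        for ( Map<String, StorageSpec> imageData : sources.values() )
            names.addAll( imageData.keySet() );
        return names;
    }

    public static void main( String[] args ) {
        final Map<String, Map<String, StorageSpec>> sources = new LinkedHashMap<>();
        sources.put( "my_image", Map.of(
            "local", new StorageSpec( "bdv.xml", "fileSystem", "images/my_image.xml" ),
            "embl-s3", new StorageSpec( "ome.zarr", "s3", "https://s3.embl.de/bucket/my_image.ome.zarr" ) ) );
        System.out.println( availableStores( sources ) );
        // → [embl-s3, local]
    }
}
```

The trade-off discussed below is that this requires parsing dataset metadata before the user can select a store.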

tischi commented 3 years ago

Good question...

In the code it looks like this:

final String relativeImagePath = source.imageDataLocations.get( settings.values.getImageDataStorageModality().toString() );

Which, I think we discussed before, could be made more general:

final String relativeImagePath = source.imageData.get( settings.values.getImageDataStore() ).relativePath;

Where source.imageData is the Map that you propose and settings.values.getImageDataStore() is the key to be used for opening the specific instance (getImageDataStorageModality() used to be an enum with only two options: fileSystem and s3).

The question now is where does the user specify which of the image data stores (keys) they want to use.

I don't know if we have to add this to the project.json, because one can figure out all the options also when parsing the specific dataset.json.

Frankly, I don't know exactly, maybe we need to discuss in person.

constantinpape commented 3 years ago

The question now is where does the user specify which of the image data stores (keys) they want to use.

I don't know if we have to add this to the project.json, because one can figure out all the options also when parsing the specific dataset.json.

Frankly, I don't know exactly, maybe we need to discuss in person

I proposed to put it in the project.json, because it is the first entry point for the MoBIE viewer. But if you would rather have it in the dataset.json that's also fine with me.

But I see that this changes the logic of opening MoBIE projects a bit, because now we don't know a priori which data sources are available, but need to parse some meta-data for it, so the user cannot select this anymore without loading some metadata.

Maybe we can find 5 minutes in the meeting later to discuss this.

tischi commented 3 years ago

I proposed to put it in the project.json, because it is the first entry point for the MoBIE viewer. But if you would rather have it in the dataset.json that's also fine with me.

It's good to have it in the project.json, but then we have to maintain the same information in two places, which is not ideal...

constantinpape commented 3 years ago

Alternatively, we could also keep the current logic of using "storage" as key and only allowing a fixed set of keys (currently "fileSystem" and "s3Store"). Then the only change would be that this maps to a map with the format and the source (= either relative path or address, we can still discuss how to actually encode this):

"imageData":{
  "fileSystem": {"format": "bdv.xml", "source": "some/file.xml"},
  "s3Store": {"format": "ome.zarr", "source": "https://s3.something.com"}
}

Then we don't need to duplicate the information and know beforehand what options are available (for now "fileSystem" or "s3Store"). The downside is that we don't allow for providing the data via different s3 stores, which would be possible in the other approach.

tischi commented 3 years ago

I still find the whole logic a bit suboptimal:

"056F63395C_lynEGFP_seg": {
  "segmentation": {
    "imageDataLocations": {
      "fileSystem": "images/local/membrane-056F63395C_lynEGFP_seg.xml",
      "s3store": "images/remote/membrane-056F63395C_lynEGFP_seg.xml"
    },
    "tableDataLocation": "tables/membrane-056F63395C_lynEGFP_seg"
  }
}
Maybe we should also unify what we do for images and tables in terms of specifying the storage location and filetype.

constantinpape commented 3 years ago

New idea:

"imageData": {
  "fileSystem": {"format": "bdv.xml", "source": "..."},
  "s3Store": {"..."}
},
"tableData": {
  "fileSystem": {"...."},
  "s3Store": {"..."},
  "github": {"..."}
}
constantinpape commented 3 years ago

Ok, to summarise the changes discussed with @tischi:

"imageData": {
  "fileSystem": {"format": "bdv.n5", "source": "images/my_image_fs.xml"},
  "gitHub": {"format": "bdv.n5.s3", "source": "images/my_image_gh.xml"},
  "s3Store": {"format": "bdv.n5.s3", "source": "https://s3.com/my_image.xml"}
},
"tableData": {
  "fileSystem": {"format": "tsv", "source": "tables/some_tables"},
  "gitHub": {"format": "tsv", "source": "tables/some_tables"},
  "s3Store": {"format": "tsv", "source": "https://s3.com/tables/some_tables"}
}

This means: for "fileSystem" and "gitHub", the "source" is relative to the project directory / repository (which is derived from the entrypoint that the user provides, i.e. either the github repo or the path on the filesystem). For "s3Store", the "source" points to the s3 address.

Note that the key only specifies where the 'image entry point' is located, which is different from the actual image data in the case of bdv.xml (it will be the same for ome.zarr though). For example: {"gitHub": {"format": "bdv.n5.s3", "source": "images/my_image_gh.xml"}} means that the xml is read from github and the xml contains the information to load the actual image data from s3.

For now, we support the formats "bdv.hdf5", "bdv.n5", "bdv.n5.s3". (Support for the openOrganelle and ome.zarr formats should come soon.)

The case for tables is very similar and for now we support the format "tsv". In the future, we may support other formats like databases, see #23. For the current format, "source" always provides the location of the top directory that contains all the table files.
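The location semantics summarised here can be sketched as follows ("resolveSource" is a hypothetical helper, not actual MoBIE code): "fileSystem" and "gitHub" sources are resolved against the project root derived from the user's entry point, while "s3Store" sources are used as-is.

```java
public class SourceResolver {
    static String resolveSource( String storageKey, String source, String projectRoot ) {
        switch ( storageKey ) {
            case "fileSystem":
            case "gitHub":
                // "source" is relative to the project directory / repository
                return projectRoot.endsWith( "/" ) ? projectRoot + source : projectRoot + "/" + source;
            case "s3Store":
                // "source" already is the absolute s3 address
                return source;
            default:
                throw new IllegalArgumentException( "Unknown storage key: " + storageKey );
        }
    }

    public static void main( String[] args ) {
        System.out.println( resolveSource( "gitHub", "images/my_image_gh.xml",
                "https://github.com/mobie/some-project/data/some-dataset" ) );
        System.out.println( resolveSource( "s3Store", "https://s3.com/my_image.xml", "unused" ) );
    }
}
```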

@tischi I will create a branch with an example for this and let you know once it's there.

Edit: changed "github" to "gitHub".

tischi commented 3 years ago

I think it should be gitHub.

constantinpape commented 3 years ago

@tischi I added an example for the updated spec here: https://github.com/mobie/arabidopsis-root-lm-datasets/tree/new-data-spec

tischi commented 3 years ago

Thanks! Copying here a part of it.

        "imageData": {
          "fileSystem": {
            "format": "bdv.n5",
            "source": "images/local/lm-cells.xml"
          },
          "gitHub": {
            "format": "bdv.n5.s3",
            "source": "images/remote/lm-cells.xml"
          }
        },
        "tableData": {
          "fileSystem": {
            "format": "tsv",
            "source": "tables/lm-cells"
          },
          "github": {
            "format": "tsv",
            "source": "tables/lm-cells"
          }
        }
      }

Somehow I am still confused as to whether we are getting it right...

  1. Also for s3 one could in fact imagine copying the whole project to a different bucket / institute, and then it would be nice if one did not have to change the JSONs. Thus I wonder whether the S3 paths could also be relative?
  2. Given the above point, everything could be relative and then, currently for the tables, the entries would all be identical, blowing up the dataset.json maybe unnecessarily.
  3. n5 and zarr in principle don't care whether they are on a file system or in an object store, which is why we maybe do not need to specify this in the dataset.json in terms of a different "format". The reader could figure out automatically whether it needs to read from S3.
  4. It is confusing now that one can "read an image from gitHub".

Maybe we could rethink this again, forgetting for now that we currently need two different xml, assuming that images are stored as ZARR. And then, maybe just in the java code do something special to navigate to the correct xml, but not explicitly specify both locations in the dataset.json?!

tischi commented 3 years ago

And then, maybe just in the java code do something special to navigate to the correct xml, but not explicitly specify both locations in the dataset.json?!

We could have just two xmls next to each other, one ending on .s3.xml. Or we keep the subfolders as we have them, but do not mention them in the dataset.json and instead add the subfolder in the Java code, as needed. It would be nice to rename the subfolders to "fileSystem" and "s3" (but I see that this is too much work for the data in object stores).

constantinpape commented 3 years ago
1. Also for s3 one could in fact imagine to copy the whole project to a different bucket / institute, and then it would be nice if one would not have to change the JSONs. Thus I wonder whether also the S3 could be relative?

This will not work for the following case: read the project data from github and access the image data on s3. In that case we need absolute paths, because the image data locations are not relative to the project root.

3. n5 and zarr in principle don't care whether they are on a file system or in an object store, that's why maybe we do not need to specify this in the dataset.json in terms of a different "format". The reader could figure out automatically whether it needs to read from S3.

I don't really mind either way, but I like right now that the file format names are the same as used in the bdv xml file, where we have both bdv.n5 and bdv.n5.s3. For the ome.zarr use case we don't need to introduce it, because having the image data on github also doesn't make sense there.

4. It is confusing now that one can "read an image from gitHub"

But this is the most consistent solution. It specifies where the image data entry point is located. I don't think it matters so much that it's a bit confusing on the surface. For a normal user this is never visible and for advanced users we can properly document this.

We could have just two xml next to each other, one ending on .s3.xml. Or we keep the subfolders as we have them, but do not mention them in the dataset.json, but add the subfolder in the Java code, as needed. It would be nice to rename the subfolders to "fileSystem" and "s3" (but I see that this is too much work for the data in object stores).

I don't like introducing implicit assumption on the folder structure in the spec again.

tischi commented 3 years ago

This will not work for the following case: read project data from github and access the image data on s3.

That's true, that could mean that when reading from github one needs to provide a root path to the image data, typically pointing to S3. I think that would make intuitive sense, because when reading from github, the image data cannot be hosted there, so it feels natural that one needs to specify the location. We could put available object store locations into the project.json.

I don't like introducing implicit assumption on the folder structure [about the xmls] in the spec again.

I think that could be OK, because it is a temporary (legacy) thing just because we need different xml right now. I think this will go away once we move to ZARR.

It also may not be ideal that in each and every xml we need to specify the S3 location. Specifying this once at "project startup" would be leaner and would also allow us to copy the whole project onto a different S3 store without having to modify all xmls. Should I have a look into this from the Java side to see whether I could make this work?


Overall I am trying to see whether we could find a solution such that really the same folder structure works on the fileSystem, on gitHub and on S3 and can be copied around without the need to change anything inside the folder structure. In the spirit of OME.ZARR: "Save once, read everywhere". I would hope this may be possible without having to explicitly specify different image and table locations for fileSystem, gitHub and S3, hoping that everything could be relative.

I think what stops us currently is:

  1. We need different XML to point to different image data locations
  2. The image data cannot be hosted on GitHub

Essentially, if those are the only two issues that prevent us from a much simpler and more portable solution, I would rather address those two issues with specific solutions than adapt the whole storage location specification to accommodate them.

What do you think? (I am also thinking in terms of writing the publication, I think it would be a selling point if everything would work with relative paths "everywhere").

constantinpape commented 3 years ago

Overall I am trying to see whether we could find a solution such that really the same folder structure works on the fileSystem, on gitHub and on S3 and can be copied around without the need to change anything inside the folder structure.

I agree, that makes a lot of sense. (With the exception that we cannot store image data on github, but I think having the s3 store(s) in the project metadata is a good solution for this).

I think what stops us currently is:

1. We need different XML to point to different image data locations

I still don't like introducing hard-coded folder structures for this. I have two alternative suggestions:

  1. As far as I understand, the bdv.n5.s3 format is something that is only used by us in MoBIE, correct? If that's true, I would suggest that we get rid of it completely and just extend the "normal" bdv.n5 xml with fields that store the information that we need to load from s3 (which I think would just be the relative path for the image data inside of the s3; the address of the root s3 being stored in the project data).
  2. Always store the s3 xml file in the same folder as the normal xml and just add a .s3 extension. I would only consider this as a fallback if we cannot go with option 1, which I think would be much cleaner.
2. The image data cannot be hosted on GitHub

I agree, we can just put the available object store location(s) into the project metadata to cover this.

There is one corner case that we don't cover this way: loading image data from different object stores in the same MoBIE instance. A use-case for this is a published project (where all the data is inside of a publicly available s3), where one then wants to add some other data that should be kept private (by putting it inside a protected s3), e.g. for a follow-up publication that is in progress. I think we could either a) not allow this use case, or b) give the option to override the global s3 loaded from the project metadata for each image source.

tischi commented 3 years ago

As far as I understand, the bdv.n5.s3 format is something that is only used by us in MoBIE, correct? If that's true, I would suggest that we get rid of it completely and just extend the "normal" bdv.n5 xml with fields that store the information that we need to load from s3 (which I think would just be the relative path for the image data inside of the s3; the address of the root s3 being stored in the project data).

Always store the s3 xml file in the same folder as the normal xml and just add a .s3 extension. I would only consider this as a fallback if we cannot go with option 1, which I think would be much cleaner.

I agree 100% and will have a look into the Java code, probably I'll have some time tomorrow.

I agree, we can just put the available object store location(s) into the project metadata to cover this.

Great!

b) give the option to over-ride the global s3 loaded from the project metadata for each image source.

Something like this sounds good. I think also supporting absolute S3 paths (pointing to wherever) is very important, because it would allow us to create projects linking already hosted image data without the need to duplicate them into our folder structure.

constantinpape commented 3 years ago

I agree 100% and will have a look into the Java code, probably I'll have some time tomorrow.

I agree, we can just put the available object store location(s) into the project metadata to cover this.

Great!

b) give the option to over-ride the global s3 loaded from the project metadata for each image source.

Something like this sounds good. I think also supporting absolute S3 paths (pointing to where-ever) is very important, because it would allow us to create projects linking already hosted image data without the need to duplicate them into our folder structure

Ok, I will create a sample project for what I think makes sense and send it to you later and we can iterate based on that.

constantinpape commented 3 years ago

So the new layout I would suggest:

project.json: specifies the s3 root for accessing the image data. This is a list, so that multiple stores can be used, and each entry contains the relevant data for one store. By default, the first entry is used (selecting other roots would only be possible in some advanced mode).

"s3Root": [{"endpoint": "https://s3.com", "bucket": "my_bucket", "region": "us-west"}]

dataset.json: sources. Each source gives the relative path w.r.t. the dataset root, and optionally one can also give "s3Store", which overrides the default s3 address if the project is loaded from s3. The same applies to the table data.

"imageData": {
  "format": "bdv.n5",
  "relativePath": "images/my_image.xml",
  "s3Store": "https://s3.com/my_image.xml"  # optional; overrides the default s3 address, which would be "s3Root" + dataset + "relativePath"
},
"tableData": {
  "format": "tsv",
  "relativePath": "tables/my_tables",
  "s3Store": "..."
}

In addition, we modify the xml for bdv.n5 to allow pulling the information for the s3 access inside. I will add the suggestions for this once I worked on it.
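The default-address rule above ("s3Root" + dataset + "relativePath", unless an explicit "s3Store" override is given) can be sketched like this (names are illustrative, not the final spec):

```java
public class S3AddressResolver {
    static String effectiveS3Address( String s3RootEndpoint, String bucket, String dataset,
                                      String relativePath, String s3StoreOverride ) {
        // an explicit "s3Store" entry wins over the default address
        if ( s3StoreOverride != null )
            return s3StoreOverride;
        // default: endpoint / bucket / dataset / relativePath
        return String.join( "/", s3RootEndpoint, bucket, dataset, relativePath );
    }

    public static void main( String[] args ) {
        System.out.println( effectiveS3Address( "https://s3.com", "my_bucket", "my_dataset",
                "images/my_image.xml", null ) );
        // → https://s3.com/my_bucket/my_dataset/images/my_image.xml
        System.out.println( effectiveS3Address( "https://s3.com", "my_bucket", "my_dataset",
                "images/my_image.xml", "https://s3.com/my_image.xml" ) );
        // → https://s3.com/my_image.xml
    }
}
```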

tischi commented 3 years ago

In addition, we modify the xml for bdv.n5 to allow pulling the information for the s3 access inside.

I am not sure there should be any s3 specific information inside of the xml, because ideally we want the xml to be portable. As said, I will have a look at the Java side of things, hopefully tomorrow.

constantinpape commented 3 years ago

I am not sure there should be any s3 specific information inside of the xml, because ideally we want the xml to be portable. As said, I will have a look at the Java side of things, hopefully tomorrow.

But we need to have s3 specific information inside of the xml, otherwise we don't know where to find the image data. Note that we cannot assume that the data is at the same place on the s3 bucket as it is locally. That's not the case for many of our existing projects, and it would be very expensive to change this now, because it would mean rewriting a lot of files in s3. (This issue will be different for ome.zarr, where we only have the actual image data and no additional xml. So there, we don't need to make any distinctions. But for the bdv xml there is no way around this.)

tischi commented 3 years ago

Note that we cannot assume that the data is at the same place on the s3 bucket as it is locally.

Could you give an example for this? I was not aware this is the case.

constantinpape commented 3 years ago

Sure, for example in the platybrowser case, the data paths inside the xml differ between the local and remote image data:
Local: https://github.com/mobie/platybrowser-datasets/blob/master/data/1.0.1/images/local/prospr-6dpf-1-whole-ache.xml#L5
Remote: https://github.com/mobie/platybrowser-datasets/blob/master/data/1.0.1/images/remote/prospr-6dpf-1-whole-ache.xml#L29

(We did not enforce any rules on this before, so I didn't always follow a completely fixed schema in the upload, esp. because uploading doesn't always just work with a single script call for the more complicated datasets.)

tischi commented 3 years ago

And within one project are the paths within S3 consistent? Because if that's the case maybe using this as an s3 root would do? 0.6.3/images/remote/

tischi commented 3 years ago

(including the absolute part in front, of course)

constantinpape commented 3 years ago

And within one project are the paths within S3 consistent? Because if that's the case maybe using this as an s3 root would do? 0.6.3/images/remote/

No, they are not necessarily consistent. For example, for the platybrowser the versioning scheme means that images correspond to files in different folders:
https://github.com/mobie/platybrowser-datasets/blob/master/data/1.0.1/images/remote/sbem-6dpf-1-whole-raw.xml#L29
https://github.com/mobie/platybrowser-datasets/blob/master/data/1.0.1/images/local/sbem-6dpf-1-whole-segmented-cells.xml#L5

tischi commented 3 years ago

Ah, I see this was because we were pointing to previous versions if the image existed already, is it?

constantinpape commented 3 years ago

Ah, I see this was because we were pointing to previous versions if the image existed already, is it?

Yes, exactly.

tischi commented 3 years ago

But then would specifying the "s3Store" be enough to handle those cases (without needing to have additional absolute s3 information in the xml)? We would probably need to upload more xmls into the s3 buckets to accompany the images, but maybe that's doable and something we wanted to do anyway? And the path inside the xml would be relative to the location of the xml (which is the spec of the bdv xml).

We can also have a zoom tomorrow morning about this..

constantinpape commented 3 years ago

But then would specifying the "s3Store" be enough to handle those cases?

Yes, that would also be an option, but a slightly messy one, because we need to upload additional xml files to the s3 which is quite some work compared to just changing things on github.

(without needing to have additional absolute s3 information in the xml)?

I would not specify any absolute s3 information in the xml, but just specify the relative path inside the bucket.

My idea would be to just add one more field to the xml, something like "s3ImageLoader"."Key": "path/in/bucket"

I would prefer the second option of slightly modifying the xml instead of slightly misusing the s3store, because we would still have the idea of having one root s3 location and then only providing the relative paths (which are read from the xml). And because we don't have to do any changes in the existing s3 containers.

tischi commented 3 years ago

Maybe allowing for two relative paths in the xml? One for file system and one for s3?

constantinpape commented 3 years ago

Maybe allowing for two relative paths in the xml? One for file system and one for s3?

Yes, that's the idea

tischi commented 3 years ago

I am still confused 🤔 When loading the project from github, I don't think the code should ever look at the xml on github, but at the xml on s3, because that's where the images are and the xmls are part of the image data.

constantinpape commented 3 years ago

I am still confused 🤔 When loading the project from github, I don't think the code should ever look at the xml on github, but at the xml on s3, because that's where the images are and the xmls are part of the image data.

Ah ok, now I see what you mean. I assumed we can still load xmls from github and then use the s3 information in there to load the actual image data.

If we want to enforce putting the xmls on s3, we will also need to re-upload xmls to the s3 buckets. While this may be conceptually a bit cleaner, I don't really like it from a development perspective, because moving around things on github is so much easier than on s3...

Also we would still need different relative paths in the xml, because of the different relative paths of the n5 files w.r.t. local and s3.

tischi commented 3 years ago

Also we would still need different relative paths in the xml, because of the different relative paths of the n5 files w.r.t. local and s3.

Yes, but having different xml local and on s3 would solve this without the need to do anything special.

While this may be conceptually a bit cleaner, I don't really like it from a development perspective, because moving around things on github is so much easier than on s3...

Yes, but we want to also support opening the project only from s3, then the xml need to be there anyway, isn't it?

constantinpape commented 3 years ago

Yes, but having different xml local and on s3 would solve this without the need to do anything special.

But then the xml has to be changed to be uploaded to s3 and changed back to be valid for the local project and we cannot have both on the same git branch -> that's a huge mess in managing the data, esp. since we need this feature for the platy project, where the data is potentially still changing.

Yes, but we want to also support opening the project only from s3, then the xml need to be there anyway, isn't it?

Yes, but I didn't really plan to make all the projects we have right now available through s3 only, because we have github as an entrypoint for them already.

Nevertheless supporting to open projects from s3 only is important, but the use case for this (at least from my side) is rather for finished projects where nothing is expected to change that can be uploaded to s3 in bulk, because I think that this will be much easier for most users who then don't have to deal with both github and s3.

constantinpape commented 3 years ago

I have an example for the new proposed data spec now: https://github.com/mobie/arabidopsis-root-lm-datasets/tree/new-data-spec2.

It's following the layout proposed here: https://github.com/mobie/mobie.github.io/issues/49#issuecomment-855757393 and I have added a field to the xml to indicate the path of the n5 file inside of the s3 bucket: https://github.com/mobie/arabidopsis-root-lm-datasets/blob/new-data-spec2/data/arabidopsis-root/images/local/lm-cells.xml#L30-L32.

I am still assuming that we can load the xmls from github right now, if we want to change that, the xmls (here) would need to be uploaded to the s3 under the right path as well. (But as I said, I would be very happy if we still allow reading the xmls from github, because otherwise making any changes to the projects will be much more work for me).

There is also a local version on the EMBL group share of the project: /g/kreshuk/pape/Work/mobie/arabidopsis-root-lm-datasets.

@tischi maybe it's best if we have a short zoom on this tomorrow, potentially after you had a quick look at the new layout.

tischi commented 3 years ago

I think your current proposal would work.

I understand it like this:

Right?

If we stick to this logic, we could in fact switch to ZARR without needing the voxel size in the ZARR metadata, because we could make an xml with a ZARR image loader that contains the voxel size.

If we now decide to read the xml from GitHub, then in terms of Java code it would be easier for me, and more consistent with the current logic of how the xmls work, to have two xmls next to each other (one ending on .s3.xml), as also proposed at some point.

tischi commented 3 years ago

I am thinking now we could in fact also more or less leave it as it was, maybe just cleaning up the xmls, in the sense that we do not have two folders "local" and "remote", but just save them next to each other with different endings. And then also keep the paths in the s3.xml absolute as we had them, which fixes all sorts of issues.

And the Java code would just pick the correct xml, based on some user choice.

And if we, for now, also use xml to open the ome.zarr, we don't need anything else, do we? In fact, I just saw that I seem to have the code for opening ome.zarr via xml already, which as mentioned above would also fix the voxel size issue.

And, since we do not urgently need it, we postpone supporting relative s3 paths to a later edition?

constantinpape commented 3 years ago
{
  "imageData": {
     "bdv.n5": {"relativePath": "..."},
     "bdv.n5.s3": {"relativePath": "..."},
     "openOrganelle": {"s3Address": "..."}
  }
}
constantinpape commented 3 years ago
{
   "tableData": {"tsv": {"relativePath": "..."}}
}
constantinpape commented 3 years ago
project.json: "imageDataFormats": [...].
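Under this final layout, a viewer could pick a concrete storage for a source by intersecting the project-wide "imageDataFormats" with the formats a source actually provides in its "imageData" map. A sketch (the preference order and all names below are assumptions for illustration, not part of the spec):

```java
import java.util.List;
import java.util.Map;

public class FormatChooser {
    // Return the first preferred format that the project declares and the
    // source actually provides.
    static String chooseFormat( List<String> projectFormats, Map<String, Map<String, String>> imageData,
                                List<String> preference ) {
        for ( String format : preference )
            if ( projectFormats.contains( format ) && imageData.containsKey( format ) )
                return format;
        throw new UnsupportedOperationException( "No supported image data format found." );
    }

    public static void main( String[] args ) {
        final Map<String, Map<String, String>> imageData = Map.of(
            "bdv.n5", Map.of( "relativePath", "images/my_image.xml" ),
            "bdv.n5.s3", Map.of( "relativePath", "images/my_image.xml" ) );
        System.out.println( chooseFormat( List.of( "bdv.n5", "bdv.n5.s3" ), imageData,
                List.of( "bdv.n5.s3", "bdv.n5" ) ) );
        // → bdv.n5.s3
    }
}
```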
tischi commented 3 years ago

Cool, then we can also have bdv.ome.zarr (with xml) and ome.zarr (without xml) 🥳, which allows us to use the xml for the voxel size as a temporary solution in a relatively clean way.

constantinpape commented 3 years ago

@tischi project with new data spec is there:
github: https://github.com/mobie/arabidopsis-root-lm-datasets/tree/new-data-spec3
filesystem: /g/kreshuk/pape/Work/mobie/arabidopsis-root-lm-datasets

constantinpape commented 3 years ago

I converted a couple more projects, we now have:

constantinpape commented 3 years ago

I have updated these repos now to have the new data spec on the spec-v2 branches:

tischi commented 3 years ago

I could not find the other issue, so I am posting here.

Does that look good for the auto-detection of the storage?

if ( settings.values.getImageDataFormat() == null )
{
    final List< ImageDataFormat > imageDataFormats = project.getImageDataFormats();
    if ( projectLocation.startsWith( "http" ) )
    {
        if ( imageDataFormats.contains( ImageDataFormat.BdvN5S3 ) )
            settings = settings.imageDataFormat( ImageDataFormat.BdvN5S3 );
        else if ( imageDataFormats.contains( ImageDataFormat.OmeZarrS3 ) )
            settings = settings.imageDataFormat( ImageDataFormat.OmeZarrS3 );
        else
            throw new UnsupportedOperationException( "Could not find an S3 storage of the images." );
    }
    else
    {
        if ( imageDataFormats.contains( ImageDataFormat.BdvN5 ) )
            settings = settings.imageDataFormat( ImageDataFormat.BdvN5 );
        else if ( imageDataFormats.contains( ImageDataFormat.OmeZarr ) )
            settings = settings.imageDataFormat( ImageDataFormat.OmeZarr );
        else
            throw new UnsupportedOperationException( "Could not find a file system storage of the images." );
    }
}
constantinpape commented 3 years ago

@tischi overall looks good, but the if is missing two formats: BdvOmeZarrS3 and OpenOrganelleS3. The else is missing BdvOmeZarr and BdvHdf5 (we still support hdf5 for local files, at least in the spec; if you want we could also remove it).
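The extended auto-detection could look like the following self-contained sketch. The enum mirrors the names used in the snippet above (the real ImageDataFormat lives in mobie-viewer-fiji), and the preference order within each branch is an assumption for illustration:

```java
import java.util.List;

public class FormatAutoDetection {
    enum ImageDataFormat { BdvN5, BdvN5S3, BdvHdf5, BdvOmeZarr, BdvOmeZarrS3, OmeZarr, OmeZarrS3, OpenOrganelleS3 }

    static ImageDataFormat autoDetect( String projectLocation, List< ImageDataFormat > available ) {
        // remote projects need an S3-backed format, local ones a file system format
        final List< ImageDataFormat > preference = projectLocation.startsWith( "http" )
            ? List.of( ImageDataFormat.BdvN5S3, ImageDataFormat.OmeZarrS3,
                       ImageDataFormat.BdvOmeZarrS3, ImageDataFormat.OpenOrganelleS3 )
            : List.of( ImageDataFormat.BdvN5, ImageDataFormat.OmeZarr,
                       ImageDataFormat.BdvOmeZarr, ImageDataFormat.BdvHdf5 );
        for ( ImageDataFormat format : preference )
            if ( available.contains( format ) )
                return format;
        throw new UnsupportedOperationException( "Could not find a suitable storage of the images." );
    }

    public static void main( String[] args ) {
        System.out.println( autoDetect( "https://github.com/mobie/some-project",
                List.of( ImageDataFormat.OpenOrganelleS3 ) ) );
        // → OpenOrganelleS3
    }
}
```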

tischi commented 3 years ago

Thanks! Done via https://github.com/mobie/mobie-viewer-fiji/commit/7abaf3b5959e66f504f26e63a76c222a379eed7c