mobie / mobie.github.io

table type #15

Closed tischi closed 3 years ago

tischi commented 3 years ago

/g/kreshuk/pape/Work/data/mobie/example-project/example-dataset/tables/em-segmentation/default.tsv

I need to auto-generate the path to the default table. This is harder if I do not know the file extension...

tischi commented 3 years ago

I could loop through the possible file extensions and check which one exists.
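A minimal sketch of that exists-and-loop idea, written in Python for brevity. The helper name `find_default_table` and the list of candidate extensions are assumptions made for illustration; they are not part of the MoBIE code base or spec.

```python
import os
import tempfile


def find_default_table(table_folder, extensions=(".tsv", ".csv")):
    """Return the path of the first existing default table, or None.

    Hypothetical helper: tries "default" + each candidate extension in order.
    """
    for ext in extensions:
        candidate = os.path.join(table_folder, "default" + ext)
        if os.path.exists(candidate):
            return candidate
    return None


# Usage with a throwaway folder that only holds a default.csv
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "default.csv"), "w").close()
    found = find_default_table(folder)
    found_name = os.path.basename(found)
```

The drawback is one file-system (or remote) lookup per candidate extension, which is what an explicit entry in the spec would avoid.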

Alternative (I like it better): we put everything explicitly into the spec:

"segmentation": {
        "imageDataLocations": {
          "local": "images/local/em-segmentation.xml",
          "remote": "images/remote/em-segmentation.xml"
        },
        "menuItem": "segmentation/name",
        "tableDataLocation": "tables/em-segmentation/default.tsv",
        "view": {

And we deal with the "additional" tables in some other way.

constantinpape commented 3 years ago

The default table is always called default.tsv.

tischi commented 3 years ago

OK, though I'm not sure: I also like it to be explicit.

constantinpape commented 3 years ago

From https://mobie.github.io/specs/mobie_spec.html#dataset:

> The tables directory contains all tabular data associated with segmentations or grid views (see data specification and view specification for details). All tables associated with one segmentation or view must be located in the same subdirectory, which must contain a table default.tsv and may contain additional tables. See the table data specification for details on how tables are stored.

constantinpape commented 3 years ago

So I think we have two equivalent options here:

In the first case you have to strip default.tsv from the path to get the table folder; in the second case you get the folder directly. I prefer case two because that's one operation less. Otherwise both options are equivalent and explicit.
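To make the "one operation" concrete, here is a small sketch of both cases. It assumes option one stores the path to default.tsv and option two stores the folder path; the function names and the example values are illustrative only, not taken from the spec.

```python
import posixpath  # spec paths use forward slashes regardless of platform


def table_folder_from_file(table_data_location):
    # Option one: the spec stores the default table file,
    # so the folder is obtained by stripping the file name.
    return posixpath.dirname(table_data_location)


def default_table_from_folder(table_data_location):
    # Option two: the spec stores the folder,
    # so the default table is obtained by appending the fixed name.
    return posixpath.join(table_data_location, "default.tsv")


option_one = "tables/em-segmentation/default.tsv"  # assumed value for option one
option_two = "tables/em-segmentation"              # assumed value for option two

folder = table_folder_from_file(option_one)
default_table = default_table_from_folder(option_two)
```

Either way a single path operation converts between the two forms, so the choice is mostly about which value reads more naturally in the spec.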

tischi commented 3 years ago

I would prefer option one, because then one can directly show it. We could then write in the spec that the other tables MUST be in the same folder and SHOULD (MUST?) have the same delimiter as the default one (which one can see from the file name if we go for option one).

tischi commented 3 years ago

I also think option one is easier for basic users, who may have only one table.

constantinpape commented 3 years ago

> I would prefer option one, because then one can directly show it.

You can still directly show it if you know that the folder must contain default.tsv. Otherwise the config is invalid.

> We could then write in the spec that the other tables MUST be in the same folder and SHOULD (MUST?) have the same delimiter as the default one (which one can see from the file name if we go for option one).

I don't think we need to demand that the other tables have the same delimiter. We already say that tables SHOULD be tsv but MAY be csv, and I think it unnecessarily complicates things if we also make the delimiter of one table depend on other tables. It also shouldn't make a big difference for parsing the additional tables; I think that is even a bit easier when the folder is specified (using Python pseudocode):

from os import listdir
from os.path import exists, join

default_table = join(table_folder, "default.tsv")
assert exists(default_table)
table_names = listdir(table_folder)  # we already have this in FileAndUrlUtils
# keep only names that end with .tsv or .csv, and drop the default table
additional_table_names = [name for name in table_names if name.endswith((".tsv", ".csv")) and name != "default.tsv"]
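As a sketch of why the delimiters do not need to be coupled: each table's delimiter can be read from its own extension, independent of the default table. The names `DELIMITERS`, `delimiter_for` and `read_table`, and the column names in the sample data, are made up for this illustration.

```python
import csv
import io
import os

# Assumed mapping from file extension to delimiter.
DELIMITERS = {".tsv": "\t", ".csv": ","}


def delimiter_for(table_name):
    """Pick the delimiter from the table's own file extension."""
    ext = os.path.splitext(table_name)[1].lower()
    return DELIMITERS[ext]


def read_table(text, table_name):
    """Parse table text into a list of rows, using the per-file delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter_for(table_name))
    return list(reader)


# A tab-separated default table and a comma-separated additional table
rows_tsv = read_table("label_id\tvolume\n1\t42\n", "default.tsv")
rows_csv = read_table("label_id,volume\n1,42\n", "extra.csv")
```

Both calls yield the same rows, so mixing tsv and csv tables in one folder poses no parsing problem.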

> I also think option one is easier for basic users, who may have only one table.

Why? I don't see a difference for users in specifying a filename vs. the folder name.

There is another reason I prefer option two: I think it's semantically more correct. "tableDataLocation": "/path/to/tables/default.tsv" implies that this file is the location of all the table data. But that's not correct, because the location is the folder with the tables inside of it.

tischi commented 3 years ago

OK, let's leave it as is!