Allow to rename datasets

normanrz commented 2 years ago

Detailed Description

Datasets should be renamable. For that, we should decouple the URL from the dataset name, Notion-style (i.e. <name>-<id>, where only the id part is actually used). I suggest adding a new dataset name field that is separate from the name used for locating the dataset on disk (maybe rename that to "path"?). I'm not sure whether we should reuse the displayName field. We should check the current occurrences and impose the same restrictions as on dataset names. For backwards compatibility, we would resolve the name in the URL on a best-effort basis.
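A minimal sketch of how such a URL segment could be parsed, for illustration only. It assumes ids are 24-character lowercase hex strings; that format and the function name `extract_dataset_id` are assumptions of this sketch, not something decided in this issue.

```python
import re
from typing import Optional

# Assumption: dataset ids are 24-character lowercase hex strings.
_ID_SUFFIX = re.compile(r"(?:^|-)([0-9a-f]{24})$")


def extract_dataset_id(segment: str) -> Optional[str]:
    """Return the trailing id of a '<name>-<id>' URL segment.

    The name part is ignored entirely, so links keep working after a rename.
    Returns None when no id suffix is found, in which case the caller would
    fall back to the best-effort lookup by name.
    """
    match = _ID_SUFFIX.search(segment)
    return match.group(1) if match else None
```

With this approach, `my-dataset-<id>` and `renamed-dataset-<id>` resolve to the same dataset. One caveat: a legacy name that happens to end in `-` followed by 24 hex characters would be misread as an id, so the name fallback would still need to be tried when the id lookup finds nothing.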

fm3 commented 1 year ago

unassigning myself for the moment, since sprint priorities have changed

fm3 commented 1 year ago

Note: In this context we may also create a way of deleting datasets without breaking their annotations (i.e. still allowing users to list and download them)

fm3 commented 3 months ago

I don’t think we can easily reuse the displayName, because several existing values contain spaces and parentheses etc. Unless we decide that we can just drop/convert these.

If we want to keep that option, we will then need

id
nameOnDisk (or path? directory? note that it’s not the full path)
name (for use in URIs, strict restrictions, but no longer unique)
displayName (no restrictions, free-text field, not unique, as before)
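To make the four fields concrete, here is a rough sketch of such a record. The class name `DatasetRecord`, the snake_case field names and the `label` helper are hypothetical and only mirror the list above; this is not the actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetRecord:
    id: str                             # unique, the only field used for lookups
    name_on_disk: str                   # directory name only, not the full path
    name: str                           # URI-safe, strictly restricted, not unique
    display_name: Optional[str] = None  # free text, no restrictions, not unique

    def label(self) -> str:
        """Human-readable label: display_name if set, otherwise name."""
        return self.display_name or self.name
```

Two records may then share a `name` as long as their `id` and `name_on_disk` differ.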

fm3 commented 3 months ago

Another thought: since we still need the nameOnDisk anyway, could we not use that for most APIs as well? e.g. in datastore and worker jobs.

We could just make the display name (or maybe another name field) a more prominent feature :thinking: And switch to the id-based URI format, with fallback to the old format.

I get the feeling I don’t yet know well enough what the desired outcome is here. Depending on that we may have to either change almost all APIs or only a few.

normanrz commented 3 months ago

> since we still need the nameOnDisk anyway, could we not use that for most APIs as well? e.g. in datastore and worker jobs.

I don't think that is a good idea. Mid-term, I'd like to make datasets virtual, i.e. they don't need a folder on disk. For that, we need id-based URIs. It doesn't seem wise to refactor to nameOnDisk when we want to switch to ids in a few months.

normanrz commented 3 months ago

Keeping both displayName and name going forward seems unnecessary

fm3 commented 3 months ago

Thoughts

dataset id also needs to be serialized to NMLs
heuristic to select a dataset when a name is ambiguous: take the oldest (hypothesis: old NMLs/URIs were generated before renaming was possible, so they probably refer to the older dataset)
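The "take the oldest" heuristic could look roughly like this. The `DatasetRef` type, its `created` field and the function name are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DatasetRef:
    id: str
    name: str
    created: datetime  # assumed creation timestamp


def resolve_legacy_name(name: str, datasets: Iterable[DatasetRef]) -> Optional[DatasetRef]:
    """Pick the oldest dataset carrying the given name, or None if there is none."""
    candidates = [d for d in datasets if d.name == name]
    # Old NMLs/URIs most likely predate renaming, so prefer the earliest match.
    return min(candidates, key=lambda d: d.created, default=None)
```

This lookup would only apply to NMLs and URIs that carry no dataset id; once the id is serialized, no guessing is needed.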

MichaelBuessemeyer commented 2 months ago

Notes from my talk with @normanrz:

The main motivation is that datasets are no longer referenced by their name but by their id. This makes it possible to have multiple datasets with the same name. The on-disk directory in which a dataset is stored should be decoupled from its name. For readability, we use an addressing scheme like the one used by Notion, as described in the issue description.
A column is needed that stores the directory name: e.g. "path"
name & displayName are kinda duplicates, so just keep displayName. In the URL, replace all special characters with something friendlier like - or . or just omit them. Take a look at how Notion handles this.
Problem: only a few datasets have a display name set. => use displayName where possible, else use name as fallback
Problem: To still be able to locate the dataset on disk, keep a "copy" of the old name column and name it e.g. path; kinda like a legacy name
When creating a new dataset use the name given by the user as path in case it is still unique. Else use the new dataset's id.
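The notes above could be sketched as follows, under a few assumptions: every run of characters outside ASCII letters and digits is replaced by a single `-` (how Notion actually handles this was not checked), and all function names are hypothetical.

```python
import re
from typing import Set


def url_slug(label: str) -> str:
    """Replace every run of special characters with '-' and trim the ends."""
    # Assumption: only ASCII letters and digits are kept as-is.
    return re.sub(r"[^A-Za-z0-9]+", "-", label).strip("-")


def dataset_url_segment(label: str, dataset_id: str) -> str:
    """Build the '<name>-<id>' segment; only the id part is used for lookup."""
    slug = url_slug(label)
    return f"{slug}-{dataset_id}" if slug else dataset_id


def choose_path(requested_name: str, dataset_id: str, existing_paths: Set[str]) -> str:
    """Use the user-given name as directory name if still unique, else the id."""
    return requested_name if requested_name not in existing_paths else dataset_id
```

For example, `dataset_url_segment("My Dataset (v2)", dataset_id)` yields `My-Dataset-v2-` followed by the id, and `choose_path` only falls back to the id when the requested name is already taken as a directory name.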

scalableminds / webknossos

Allow to rename datasets #6613
