Open normanrz opened 2 years ago
Unassigning myself for the moment, since sprint priorities have changed.
Note: In this context, we could also add a way to delete datasets without breaking their annotations (i.e., the annotations could still be listed and downloaded).
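To make the deletion idea concrete, here is a minimal in-memory sketch of a soft delete, assuming deleted datasets are only flagged instead of being removed from the database. Dataset, Annotation, Store and is_deleted are hypothetical names for illustration, not existing models from the codebase.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    id: str
    name: str
    # Hypothetical soft-delete marker: the record is kept so that
    # annotations can still reference it after deletion.
    is_deleted: bool = False


@dataclass
class Annotation:
    id: str
    dataset_id: str


class Store:
    """In-memory stand-in for the database; all names here are hypothetical."""

    def __init__(self) -> None:
        self.datasets: dict[str, Dataset] = {}
        self.annotations: dict[str, Annotation] = {}

    def delete_dataset(self, dataset_id: str) -> None:
        # Soft delete: flag the dataset instead of removing its record,
        # so annotations pointing at it remain valid.
        self.datasets[dataset_id].is_deleted = True

    def list_datasets(self) -> list[Dataset]:
        # Deleted datasets are hidden from the regular dataset listing.
        return [d for d in self.datasets.values() if not d.is_deleted]

    def list_annotations(self, dataset_id: str) -> list[Annotation]:
        # Annotations stay listable (and thus downloadable) after deletion.
        return [a for a in self.annotations.values() if a.dataset_id == dataset_id]
```

Hiding the dataset while keeping its record is one way to keep annotation listing and download working; whether the data on disk is removed at the same time would be a separate decision.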
I don’t think we can easily reuse the displayName, because several existing values contain spaces, parentheses, etc., unless we decide that we can simply drop or convert these.
If we want to keep that option, we will then need:
- id
- nameOnDisk (or path? directory? Note that it’s not the full path.)
- name (for use in URIs; strict restrictions, but no longer unique)
- displayName (no restrictions; free-text field, not unique, as before)
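A minimal Python sketch of these four fields, using snake_case equivalents of the names above. The restriction on name (letters, digits, "_" and "-") is an illustrative assumption, not an agreed rule, and to_uri_name is a hypothetical helper showing one way existing displayName values with spaces and parentheses could be converted, as discussed above.

```python
import re
from dataclasses import dataclass

# Assumed restriction for the URI-safe name: letters, digits, "_" and "-" only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


@dataclass
class Dataset:
    id: str            # unique; used for lookups and id-based URIs
    name_on_disk: str  # directory name, not the full path
    name: str          # URI-safe, strictly restricted, no longer unique
    display_name: str  # free text, no restrictions, not unique

    def __post_init__(self) -> None:
        # Reject names that violate the (assumed) URI-safe restriction.
        if not NAME_PATTERN.fullmatch(self.name):
            raise ValueError(f"invalid dataset name: {self.name!r}")


def to_uri_name(display_name: str) -> str:
    """Convert a free-text displayName into a name matching NAME_PATTERN.

    Each run of disallowed characters (spaces, parentheses, ...) becomes a
    single "_"; leading/trailing "_" are stripped. Falls back to "dataset"
    (an arbitrary placeholder) if nothing usable remains.
    """
    converted = re.sub(r"[^A-Za-z0-9_-]+", "_", display_name).strip("_")
    return converted or "dataset"
```

For example, to_uri_name("Mouse cortex (v2)") returns "Mouse_cortex_v2". Since name is no longer unique, such a lossy conversion would be acceptable here; uniqueness is carried by id alone.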
Another thought: since we still need the nameOnDisk anyway, could we not use that for most APIs as well, e.g., in the datastore and in worker jobs?
We could simply make the display name (or perhaps another name field) more prominent :thinking: and switch to the id-based URI format, with a fallback to the old format.
I get the feeling I don’t yet know well enough what the desired outcome is here. Depending on that, we may have to change either almost all APIs or only a few.
> since we still need the nameOnDisk anyway, could we not use that for most APIs as well? e.g. in datastore and worker jobs.
I don't think that is a good idea. Mid-term, I'd like to make datasets virtual, i.e., they wouldn't need a folder on disk. For that, we need id-based URIs. It doesn't seem wise to refactor to nameOnDisk when we want to switch to ids in a few months anyway.
Keeping both displayName and name going forward seems unnecessary.
Thoughts?
Notes from my talk with @normanrz:
Detailed Description
Datasets should be renamable. For that, we should decouple the URL from the dataset name, Notion-style (i.e., <name>-<id>, but only the id part is actually used). I suggest adding a new dataset name field that is separate from the name used for locating the dataset on disk (maybe rename that one to "path"?). I'm not sure whether we should reuse the displayName field; we should check its current occurrences and impose the same restrictions as on dataset names. For backwards compatibility, we resolve the name in the URL on a best-effort basis.
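The Notion-style URL scheme with a best-effort fallback could look roughly like the sketch below. It assumes, for illustration only, that ids are 24 lowercase hex characters containing no "-", and that the caller supplies the set of known ids plus a mapping from old-style names to ids. dataset_url_segment and resolve_dataset are hypothetical helpers, not existing functions.

```python
import re
from typing import Optional

# Assumption for this sketch: dataset ids are 24 lowercase hex characters
# and never contain "-".
ID_PATTERN = re.compile(r"[0-9a-f]{24}")


def dataset_url_segment(name: str, dataset_id: str) -> str:
    """Build a Notion-style "<name>-<id>" URL segment."""
    return f"{name}-{dataset_id}"


def resolve_dataset(
    segment: str, known_ids: set[str], legacy_names: dict[str, list[str]]
) -> Optional[str]:
    """Resolve a URL segment to a dataset id, or return None.

    Only the id after the last "-" is used, so links keep working after a
    rename. If that fails, the whole segment is treated as an old-style
    dataset name and resolved on a best-effort basis.
    """
    # rpartition yields the whole segment as the last part when there is
    # no "-", so a bare id is accepted as well.
    candidate = segment.rpartition("-")[2]
    if ID_PATTERN.fullmatch(candidate) and candidate in known_ids:
        return candidate

    # Legacy fallback: names are no longer unique, so only resolve
    # when exactly one dataset carries this name.
    matches = legacy_names.get(segment, [])
    return matches[0] if len(matches) == 1 else None
```

Splitting on the last "-" lets the name itself contain "-". For example, a link built with dataset_url_segment("old_name", "0123456789abcdef01234567") still resolves after the dataset is renamed, because the name part is ignored. Returning None for ambiguous legacy names is one possible reading of "best-effort"; redirecting to a search page would be another.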