Open albertvillanova opened 1 week ago
Is it possible to also list the solution we discussed in that thread (moving from repo name to _id)?
To make discussion a bit more efficient
in particular:
_id field in hf.co/api/datasets - guaranteed immutable for a given repo
Some first thoughts:
_id-centric logic for what is currently based on the repo name:
Then regarding the API:
_id in addition to the repo name, to make sure it gets the right data (even if there is outdated data in the cache somehow)
I also considered using _id everywhere as the source of truth, but I anticipate it will just move the problem elsewhere, to the place where we cache the _id <-> repo name mapping (the repo name is always needed to read/write to repos and also for the dataset-viewer API).
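To make the "_id in addition to the repo name" idea concrete, here is a minimal sketch. It is not the dataset-viewer implementation: `RepoCache`, `CacheEntry` and the string payload are hypothetical names, and the sketch assumes the caller has already obtained the current _id for the repo (for example from hf.co/api/datasets). The cache stays keyed by repo name, but each entry remembers the _id it was built for, and a lookup with a different _id is treated as stale.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CacheEntry:
    repo_id: str  # the immutable _id the entry was computed for
    content: str  # hypothetical cached payload


class RepoCache:
    """Hypothetical in-memory cache, keyed by repo name."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, repo_name: str, repo_id: str, content: str) -> None:
        # Store the payload together with the _id it belongs to.
        self._entries[repo_name] = CacheEntry(repo_id=repo_id, content=content)

    def get(self, repo_name: str, expected_id: str) -> Optional[str]:
        entry = self._entries.get(repo_name)
        if entry is None:
            return None
        if entry.repo_id != expected_id:
            # Same name, different _id: the entry belongs to a repo that was
            # renamed or replaced, so drop it instead of serving it.
            del self._entries[repo_name]
            return None
        return entry.content
```

With this shape, a new repo created under an old name never sees the previous repo's cached data, because its _id differs. It does not remove the need to know the current name of a renamed repo, which is the mapping problem mentioned above.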
Thanks for the complementary information, @lhoestq.
So, basically we would need a complete refactoring of all the logic to identify repositories and you also think that this would just move the problem elsewhere... :thinking:
I am wondering whether we could instead tackle the real underlying problem, that is, properly handle the repository renaming event, even if a new repository with the old name is created.
Reported by @lewtun (internal link: https://huggingface.slack.com/archives/C02EMARJ65P/p1719818961944059):