Open severo opened 2 years ago
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
It would be a new task that could be used on the Hub.
Reopening in light of https://huggingface.co/spaces/librarian-bots/huggingface-datasets-semantic-search and https://twitter.com/vanstriendaniel/status/1689336183959203840.
Instead of searching by similarity in the metadata, though, the idea would be to check similarity in the data itself.
See https://huggingface.co/spaces/asoria/datasets-similarity-tool by @AndreaFrancis
It would be useful to find, for a given dataset, the nearest datasets in terms of their content.
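To make the idea concrete, here is a minimal sketch of what "nearest datasets by content" could mean. It is only an illustration, not a proposed implementation: it assumes each dataset is represented by a small sample of text rows, uses plain term-frequency vectors with cosine similarity rather than learned embeddings, and the dataset names, rows, and helper functions (`content_vector`, `cosine`, `nearest_datasets`) are all hypothetical.

```python
import math
import re
from collections import Counter


def content_vector(rows):
    """Build a term-frequency vector from a sample of a dataset's rows."""
    counts = Counter()
    for row in rows:
        counts.update(re.findall(r"[a-z0-9]+", row.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_datasets(query, samples, k=3):
    """Rank the other datasets by content similarity to `query`."""
    vectors = {name: content_vector(rows) for name, rows in samples.items()}
    scores = [
        (name, cosine(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


# Hypothetical dataset names and rows, for illustration only.
samples = {
    "reviews-a": ["the movie was great", "terrible movie, bad plot"],
    "reviews-b": ["great film with a good plot", "the movie was bad"],
    "weather": ["rain expected tomorrow", "sunny with light wind"],
}

# Prints the other datasets ordered from most to least similar.
print(nearest_datasets("reviews-a", samples))
```

In practice the term-frequency vectors would probably be replaced by embeddings of sampled rows, and the pairwise comparison by an approximate nearest-neighbor index, but the shape of the task stays the same: one vector per dataset, then a ranking by similarity.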