huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

Upgrade to datasets@2.21.0 #3024

Open · severo opened this issue 1 month ago

severo commented 1 month ago

https://github.com/huggingface/datasets/releases/tag/2.21.0

When done, we should refresh some datasets, like https://huggingface.co/datasets/ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions/discussions/1#66bcd7e2f1685a3ade2e55f5
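After the upgrade, a deployment could check that the environment actually picked up the new release. The following is a minimal sketch, not code from the dataset-viewer repository. The helper names (`parse_release`, `is_at_least`, `datasets_meets_requirement`) are hypothetical, and the version comparison is deliberately simplified rather than a full PEP 440 implementation. Only `importlib.metadata.version` and `PackageNotFoundError` from the standard library are assumed.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Minimum datasets release targeted by this issue.
REQUIRED = "2.21.0"


def parse_release(v: str) -> tuple[int, ...]:
    """Return the leading numeric components of a version string.

    Simplified on purpose: a local suffix after "+" is dropped and parsing
    stops at the first non-numeric component, so "2.21.0.dev0" -> (2, 21, 0).
    """
    parts: list[int] = []
    for piece in v.split("+", 1)[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed: str, required: str = REQUIRED) -> bool:
    """Compare release tuples after zero-padding, so "2.21" equals "2.21.0"."""
    a, b = parse_release(installed), parse_release(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def datasets_meets_requirement(required: str = REQUIRED) -> bool:
    """Return True if the installed datasets package is at least `required`."""
    try:
        installed = version("datasets")
    except PackageNotFoundError:
        return False  # datasets is not installed in this environment
    return is_at_least(installed, required)
```

For instance, `is_at_least("2.20.0")` returns False while `is_at_least("2.21.0")` returns True. Because this sketch ignores pre-release suffixes, a development build such as "2.21.0.dev0" would pass the check. A stricter guard could use `packaging.version.Version`, which orders pre-releases before the final release.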

severo commented 3 weeks ago

Note that we're currently using the unmerged branch https://github.com/huggingface/datasets/compare/datasets-2.19.1-hotfix. I'm not sure what we should do, @albertvillanova?

albertvillanova commented 3 weeks ago

I can take care of this.

albertvillanova commented 3 weeks ago

The hot fix in the dedicated branch was released in datasets-2.20.0:

This is the corresponding PR:

albertvillanova commented 3 weeks ago

After updating to datasets-2.21.0, we should also review the changes introduced by datasets-2.20.0: https://github.com/huggingface/datasets/releases/tag/2.20.0

severo commented 3 weeks ago

Yes, sure, we have to see what has changed between the current version and 2.21.0 and how it affects our code.

About the hot fix: one of its last commits reverted the switch from json to ujson. Is that relevant for us?

albertvillanova commented 3 weeks ago

Link to the hot fix branch: https://github.com/huggingface/datasets/commits/datasets-2.19.1-hotfix/

The reversion of the change you mentioned is explained here: https://github.com/huggingface/dataset-viewer/pull/2870#pullrequestreview-2087846308

severo commented 3 weeks ago

OK, so: in datasets@main, we still use json, not ujson, right?