Open severo opened 1 month ago
Note that we're currently using the unmerged branch https://github.com/huggingface/datasets/compare/datasets-2.19.1-hotfix. Not sure what we should do, @albertvillanova?
I can take care of this.
The hot fix in the dedicated branch was released in datasets-2.20.0:
This is the corresponding PR:
After updating to datasets-2.21.0, we should also review the changes introduced by datasets-2.20.0: https://github.com/huggingface/datasets/releases/tag/2.20.0
Yes, sure, we have to see what has changed between the current version and 2.21.0 and how it affects our code.
About the hot fix: one of its last commits reverted the change from json to ujson. Is that relevant for us?
Link to the hot fix branch: https://github.com/huggingface/datasets/commits/datasets-2.19.1-hotfix/
The reversion of the change you mentioned is explained here: https://github.com/huggingface/dataset-viewer/pull/2870#pullrequestreview-2087846308
OK, so: in datasets@main, we still use json, not ujson, right?
https://github.com/huggingface/datasets/releases/tag/2.21.0
Once the update is done, we should refresh some datasets, such as https://huggingface.co/datasets/ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions/discussions/1#66bcd7e2f1685a3ade2e55f5