Open Wauplin opened 11 months ago
Yea I think waiting for #6269 would be best, or branching from it. For reference, this PR is progressing pretty well and will do something similar using the HF Hub for our LAION dataset bot: https://github.com/LAION-AI/Discord-Scrapers/pull/2.
Is there any update on this?
No update so far on this feature request, but for broader context, this announcement will help with incremental datasets: https://huggingface.co/blog/xethub-joins-hf :)
Feature request
Have the possibility to do ds.push_to_hub(..., append=True).
Motivation
Requested in this comment and this comment. Discussed internally on Slack.
Your contribution
What I suggest to do for parquet datasets is to use CommitOperationCopy + CommitOperationDelete from huggingface_hub:
=> make a single commit with all commit operations at once
I think it should be quite straightforward to implement. Happy to review a PR (maybe conflicting with the ongoing "1 commit push_to_hub" PR https://github.com/huggingface/datasets/pull/6269)
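For illustration, here is a rough sketch of how the copy + delete idea above could look. It is not an actual implementation from datasets. The shard naming pattern (data/train-00000-of-00004.parquet), the helper names plan_append and build_operations, and the repo id in the usage comment are all assumptions made for the example. It also assumes the Hub accepts a copy and a delete of the same source path in one commit, which would need to be checked.

```python
import re
from typing import Dict, List, Tuple

# Assumed shard naming pattern, e.g. "data/train-00000-of-00004.parquet".
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.parquet$")


def plan_append(existing: List[str], num_new_shards: int) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Plan the renames needed to append `num_new_shards` shards.

    Returns (renames, new_paths): `renames` is a list of (old_path, new_path)
    pairs for existing shards, `new_paths` are the repo paths for the new shards.
    """
    parsed = []
    for path in existing:
        match = SHARD_RE.match(path)
        if match is None:
            raise ValueError(f"Unexpected shard name: {path}")
        parsed.append((int(match["index"]), match["prefix"], path))
    if not parsed:
        raise ValueError("No existing shards to append to")
    parsed.sort()

    prefix = parsed[0][1]
    new_total = len(parsed) + num_new_shards
    renames = [
        (path, f"{prefix}-{index:05d}-of-{new_total:05d}.parquet")
        for index, _, path in parsed
    ]
    new_paths = [
        f"{prefix}-{i:05d}-of-{new_total:05d}.parquet"
        for i in range(len(parsed), new_total)
    ]
    return renames, new_paths


def build_operations(renames: List[Tuple[str, str]], new_files: Dict[str, str]) -> list:
    """Turn the plan into huggingface_hub commit operations.

    `new_files` maps each new path in the repo to a local parquet file path.
    """
    # Imported lazily so that plan_append can be used without huggingface_hub.
    from huggingface_hub import CommitOperationAdd, CommitOperationCopy, CommitOperationDelete

    operations = []
    for src, dst in renames:
        # Server-side copy to the new name, then delete the old name.
        operations.append(CommitOperationCopy(src_path_in_repo=src, path_in_repo=dst))
        operations.append(CommitOperationDelete(path_in_repo=src))
    for path_in_repo, local_path in new_files.items():
        operations.append(CommitOperationAdd(path_in_repo=path_in_repo, path_or_fileobj=local_path))
    return operations


# Usage (hypothetical repo id, everything pushed as a single commit):
# from huggingface_hub import HfApi
# HfApi().create_commit(
#     repo_id="user/my-dataset",
#     repo_type="dataset",
#     operations=build_operations(renames, new_files),
#     commit_message="Append new shards",
# )
```

The planning step is kept separate from the Hub calls so the renaming logic can be checked offline. A real append=True would presumably also need to update the dataset card metadata, which this sketch ignores.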