huggingface / huggingface_hub

The official Python client for the Hugging Face Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

Support multi-file and/or folder uploads via the Hugging Face Hub UI #707

Open josephrocca opened 2 years ago

josephrocca commented 2 years ago

Is your feature request related to a problem? Please describe.
I'm trying to upload a tfjs model using the Hugging Face Hub UI. tfjs models are automatically split into shards of 4 MB each during the conversion process. This can result in several dozen files (84 in my particular case), so it's impractical to upload them one by one.

Describe the solution you'd like
I'd like the UI to have the capability to accept folders, and/or multiple files. If multiple files end up being supported, but folders aren't, then I'd also like to have the ability to create a new folder via the UI. I need the ability to create folders because I want to have multiple tfjs models in a single model repository (e.g. different versions of the same model), and they each need their own folder due to the way tfjs works.

Describe alternatives you've considered
Using the programmatic upload API seems like one alternative, assuming that it is able to create folders (related issue).
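For readers weighing the programmatic alternative mentioned above, here is a minimal sketch. It assumes a recent `huggingface_hub` release that exposes `HfApi.upload_folder`; the thread does not confirm that this was available when the issue was filed. The helper names `plan_folder_upload` and `upload_folder_to_hub`, and the repo and folder names in the usage note, are hypothetical and chosen for illustration only. The first function just previews which repo paths a local folder would map to, and the second hands the folder to the client in one call.

```python
from pathlib import Path, PurePosixPath


def plan_folder_upload(local_dir, path_in_repo=""):
    """Preview the (local file, repo path) pairs for every file under local_dir.

    Hypothetical helper: it only inspects the local filesystem and uploads nothing.
    """
    root = Path(local_dir)
    plan = []
    for file in sorted(root.rglob("*")):
        if file.is_file():
            rel = file.relative_to(root).as_posix()
            # Prefix with the target folder in the repo, if one was given.
            target = str(PurePosixPath(path_in_repo) / rel) if path_in_repo else rel
            plan.append((file, target))
    return plan


def upload_folder_to_hub(local_dir, repo_id, path_in_repo=""):
    """Upload a whole folder in a single call.

    Assumes huggingface_hub is installed and an access token is already
    configured (for example via `huggingface-cli login`).
    """
    # Imported lazily so plan_folder_upload works without the dependency.
    from huggingface_hub import HfApi

    api = HfApi()
    return api.upload_folder(
        folder_path=str(local_dir),
        path_in_repo=path_in_repo or None,  # None means the repo root
        repo_id=repo_id,
        repo_type="model",
    )
```

With this sketch, several tfjs versions could each go into their own folder of one repository, e.g. `upload_folder_to_hub("tfjs_model_v1", "your-username/your-model", path_in_repo="v1")`, where both names are placeholders.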

Thanks!

julien-c commented 2 years ago

for context/curiosity, why are you not using git in your workflow?

josephrocca commented 2 years ago

@julien-c Maybe I should be! But in this case I just want to upload a big batch of model files at once, and it's handy to have that functionality even on GitHub. Maybe this is an unusual request, but if people are using the upload UI on the Hugging Face Hub, then presumably some subset of those people would like to upload several files at a time. It'd be annoying to upload even 5 or so files one by one. If the upload UI is very rarely used on Hugging Face, such that it's not worth putting further investment into, then feel free to close this issue.

julien-c commented 2 years ago

No, this is a valid feature request @josephrocca – and users actually use the Web uploads from what I can see.

(just wanted to make sure to understand if there was another underlying limitation in git)

Pierrci commented 2 years ago

cc @SBrandeis

josephrocca commented 2 years ago

I've just noticed that it's not even possible to concurrently upload two files in separate tabs. It appears to work, but then when the second upload finishes it shows an error: "Error: A long-running operation is already ongoing on dataset "

It shows that error even if the first upload has fully finished. For example, here I uploaded apple_test.tar and facebook_test.tar at the same time. I started the apple.tar upload first, and as you can see it finished well before the facebook.tar one (I switch to the finished apple tab at ~5 sec into the video), and the facebook.tar upload continues, but then errors at the end. The error doesn't seem to make sense, since there are no other operations ongoing at the time that the facebook.tar upload finishes.

https://user-images.githubusercontent.com/1167575/156907624-24f43bff-6ff6-4841-8caf-6f1f547b1246.mp4

In any case, the net result here is that if you have multiple files to upload, it's currently very tedious. This bug is kind of orthogonal to the feature request in this issue/thread, since, regardless of a multi-upload feature (in a single browser tab), it would be annoying if a user didn't know about this and started two large uploads (in separate tabs) and one of them failed right at the end.

Thanks!