huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

upload resume #2083

Open ehartford opened 6 months ago

ehartford commented 6 months ago

When I upload a folder and it fails in the middle of a file, I would like it to automatically resume where it left off the next time I try to upload it.

For example:

$ huggingface-cli upload cognitivecomputations/dolphin-phi2-kensho dolphin-phi-kensho .
...
 _upload_lfs_files
    _wrapped_lfs_upload(filtered_actions[0])
  File "/Users/eric/miniconda3/envs/textgen/lib/python3.11/site-packages/huggingface_hub/_commit_api.py", line 393, in _wrapped_lfs_upload
    raise RuntimeError(f"Error while uploading '{operation.path_in_repo}' to the Hub.") from exc
RuntimeError: Error while uploading 'model-00001-of-00002.safetensors' to the Hub.
$ huggingface-cli upload cognitivecomputations/dolphin-phi2-kensho dolphin-phi-kensho .
<should resume instead of starting from the beginning>

Wauplin commented 6 months ago

Hi @ehartford, thanks for reporting and sorry for the inconvenience. Unfortunately, we don't have any logic to resume an upload when the connection breaks in the middle of a file. The only advice I can give you is to commit in smaller chunks (e.g. only 1 file at a time instead of the full repository). We also have a helper to upload a folder in multiple commits so that the upload can be resumed. However, it is mostly useful when uploading large folders. Here are the docs about it: https://huggingface.co/docs/huggingface_hub/guides/upload#upload-a-folder-by-chunks
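
The "one file at a time" advice can be sketched roughly as follows. This is only an illustration, not something the library ships: `pending_files`, `upload_one_by_one` and `upload_to_hub` are hypothetical helper names, and a file is skipped purely because a file with the same path already exists in the repo (the content is not compared). It does not resume a partially uploaded file; an interrupted file is simply uploaded again from the start on the next run.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def pending_files(folder: str, remote_files: Iterable[str]) -> List[str]:
    """Return repo-relative paths under `folder` that are not yet in the repo.

    Assumption: a file counts as "done" if its path is already listed remotely.
    """
    root = Path(folder)
    done = set(remote_files)
    local = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    return [rel for rel in local if rel not in done]


def upload_one_by_one(
    folder: str,
    remote_files: Iterable[str],
    upload_fn: Callable[[str, str], None],
) -> List[str]:
    """Call `upload_fn(local_path, path_in_repo)` once per missing file."""
    uploaded = []
    for rel in pending_files(folder, remote_files):
        upload_fn(str(Path(folder) / rel), rel)  # one commit per file
        uploaded.append(rel)
    return uploaded


def upload_to_hub(repo_id: str, folder: str) -> List[str]:
    """Hypothetical wiring to huggingface_hub (requires network and a token)."""
    from huggingface_hub import HfApi  # imported lazily so the helpers above stay standalone

    api = HfApi()
    remote = api.list_repo_files(repo_id)

    def upload_fn(local_path: str, path_in_repo: str) -> None:
        api.upload_file(
            path_or_fileobj=local_path,
            path_in_repo=path_in_repo,
            repo_id=repo_id,
        )

    return upload_one_by_one(folder, remote, upload_fn)
```

If the run fails on one file, calling `upload_to_hub` again with the same arguments should skip the files that were already committed and continue with the rest.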

ehartford commented 6 months ago

I'm aware that you don't have the logic.

That's why I made this feature request, asking you to add the logic.

Thank you for your consideration.

Wauplin commented 6 months ago

Adding the enhancement label (https://github.com/huggingface/huggingface_hub/labels/enhancement) then. Wasn't sure at first if you were asking for some documentation tips or a proper feature request :) Agree with you it would be a great addition.

bilgehanertan commented 5 months ago

I would like to help with this issue. I am not sure how to implement a resume strategy, but to prevent future inconveniences we could expose the multi_commits parameter in the CLI (since there is currently no option to enable multi-commit uploads in the CLI upload command) and maybe suggest using multi_commits before starting the upload if the folder size is above a threshold?
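
The size check in that suggestion could look something like the sketch below. The function names and the 5 GiB threshold are assumptions made up for illustration; the thread does not specify a threshold, and nothing like this exists in the CLI as far as this discussion shows.

```python
import os

# Hypothetical threshold: the thread does not say what value would be used.
SUGGEST_THRESHOLD_BYTES = 5 * 1024**3


def folder_size_bytes(folder: str) -> int:
    """Sum the sizes of all regular files under `folder` (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def should_suggest_multi_commits(
    folder: str, threshold: int = SUGGEST_THRESHOLD_BYTES
) -> bool:
    """Return True when the folder is strictly larger than `threshold` bytes."""
    return folder_size_bytes(folder) > threshold
```

A CLI could call `should_suggest_multi_commits(path)` before starting the upload and print a hint when it returns True, leaving the final choice to the user.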

Wauplin commented 5 months ago

Hi @bilgehanertan, thanks for proposing your help! I am actually in the process of reviewing/refactoring the multi_commit process to make it more reliable and robust, especially when uploading large folders. In this regard, I'd prefer to delay this feature request a bit, so that we don't duplicate work or add temporary flags. I'll let you know when I make some progress :)

versae commented 4 months ago

I also faced the same issue uploading a big model with multi_commit, but it seems the discussion branches where the PRs are created are not directly accessible, so resuming/fixing the upload using git is not possible.

However, running exactly the same api.upload_folder() command will resume the upload when using multi_commit, although I've been getting 501 errors from amazonaws.com when I run it.
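
Since re-running the same command is reported to resume the upload, one way to automate that is a small retry wrapper with exponential backoff. This is a generic sketch, not part of huggingface_hub: `retry` and its parameters are hypothetical, and whether a retry actually resumes depends on the upload call itself behaving as described above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` until it succeeds or `attempts` calls have failed.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once the attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

For example, `retry(lambda: api.upload_folder(repo_id=repo_id, folder_path=folder, multi_commits=True))` would re-issue the same call after a transient failure. The `multi_commits` argument is the experimental option discussed in this thread and may not be available in every installed version of the library.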