huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0
2.02k stars 531 forks source link

huggingface-cli upload - Validate README.md before file hashing #2452

Closed hlky closed 1 month ago

hlky commented 1 month ago

Closes #2451

HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hlky commented 1 month ago

The only other thing I can think of is that there will be duplicate calls to the endpoint with upload_folder path, once during _prepare_upload_folder_additions then again during create_commit, if the extra load to the endpoint will be an issue there could be something like a boolean self.yaml_validated to skip the validation in create_commit.

Wauplin commented 1 month ago

Yeah, I thought the same as well - thanks for raising the question. Honestly, I don't think the extra overhead is really problematic. We could also have a small @functools.cache decorator on _validate_yaml (though it would be nice to cache only based on content+repo_type, not the token)

hlky commented 1 month ago

The fastai integration test failure appears to be because repo_type is None. It hasn't triggered before repo_type is set to default value in create_commit. I'd suggest we duplicate that to upload_folder.

import requests

content = '---\ntags:\n- fastai\n---\n\n# Amazing!\n\n🥳 Congratulations on hosting your fastai model on the Hugging Face Hub!\n\...n## Intended uses & limitations\nMore information needed\n\n## Training and evaluation data\nMore information needed\n'
url = "https://hub-ci.huggingface.co/api/validate-yaml"
r = requests.post(url, {"content": content, "repoType": None}) # {'error': '"value" does not match any of the allowed types'}

r = requests.post(url, {"content": content, "repoType": "model"}) # {'errors': [], 'warnings': []}
Wauplin commented 1 month ago

Thanks for looking into it. Yes, I would simply duplicate this line in def _validate_yaml as you mentioned.

repo_type = repo_type if repo_type is not None else REPO_TYPE_MODEL
Wauplin commented 1 month ago

Perfect! Thanks for the iterations on this @hlky! Failing tests are unrelated to this PR so we are good to merge now! :tada: