Closed by joeleonjr.
Some public resources require sharing your information before you can view them. Idk if this is something that can be handled programmatically, or at least logged/documented.
2024-06-26T19:08:35-04:00 error trufflehog failed to fetch model {"organization": "ibm", "model": "https://huggingface.co/ibm/testing-patchtst_etth1_forecast.git", "error": "access is restricted."}
2024-06-26T19:08:40-04:00 error trufflehog failed to fetch model {"model": "https://huggingface.co/ibm/testing-patchtst_etth1_forecast.git", "error": "access is restricted."}
Addendum: Git LFS support (in the future) would be a tremendous benefit for HuggingFace. The nature of the platform means there are lots of large files hosted outside of Git, in LFS.
e.g., https://huggingface.co/dmis-lab/biobert-v1.1/blob/main/flax_model.msgpack
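This is why a plain `git` scan can miss such files: Git LFS commits only a small text pointer, and the real content lives on an LFS server. The minimal Go sketch below detects those pointers. It assumes the Git LFS pointer format (a first line of `version https://git-lfs.github.com/spec/v1`, followed by `oid` and `size` lines) and a 1 KiB size cutoff. `parseLFSPointer` is a hypothetical helper, not part of trufflehog.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strconv"
	"strings"
)

// lfsVersionLine is the first line of every Git LFS pointer file.
const lfsVersionLine = "version https://git-lfs.github.com/spec/v1"

// maxPointerSize is an assumed cutoff: pointer files are tiny, so larger blobs are skipped.
const maxPointerSize = 1024

// lfsPointer holds the fields of a Git LFS pointer file.
type lfsPointer struct {
	OID  string // e.g. "sha256:<hex>"
	Size int64  // size in bytes of the real object stored outside Git
}

// parseLFSPointer (hypothetical) reports whether content is an LFS pointer
// and, if so, returns its oid and size.
func parseLFSPointer(content []byte) (lfsPointer, bool) {
	if len(content) > maxPointerSize {
		return lfsPointer{}, false
	}
	sc := bufio.NewScanner(bytes.NewReader(content))
	if !sc.Scan() || sc.Text() != lfsVersionLine {
		return lfsPointer{}, false
	}
	var p lfsPointer
	sawSize := false
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), " ")
		if !ok {
			continue
		}
		switch key {
		case "oid":
			p.OID = value
		case "size":
			n, err := strconv.ParseInt(value, 10, 64)
			if err != nil {
				return lfsPointer{}, false
			}
			p.Size, sawSize = n, true
		}
	}
	return p, p.OID != "" && sawSize
}

func main() {
	ptr := []byte(lfsVersionLine + "\noid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\nsize 12345\n")
	if p, ok := parseLFSPointer(ptr); ok {
		fmt.Printf("LFS pointer %s: %d bytes stored outside Git\n", p.OID, p.Size)
	}
}
```

A scanner that finds such a pointer would still need to fetch the object from the LFS server to see the file's contents. That fetch is the "much bigger endeavor" part.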
I attempted to handle that with the “access is restricted” message. What did you envision?
Also, completely agree on LFS. But that’s a much bigger endeavor.
To me, "access is restricted" implies that you can't access those models. You can, you just need to click an "I agree" button. A call to action would make this clearer; the language in HuggingFace's prompt is perfect, actually.
Does the API have a specific error code or message for "you must share your contact information"?
I hear what you're saying. The challenge is that there are situations where you have to wait for an org to approve your request. The error message is the same for both types: {"error":"Access to model ibm/testing-patchtst_etth1_forecast is restricted. You must be authenticated to access it."}
So afaik there's no easy or reliable way to differentiate between models that require a simple click and models that require waiting for admin approval. Also, I'm not sure we'd want to click the agree button on behalf of users' accounts.
It turns out that the API response contains a property for gated models (`"gated": "auto" | true | false`); however, you can't see it until you have access. 😩
Re: "The error message is the same for both types": that's only the case for unauthenticated requests. There seem to be three different types of errors:
1. It's private and you don't have access.
HTTP/2 404
{"error":"Repository not found"}
2. It's gated and your request isn't authenticated, or the auth is invalid.
# For some reason, excluding the
$ curl -i "https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B" -H "Authorization: Bearer hf_fake"
HTTP/2 401
{"error":"Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it."}
3. It's gated and your request is authenticated.
```sh
$ curl -i "https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B" -H "Authorization: Bearer $TOKEN"
HTTP/2 403
{"error":"Access to model meta-llama/Meta-Llama-3-8B is restricted and you are not in the authorized list. Visit https://huggingface.co/meta-llama/Meta-Llama-3-8B to ask for access."}
```
@rgmz I just pushed a change so that 403 errors produce clearer messages. 401s will result in an "API Key is Invalid"-type error message. Lmk if you think that language is sufficient.
@joeleonjr PR looking good. One thing that did stand out: if you run the `./trufflehog huggingface` command with no arguments, the program continues and gives users a false sense that something is being scanned. We need to check that at least one of `model`, `space`, `dataset`, `org`, or `user` is set.
Done. I followed the same logic used for GitHub.
Description:
This PR adds HuggingFace as a new source. Users will be able to scan a HF model, dataset, or space. A HF token is required for all scans except basic `git` scans of a public model, dataset, or space. That means all org/user enumeration, discussion/PR enumeration, private scanning, etc. require a token. Tokens are free, and rate limiting doesn't seem to come into play when using the API.
I added a couple of test files: one for the client functionality (since there is no Go HF package) and one for the scanning logic. Coverage is pretty high on both, but not perfect.
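The token rule above amounts to attaching credentials only when they're available. The minimal Go sketch below does this, mirroring the curl calls earlier in the thread. Only the `https://huggingface.co/api/models/<id>` endpoint and the `Authorization: Bearer` header come from this thread. `newAPIRequest` is hypothetical, and reading the token from an `HF_TOKEN` environment variable is an assumption made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

const hfAPIBase = "https://huggingface.co/api"

// newAPIRequest (hypothetical) builds a GET request for a model's metadata.
// Model IDs contain a "/" (org/name), so the ID is not path-escaped.
// The token is optional and is attached only when non-empty.
func newAPIRequest(modelID, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, fmt.Sprintf("%s/models/%s", hfAPIBase, modelID), nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// Assumption: the token comes from HF_TOKEN; an empty value means anonymous.
	req, err := newAPIRequest("ibm/testing-patchtst_etth1_forecast", os.Getenv("HF_TOKEN"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, "authenticated:", req.Header.Get("Authorization") != "")
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```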
Checklist:
- Tests passing (`make test-community`)?
- Lint passing (`make lint`; this requires golangci-lint)?