Closed by animator 1 year ago
Same issue here. Same huggingface_hub version.
Hi @animator @chawins, thanks for reporting this issue and sorry for the late reply. The issue comes from a server-side change (search has been revamped). I made a PR https://github.com/huggingface/huggingface_hub/pull/1300 to make the huggingface_hub API more robust to server-side changes.
Overall, this feature sees fairly low usage and is legacy code. At some point it will be completely revisited, but in the meantime I hope this fix will be enough for you to use it. Please remember it is mainly meant for exploratory purposes.
(see also related discussion: https://github.com/huggingface/huggingface_hub/pull/1250)
Thanks for the quick response/fix @Wauplin! I tested the updated main branch, and DatasetSearchArguments seems to work now.
I ended up filtering via tags instead which kind of suits my need better too. In case anyone is looking for a similar workaround, here's what I went with:
import huggingface_hub as hf_hub

hf_api = hf_hub.HfApi()
model_args = hf_hub.ModelSearchArguments()
filt = hf_hub.ModelFilter(
    task=model_args.pipeline_tag.ImageClassification,
    library=model_args.library.PyTorch,
)
models = hf_api.list_models(filter=filt)
# hf_hub.DatasetSearchArguments() is buggy so we go with searching
# "imagenet" in tags instead
models = filter(lambda m: any("imagenet" in t for t in m.tags), models)
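For anyone who wants to try the tag-filtering step without hitting the Hub, here is a minimal offline sketch. The model records are made-up stand-ins built with `SimpleNamespace`, and `has_tag_substring` is a hypothetical helper name, not part of huggingface_hub. The guard against a missing tag list is an assumption added for safety.

```python
from types import SimpleNamespace

# Made-up stand-in records (NOT real Hub data); each has an id and a list of tags.
models = [
    SimpleNamespace(modelId="example/resnet-a", tags=["pytorch", "image-classification", "dataset:imagenet"]),
    SimpleNamespace(modelId="example/vit-b", tags=["pytorch", "image-classification", "dataset:imagenet-21k"]),
    SimpleNamespace(modelId="example/bert-c", tags=["pytorch", "fill-mask"]),
    SimpleNamespace(modelId="example/no-tags", tags=None),
]


def has_tag_substring(model, needle):
    """Return True if any tag of `model` contains `needle` as a substring."""
    # `or []` treats a missing tag list as empty (assumption, see above).
    return any(needle in tag for tag in (model.tags or []))


# Same idea as the filter(...) call above, written as a list comprehension.
imagenet_models = [m for m in models if has_tag_substring(m, "imagenet")]
print([m.modelId for m in imagenet_models])
```

Note that substring matching is deliberately loose: it also keeps tags such as "dataset:imagenet-21k". Compare against the full tag string instead if an exact match is needed.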
Thanks for the feedback and for sharing the snippet!
Just to clarify, what ModelSearchArguments does is provide a helper to find the desired tag. But in the end, model_args.library.PyTorch is strictly the string "pytorch". And DatasetSearchArguments().dataset_name.imagenet IS "imagenet".
So you could also do:
from huggingface_hub import HfApi, ModelFilter

hf_api = HfApi()
models = hf_api.list_models(
    filter=ModelFilter(task="image-classification", library="pytorch", trained_dataset="imagenet")
)
That's what I meant by saying ModelSearchArguments and DatasetSearchArguments are purely for exploratory purposes. If you already know what you are looking for, you can do the search without using them. It saves you the ~10s it takes to initialize them. Hope that makes it clearer :)
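To illustrate why the helper and the raw string are interchangeable, here is a toy sketch of the idea. `TagNamespace` is a hypothetical class written for this example only. It is not how huggingface_hub implements its search arguments, and its attribute naming differs from the real helper (which uses names like `PyTorch`).

```python
class TagNamespace:
    """Hypothetical helper: exposes known tag strings as attributes."""

    def __init__(self, tags):
        for tag in tags:
            # "image-classification" becomes the attribute "image_classification".
            setattr(self, tag.replace("-", "_"), tag)


library = TagNamespace(["pytorch", "tensorflow"])
task = TagNamespace(["image-classification", "fill-mask"])

# Attribute access is only a discovery aid; the value is a plain string.
print(library.pytorch == "pytorch")
print(task.image_classification == "image-classification")
```

Since the attribute resolves to an ordinary string, passing the string literal directly gives the same filter without building the helper first.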
Describe the bug
KeyError: 'multilinguality' when calling DatasetSearchArguments()
Reproduction
Logs
System info