[Closed] alvations closed this issue 2 months ago
I am also getting this error with this dataset: https://huggingface.co/datasets/google/IFEval
Me too; I didn't have this issue a few hours ago.
Same observation. I even downgraded to datasets==2.20.0 and huggingface_hub==0.23.5, leading me to believe it's an issue on the server. Any known workarounds?
Not a good idea, but commenting out the whole security block in /usr/local/lib/python3.10/dist-packages/huggingface_hub/hf_api.py is a temporary workaround:
# security = kwargs.pop("security", None)
# if security is not None:
#     security = BlobSecurityInfo(
#         safe=security["safe"], av_scan=security["avScan"], pickle_import_scan=security["pickleImportScan"]
#     )
# self.security = security
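If you do edit the file, note that its location varies by environment; this small snippet prints where your installed copy actually lives:

import huggingface_hub.hf_api as hf_api

# Shows the path of the installed hf_api.py in the current environment.
print(hf_api.__file__)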
Uploading a dataset to Hugging Face also results in the following error in the Dataset Preview:
The full dataset viewer is not available (click to read why). Only showing a preview of the rows.
'safe'
Error code: UnexpectedError
Need help to make the dataset viewer work? Make sure to review [how to configure the dataset viewer](link1), and [open a discussion](link2) for direct support.
I used the JSONL format for the dataset in this case. The exact same dataset worked previously.
Same issue here. Even reverting to an older version of datasets (e.g., 2.19.0) results in the same error:
>>> datasets.load_dataset('allenai/ai2_arc', 'ARC-Easy')
...
  File "/Users/lucas/miniforge3/envs/oe-eval-internal/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3048, in <listcomp>
    RepoFile(**path_info) if path_info["type"] == "file" else RepoFolder(**path_info)
  File "/Users/lucas/miniforge3/envs/oe-eval-internal/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 534, in __init__
    safe=security["safe"], av_scan=security["avScan"], pickle_import_scan=security["pickleImportScan"]
KeyError: 'safe'
I just had this issue a few minutes ago, crawled the internet, and found nothing. I came here to open an issue and found this one. It's really frustrating. Has anyone found a fix?
Hi, my team and I have the same problem.
Yeah, this just suddenly appeared within the last few hours, without any client-side code changes.
Here's a patch to fix the issue temporarily:
import huggingface_hub


def patched_repofolder_init(self, **kwargs):
    self.path = kwargs.pop("path")
    self.tree_id = kwargs.pop("oid")
    last_commit = kwargs.pop("lastCommit", None) or kwargs.pop("last_commit", None)
    if last_commit is not None:
        last_commit = huggingface_hub.hf_api.LastCommitInfo(
            oid=last_commit["id"],
            title=last_commit["title"],
            date=huggingface_hub.utils.parse_datetime(last_commit["date"]),
        )
    self.last_commit = last_commit


def patched_repo_file_init(self, **kwargs):
    self.path = kwargs.pop("path")
    self.size = kwargs.pop("size")
    self.blob_id = kwargs.pop("oid")
    lfs = kwargs.pop("lfs", None)
    if lfs is not None:
        lfs = huggingface_hub.hf_api.BlobLfsInfo(
            size=lfs["size"], sha256=lfs["oid"], pointer_size=lfs["pointerSize"]
        )
    self.lfs = lfs
    last_commit = kwargs.pop("lastCommit", None) or kwargs.pop("last_commit", None)
    if last_commit is not None:
        last_commit = huggingface_hub.hf_api.LastCommitInfo(
            oid=last_commit["id"],
            title=last_commit["title"],
            date=huggingface_hub.utils.parse_datetime(last_commit["date"]),
        )
    self.last_commit = last_commit
    # Don't parse the server's "security" payload at all; that parsing is
    # where KeyError: 'safe' is raised.
    self.security = None
    # backwards compatibility
    self.rfilename = self.path
    self.lastCommit = self.last_commit


huggingface_hub.hf_api.RepoFile.__init__ = patched_repo_file_init
huggingface_hub.hf_api.RepoFolder.__init__ = patched_repofolder_init
I'm thinking this must be a server issue; no client code changed on my end. So weird!
As far as I can tell, this seems to be happening with all datasets that use RepoFolder (which probably covers most datasets on Hugging Face, right?).
Here is a temporary fix for the problem: https://discuss.huggingface.co/t/i-keep-getting-keyerror-safe-when-loading-my-datasets/105669/12?u=mlscientist
This doesn't seem to work!
In case you are using Colab or similar, remember to restart your session after modifying the hf_api.py file.
No need to modify the file directly; just monkey-patch it.
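For example, here is a smaller sketch that only strips the problematic security payload, assuming the released constructor matches the snippet quoted earlier and takes keyword arguments:

import huggingface_hub

_orig_repo_file_init = huggingface_hub.hf_api.RepoFile.__init__

def _repo_file_init_without_security(self, **kwargs):
    # Drop the server's "security" payload before the released constructor
    # tries to read security["safe"] from it.
    kwargs.pop("security", None)
    _orig_repo_file_init(self, **kwargs)

huggingface_hub.hf_api.RepoFile.__init__ = _repo_file_init_without_security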
I'm now fairly sure the error appears because the backend expects the API code to look like it does on main: if RepoFile and RepoFolder look roughly like they do on main, they work again.
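A quick way to check which behavior your installed copy has (just a sketch using inspect):

import inspect
import huggingface_hub.hf_api as hf_api

# If the released security parsing is still present, the patches above apply.
src = inspect.getsource(hf_api.RepoFile.__init__)
print("old security parsing present:", 'security["safe"]' in src)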
If not fixed as above, a secondary error will appear:

    return self.info(path, expand_info=False)["type"] == "directory"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    "tree_id": path_info.tree_id,
               ^^^^^^^^^^^^^^^^^
AttributeError: 'RepoFolder' object has no attribute 'tree_id'
We've reverted the deployment; please let us know if the issue persists!
Thanks @muellerzr!
Describe the bug
Dataset loading was throwing some safety errors for this popular dataset, wmt14.

[in]:
[out]:
Steps to reproduce the bug
See above.
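For reference, a minimal sketch of the failing call; the "de-en" config name is an assumption, since the report does not say which config was used:

import datasets

# Raised KeyError: 'safe' during file listing before the
# server-side deployment was reverted.
ds = datasets.load_dataset("wmt14", "de-en")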
Expected behavior
Dataset properly loaded.
Environment info
datasets version: 2.21.0