huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

remove filecheck to enable symlinks #7133

Open fschlatt opened 3 months ago

fschlatt commented 3 months ago

Enables streaming from local symlinks #7083

@lhoestq

lhoestq commented 2 months ago

The CI is failing; it looks like this change breaks imagefolder loading.

I just checked the fsspec internals. Maybe we can instead detect a symlink by checking islink, and use size to make sure it points to a file:

if info["type"] == "file" or (info.get("islink") and info["size"])

lhoestq commented 2 months ago

Hmm, actually size doesn't seem to filter out symlinked directories, so we need another way.

fschlatt commented 2 months ago

Does fsspec perhaps allow resolving symlinks? Something like https://docs.python.org/3/library/pathlib.html#pathlib.Path.resolve
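
For reference, a minimal sketch of what resolving looks like with pathlib alone. It does not go through fsspec, and the helper name resolves_to_file is hypothetical, used only for illustration:

```python
import os
import tempfile
from pathlib import Path


def resolves_to_file(path):
    """Return True if path, after following symlinks, is a regular file.

    Hypothetical helper: stdlib only, not part of fsspec or datasets.
    """
    # Path.resolve() follows symlinks and returns an absolute path.
    # Path.is_file() also follows symlinks, so resolve() is shown here
    # mainly to make the symlink target explicit.
    return Path(path).resolve().is_file()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data.txt"
        target.write_text("x")
        file_link = Path(tmp) / "file_link"
        dir_link = Path(tmp) / "dir_link"
        os.symlink(target, file_link)  # symlink to a regular file
        os.symlink(tmp, dir_link)      # symlink to a directory
        print(resolves_to_file(file_link))
        print(resolves_to_file(dir_link))
```

Unlike a size check, this distinguishes a symlinked file from a symlinked directory, but only for local paths.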

lhoestq commented 2 months ago

There is info["destination"] in the case of a symlink, so maybe:

if info["type"] == "file" or (info.get("islink") and info.get("destination") and os.path.isfile(info["destination"]))
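
The condition above could be wrapped in a small helper, sketched here under a few assumptions. The name is_file_or_link_to_file is hypothetical, and the sketch assumes the info dict also carries the link's own path under "name" and that "destination" may be a relative path, in which case it is joined with the link's directory before the os.path.isfile check. It has not been run against the datasets CI:

```python
import os
import tempfile


def is_file_or_link_to_file(info):
    """Return True for a regular file or a symlink whose target is a file.

    Hypothetical helper; `info` is a dict shaped like the one discussed in
    this thread, with "type", "islink" and "destination" keys.
    """
    if info["type"] == "file":
        return True
    destination = info.get("destination")
    if not (info.get("islink") and destination):
        return False
    destination = os.fsdecode(destination)
    if not os.path.isabs(destination):
        # Assumption: a relative destination is relative to the directory
        # containing the link, whose path is stored under "name".
        link_dir = os.path.dirname(info.get("name", ""))
        destination = os.path.join(link_dir, destination)
    # os.path.isfile follows symlinks, so chains of links are handled,
    # and symlinked directories are rejected.
    return os.path.isfile(destination)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "data.txt"), "w") as f:
            f.write("x")
        os.mkdir(os.path.join(tmp, "sub"))
        # Hand-built stand-ins for info dicts, not real fsspec output.
        file_link = {"name": os.path.join(tmp, "a"), "type": "other",
                     "islink": True, "destination": "data.txt"}
        dir_link = {"name": os.path.join(tmp, "b"), "type": "other",
                    "islink": True, "destination": "sub"}
        print(is_file_or_link_to_file(file_link))
        print(is_file_or_link_to_file(dir_link))
```

Checking the destination rather than the size is what excludes symlinked directories, which was the problem with the earlier size-based condition.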