Open JGSweets opened 2 days ago
Hi @JGSweets, thanks for opening the issue. The 2 PRs you've linked are only related to "downloading to a local directory", not the generic "downloading into the HF cache directory" workflow. If we add such a validation, we would do it for both. The main problem with checking the file integrity after a download is the time it takes to do it:
huggingface_hub
cc @Pierrci @julien-c in case you have another opinion
Is your feature request related to a problem? Please describe.
After reviewing #1738 and #2223, it looks like file checksums are only computed on the cache dir under specific conditions. Ideally, a user could explicitly force a checksum after download, as well as on retrieval from the cache, to ensure the integrity of the files on every use.
It's possible I misunderstood the code or discussion though.
Describe the solution you'd like
Add an input arg and an environment variable to enforce checksums on the retrieved files for each hf_hub_download call.
Describe alternatives you've considered
Pre-downloading files manually and manually checking file integrity before using the cached files.
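For reference, the manual workaround could look roughly like the sketch below. It only uses the standard library; `sha256_of_file` and `verify_sha256` are hypothetical helper names, not part of huggingface_hub, and it assumes the expected SHA-256 digest was obtained separately from a trusted source.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Hypothetical helper: raise ValueError if the file does not match expected_hex."""
    actual = sha256_of_file(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(
            f"Checksum mismatch for {path}: expected {expected_hex}, got {actual}"
        )
    return Path(path)
```

The `path` argument would be the local path returned by hf_hub_download. Hashing reads the whole file once, so the cost grows with file size, which is the time concern raised in the reply.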