huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

cached_path disappeared from the API #21678

Closed. johann-petrak closed this issue 1 year ago.

johann-petrak commented 1 year ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

A tool using an older version of transformers uses

from transformers.file_utils import cached_path

However, this API function disappeared sometime in 2022, and I could not find any information, in a changelog or elsewhere, about what it should be replaced with.

However, even in the current transformers repo this function is still used in some example files, sometimes by re-defining it locally and sometimes via the same import, which no longer works:

examples/research_projects/visual_bert/modeling_frcnn.py:from utils import WEIGHTS_NAME, Config, cached_path, hf_bucket_url, is_remote_url, load_checkpoint
examples/research_projects/visual_bert/modeling_frcnn.py:                resolved_archive_file = cached_path(
examples/research_projects/visual_bert/utils.py:            resolved_config_file = cached_path(
examples/research_projects/visual_bert/utils.py:def cached_path(
examples/research_projects/lxmert/modeling_frcnn.py:from utils import WEIGHTS_NAME, Config, cached_path, hf_bucket_url, is_remote_url, load_checkpoint
examples/research_projects/lxmert/modeling_frcnn.py:                resolved_archive_file = cached_path(
examples/research_projects/lxmert/utils.py:            resolved_config_file = cached_path(
examples/research_projects/lxmert/utils.py:def cached_path(
examples/research_projects/pplm/run_pplm.py:from transformers.file_utils import cached_path
examples/research_projects/pplm/run_pplm.py:        resolved_archive_file = cached_path(params["url"])
examples/research_projects/pplm/run_pplm.py:            filepath = cached_path(BAG_OF_WORDS_ARCHIVE_MAP[id_or_path])
examples/research_projects/seq2seq-distillation/_test_bash_script.py:from transformers.file_utils import cached_path
examples/research_projects/seq2seq-distillation/_test_bash_script.py:        data_cached = cached_path(

Expected behavior

The import should either remain possible for backwards compatibility, or the documentation should explain what to replace it with.

Version 4.21.0 seems to be the last version from which that function could be imported.

sgugger commented 1 year ago

The cached_path API was a private util for our downloads (note that we consider as private anything that is not in the main init). None of the research projects are actively maintained so they will only work with the version of Transformers corresponding to their creation.

You should now use the huggingface_hub library to manage downloads and caching of files from the Hub. The closest thing we have to cached_path in the current version is transformers.utils.hub.cached_file.
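
For illustration, a minimal sketch of both options (the repo id and filename below are placeholders, not taken from this issue):

from huggingface_hub import hf_hub_download
from transformers.utils.hub import cached_file

# Option 1: huggingface_hub downloads a file from a Hub repo and caches it locally
config_path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")

# Option 2: transformers' own helper, the closest replacement for the old cached_path
config_path = cached_file("bert-base-uncased", "config.json")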

johann-petrak commented 1 year ago

Thanks, I will see if I can monkey-patch that tool/library accordingly.
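
For reference, a minimal monkey-patch sketch (my own assumption, not an officially supported approach): it assumes the tool only passes plain http(s) URLs or local file paths to cached_path, and that transformers.file_utils still exists as a backward-compatibility module in the installed version.

import hashlib
import os
import tempfile

import requests
import transformers.file_utils


def cached_path(url_or_filename, cache_dir=None, **kwargs):
    # Drop-in stand-in for the removed transformers.file_utils.cached_path.
    # Local paths are returned unchanged; URLs are downloaded once into a cache dir.
    if os.path.exists(url_or_filename):
        return url_or_filename
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "cached_path_shim")
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, hashlib.sha256(url_or_filename.encode()).hexdigest())
    if not os.path.exists(target):
        response = requests.get(url_or_filename, timeout=60)
        response.raise_for_status()
        with open(target, "wb") as f:
            f.write(response.content)
    return target


# Patch it in before the tool runs `from transformers.file_utils import cached_path`
transformers.file_utils.cached_path = cached_path

If the tool instead passes Hub repo files rather than plain URLs, using hf_hub_download from huggingface_hub (shown above) inside the shim would likely be the better choice.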