huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

rag model evaluation program bug #21165

Closed AbrahamBob closed 1 year ago

AbrahamBob commented 1 year ago

System Info

transformers=4.25.1 huggingface-hub=0.10.1 tokenizers=0.13.2 python=3.7

Who can help?

No response

Reproduction

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'RagTokenizer'.
The class this function is called from is 'BartTokenizerFast'.
Loading passages from https://storage.googleapis.com/huggingface-nlp/datasets/wiki_dpr
Traceback (most recent call last):
  File "/home/nano/transformers/examples/research_projects/rag/eval_rag.py", line 321, in <module>
    main(args)
  File "/home/nano/transformers/examples/research_projects/rag/eval_rag.py", line 295, in main
    retriever = RagRetriever.from_pretrained(checkpoint, **model_kwargs)
  File "/home/nano/transformers/src/transformers/models/rag/retrieval_rag.py", line 429, in from_pretrained
    index = cls._build_index(config)
  File "/home/nano/transformers/src/transformers/models/rag/retrieval_rag.py", line 400, in _build_index
    config.index_path or LEGACY_INDEX_PATH,
  File "/home/nano/transformers/src/transformers/models/rag/retrieval_rag.py", line 108, in __init__
    self.passages = self._load_passages()
  File "/home/nano/transformers/src/transformers/models/rag/retrieval_rag.py", line 133, in _load_passages
    passages_path = self._resolve_path(self.index_path, self.PASSAGE_FILENAME)
  File "/home/nano/transformers/src/transformers/models/rag/retrieval_rag.py", line 117, in _resolve_path
    resolved_archive_file = cached_file(index_path, filename)
  File "/home/nano/transformers/src/transformers/utils/hub.py", line 420, in cached_file
    local_files_only=local_files_only,
  File "/home/nano/miniconda3/envs/rag/lib/python3.7/site-packages/huggingface_hub/file_download.py", line 1022, in hf_hub_download
    cache_dir, repo_folder_name(repo_id=repo_id, repo_type=repo_type)
  File "/home/nano/miniconda3/envs/rag/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py", line 92, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/nano/miniconda3/envs/rag/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py", line 137, in validate_repo_id
    "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'https://storage.googleapis.com/huggingface-nlp/datasets/wiki_dpr'. Use repo_type argument if needed.
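
Reading the traceback, the failure appears to come from a storage URL (the fallback for `config.index_path`) being passed where a Hub repo id is expected. The sketch below only approximates the rule quoted in the error message. It is not the `huggingface_hub` implementation, and `looks_like_repo_id` is a hypothetical helper introduced for illustration:

```python
import re

# Hypothetical helper for illustration only. This is NOT huggingface_hub's
# validate_repo_id; it roughly mirrors the rule stated in the error message:
# a repo id is 'repo_name' or 'namespace/repo_name'.
_REPO_ID_PATTERN = re.compile(r"[\w.-]+(/[\w.-]+)?")


def looks_like_repo_id(value: str) -> bool:
    """Return True if value has the form 'repo_name' or 'namespace/repo_name'."""
    return _REPO_ID_PATTERN.fullmatch(value) is not None


# The URL from the traceback contains ':' and several '/' separators,
# so it cannot match either accepted form.
legacy_index_url = "https://storage.googleapis.com/huggingface-nlp/datasets/wiki_dpr"
print(looks_like_repo_id(legacy_index_url))

# A two-part identifier, used here purely as an example input.
print(looks_like_repo_id("facebook/rag-sequence-nq"))
```

Under this approximation the URL is rejected and the two-part identifier is accepted, which matches the behaviour reported in the error.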

Expected behavior

I ran the evaluation program for the RAG model, and after setting the hyperparameters as in the example, it fails with the error above.

sgugger commented 1 year ago

This example relies on earlier versions of Transformers and the Hugging Face Hub library; you should downgrade them.

AbrahamBob commented 1 year ago

@sgugger I'm sorry, can you give some advice about the version? I have tried several versions myself without success.

sgugger commented 1 year ago

It looks like this example was released along with Transformers 3.2.0 or 3.3.0.
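
When trying different versions, it can help to confirm which ones are actually active in the environment. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata` (on Python 3.7, as in the report, the `importlib_metadata` backport would be needed instead); `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Package names taken from the "System Info" section of the report.
report = installed_versions(["transformers", "huggingface-hub", "tokenizers"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this before and after a downgrade shows whether the intended versions were really picked up by the interpreter that launches `eval_rag.py`.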

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.