sabetAI closed this issue 3 years ago.
Hi, I have a related issue. It also happens with "facebook/rag-token-base", "facebook/rag-token-nq", and "facebook/rag-sequence-nq".
Basic loading fails (it worked until about 2 days ago; I use version 3.5.0). Both
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
and
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
result in the same error message:
OSError: Can't load tokenizer for 'facebook/rag-sequence-nq/question_encoder_tokenizer'.
It seems to append the wrong path component, question_encoder_tokenizer, to the end of the model identifier.
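Here is a minimal sketch of one plausible mechanism, consistent with the git/git-lfs suspicion raised in this thread. Under the old flat S3 layout, appending a subfolder to the model identifier still pointed at the right file. The log in the original report shows that URL pattern: https://s3.amazonaws.com/models.huggingface.co/bert/<id>/<file>. Under a repo-style layout, the subfolder has to go into the file path instead. This sketch assumes that layout is https://huggingface.co/<repo_id>/resolve/<revision>/<file>. The helpers s3_url and hub_url are hypothetical and are not the transformers API.

```python
# Hypothetical URL builders for illustration only; NOT the transformers API.
S3_PREFIX = "https://s3.amazonaws.com/models.huggingface.co/bert"  # flat layout seen in the report's log
HUB_PREFIX = "https://huggingface.co"  # assumed repo-style layout


def s3_url(model_id: str, filename: str) -> str:
    # Flat layout: the model id is only a path prefix, so "<id>/<subfolder>"
    # still lands on the right object.
    return f"{S3_PREFIX}/{model_id}/{filename}"


def hub_url(repo_id: str, filename: str, revision: str = "main") -> str:
    # Repo-style layout: the revision sits between the repo id and the file path,
    # so a subfolder must be part of the file path, not of the repo id.
    return f"{HUB_PREFIX}/{repo_id}/resolve/{revision}/{filename}"


repo = "facebook/rag-sequence-nq"
sub = "question_encoder_tokenizer"

# Works with the flat layout.
print(s3_url(f"{repo}/{sub}", "vocab.txt"))
# Wrong with the repo layout: the subfolder is treated as part of the repo id.
print(hub_url(f"{repo}/{sub}", "vocab.txt"))
# Repo id kept intact, subfolder moved into the file path.
print(hub_url(repo, f"{sub}/vocab.txt"))
```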
To add to @ratthachat's comment: I see the same problem when loading the model with:
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq")
Tagging @julien-c @Pierrci here. Maybe this is related to the migration to git/git-lfs.
The initial poster seems to be running transformers version 3.3.1, which makes me suspect it might not be related to the git/git-lfs migration.
Update: @lhoestq is looking into it
@lhoestq @julien-c @thomwolf
Sorry to ask, but I am translating TFRag and would really love to continue before the long holidays.
Would it be possible to fix just the wrong file path (the trailing question_encoder_tokenizer) in
OSError: Can't load tokenizer for 'facebook/rag-sequence-nq/question_encoder_tokenizer'.
so that basic loading works again with
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
or
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
or
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq")
Apologies for any duplicate comments, but I am experiencing the same issue as @ratthachat. Are there any updates or fixes for this? I am currently running transformers 3.5.1.
Hello, feel free to open a PR with your proposed fix and we'll take a look. Thanks!
I can confirm that this error goes away after downgrading to:
transformers==3.3.1
tokenizers==0.9.2
datasets==1.1.2
It looks very likely that something went wrong in the transition to git-lfs for this use case. @thomwolf @julien-c
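Before retrying with the pins above, it can help to check which versions are actually installed in the active environment. This is a minimal sketch that assumes Python 3.8+ for importlib.metadata. The helper names parse_pins, installed_versions, and mismatches are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple

# Pins reported in the comment above as avoiding the error.
PINS = """
transformers==3.3.1
tokenizers==0.9.2
datasets==1.1.2
"""


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines into a dict, skipping blank lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed distribution versions; None means the package is not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(
    pins: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for every package that differs from its pin."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }


pins = parse_pins(PINS)
print(mismatches(pins, installed_versions(pins)))  # {} when the environment matches the pins
```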
Thanks for the detailed reports, everyone. This should now be fixed on master.
@julien-c
Hi, I am trying to run use_own_knowledge_dataset.py with transformers version 3.5.1, but it gives the following error:
OSError: Can't load tokenizer for 'facebook/rag-sequence-nq/question_encoder_tokenizer'. Make sure that:
- 'facebook/rag-sequence-nq/question_encoder_tokenizer' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'facebook/rag-sequence-nq/question_encoder_tokenizer' is the correct path to a directory containing relevant tokenizer files
Hey @shamanez, could you open a separate issue for this and tag @lhoestq? :-)
Sure :)
The fix is not in a released version yet, only on master, so you need to install from master for now.
So should I install from source?
Thank you! When will the fixed version be released?
Environment info
transformers version: 3.3.1
Who can help
@patrickvonplaten, @lhoestq
Information
Model I am using (Bert, XLNet ...):
facebook/rag-sequence-base
The problem arises when using:
The tasks I am working on is:
To reproduce
Steps to reproduce the behavior:
run
sh finetune.sh
with ... gives:
Model name 'facebook/rag-sequence-base/question_encoder_tokenizer' not found in model shortcut name list (facebook/dpr-question_encoder-single-nq-base). Assuming 'facebook/rag-sequence-base/question_encoder_tokenizer' is a path, a model identifier, or url to a directory containing tokenizer files.
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/vocab.txt from cache at /h/asabet/.cache/torch/transformers/14d599f015518cd5b95b5d567b8c06b265dbbf04047e44b3654efd7cbbacb697.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/added_tokens.json from cache at None
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/special_tokens_map.json from cache at /h/asabet/.cache/torch/transformers/70614c7a84151409876eaaaecb3b5185213aa5c560926855e35753b9909f1116.275045728fbf41c11d3dae08b8742c054377e18d92cc7b72b6351152a99b64e4
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/tokenizer_config.json from cache at /h/asabet/.cache/torch/transformers/8ade9cf561f8c0a47d1c3785e850c57414d776b3795e21bd01e58483399d2de4.11f57497ee659e26f830788489816dbcb678d91ae48c06c50c9dc0e4438ec05b
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/tokenizer.json from cache at None
Model name 'facebook/rag-sequence-base/generator_tokenizer' not found in model shortcut name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). Assuming 'facebook/rag-sequence-base/generator_tokenizer' is a path, a model identifier, or url to a directory containing tokenizer files.
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/vocab.json from cache at /h/asabet/.cache/torch/transformers/3b9637b6eab4a48cf2bc596e5992aebb74de6e32c9ee660a27366a63a8020557.6a4061e8fc00057d21d80413635a86fdcf55b6e7594ad9e25257d2f99a02f4be
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/merges.txt from cache at /h/asabet/.cache/torch/transformers/b2a6adcb3b8a4c39e056d80a133951b99a56010158602cf85dee775936690c6a.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/added_tokens.json from cache at None
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/special_tokens_map.json from cache at /h/asabet/.cache/torch/transformers/342599872fb2f45f954699d3c67790c33b574cc552a4b433fedddc97e6a3c58e.6e217123a3ada61145de1f20b1443a1ec9aac93492a4bd1ce6a695935f0fd97a
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/tokenizer_config.json from cache at /h/asabet/.cache/torch/transformers/e5f72dc4c0b1ba585d7afb7fa5e3e52ff0e1f101e49572e2caaf38fab070d4d6.d596a549211eb890d3bb341f3a03307b199bc2d5ed81b3451618cbcb04d1f1bc
loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/tokenizer.json from cache at None
Traceback (most recent call last):
  File "finetune.py", line 499, in <module>
    main(args)
  File "finetune.py", line 439, in main
    model: GenerativeQAModule = GenerativeQAModule(args)
  File "finetune.py", line 105, in __init__
    retriever = RagPyTorchDistributedRetriever.from_pretrained(hparams.model_name_or_path, config=config)
  File "/h/asabet/.local/lib/python3.6/site-packages/transformers/retrieval_rag.py", line 308, in from_pretrained
    config, question_encoder_tokenizer=question_encoder_tokenizer, generator_tokenizer=generator_tokenizer
  File "/scratch/ssd001/home/asabet/transformers/examples/rag/distributed_retriever.py", line 41, in __init__
    index=index,
TypeError: __init__() got an unexpected keyword argument 'index'
Expected behavior
finetune.sh should launch and run
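In the traceback above, retrieval_rag.py is loaded from the pip-installed library under site-packages, and the environment reports that as 3.3.1. By contrast, distributed_retriever.py comes from a source checkout under /scratch. It is that script's __init__ (line 41) that passes index=index on to its parent. One plausible reading is that the example script expects a newer parent __init__ than the installed library provides, although this thread does not confirm it. The minimal sketch below reproduces that kind of mismatch with hypothetical stand-in classes. LibraryRetrieverOld, LibraryRetrieverNew, example_subclass, and try_load are illustrative names, not transformers code.

```python
class LibraryRetrieverOld:
    """Hypothetical stand-in for an older base class whose __init__ has no `index` parameter."""

    def __init__(self, config, question_encoder_tokenizer, generator_tokenizer):
        self.config = config

    @classmethod
    def from_pretrained(cls, name, config=None):
        # Factory builds the instance without `index` (`name` is unused in this stand-in).
        return cls(config, question_encoder_tokenizer=None, generator_tokenizer=None)


class LibraryRetrieverNew(LibraryRetrieverOld):
    """Hypothetical stand-in for a newer base class that accepts an optional `index`."""

    def __init__(self, config, question_encoder_tokenizer, generator_tokenizer, index=None):
        super().__init__(config, question_encoder_tokenizer, generator_tokenizer)
        self.index = index


def example_subclass(base):
    # Example-script subclass written against the newer base: it always forwards `index`.
    class DistributedRetriever(base):
        def __init__(self, config, question_encoder_tokenizer, generator_tokenizer, index=None):
            super().__init__(
                config,
                question_encoder_tokenizer=question_encoder_tokenizer,
                generator_tokenizer=generator_tokenizer,
                index=index,
            )

    return DistributedRetriever


def try_load(base):
    """Return 'ok', or the TypeError message, when loading through the example subclass."""
    try:
        example_subclass(base).from_pretrained("facebook/rag-sequence-base")
        return "ok"
    except TypeError as err:
        return str(err)


print("older base + newer example:", try_load(LibraryRetrieverOld))
print("newer base + newer example:", try_load(LibraryRetrieverNew))
```

If this is the cause, the natural fix is to use example scripts and a library that come from the same version. For example, you could install the library from the same source checkout as the examples.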