mgrankin / ru_transformers

Apache License 2.0
776 stars, 108 forks

generation of text #12

Closed: piegu closed this issue 4 years ago

piegu commented 4 years ago

Hi,

After training my GPT-2 model in Portuguese following your README.md, I tried to generate text. I ran the following command in my terminal but got an error message. Could you give me the correct command? Thank you.

export OUTPUT=output_yt/s
python run_generation.py \
    --model_type=gpt2 \
    --model_name_or_path=$OUTPUT \
    --padding_text="Eu gosto do carro que comprei ontem"

The error message I got:

01/06/2020 15:25:37 - INFO - transformers.tokenization_utils -   Model name 'output_yt/s' not found in model shortcut name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). Assuming 'output_yt/s' is a path or url to a directory containing tokenizer files.
01/06/2020 15:25:37 - INFO - transformers.tokenization_utils -   Didn't find file output_yt/s/vocab.json. We won't load it.
01/06/2020 15:25:37 - INFO - transformers.tokenization_utils -   Didn't find file output_yt/s/merges.txt. We won't load it.
01/06/2020 15:25:37 - INFO - transformers.tokenization_utils -   Didn't find file output_yt/s/added_tokens.json. We won't load it.
01/06/2020 15:25:37 - INFO - transformers.tokenization_utils -   Didn't find file output_yt/s/special_tokens_map.json. We won't load it.
01/06/2020 15:25:37 - INFO - transformers.tokenization_utils -   Didn't find file output_yt/s/tokenizer_config.json. We won't load it.
Traceback (most recent call last):
  File "run_generation.py", line 204, in <module>
    main()
  File "run_generation.py", line 166, in main
    tokenizer = tokenizer_class.from_pretrained(args.model_name_or_path)
  File "/opt/anaconda3/envs/gpt/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 302, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/opt/anaconda3/envs/gpt/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 370, in _from_pretrained
    list(cls.vocab_files_names.values())))
OSError: Model name 'output_yt/s' was not found in tokenizers model name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). We assumed 'output_yt/s' was a path or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.

mgrankin commented 4 years ago

To understand how to generate text you should start by looking at rest.py.
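
Before reading rest.py, it may help to confirm what the traceback reports: the stock GPT-2 tokenizer looked for vocab.json and merges.txt in output_yt/s and found neither. The sketch below only checks for those two files, whose names are taken from the OSError above. The helper name `missing_tokenizer_files` is hypothetical and is not part of this repository or of transformers.

```python
from pathlib import Path

# File names copied from the OSError in the log above; the stock GPT-2
# tokenizer expects both of them inside the model directory.
REQUIRED_FILES = ("vocab.json", "merges.txt")


def missing_tokenizer_files(model_dir):
    """Return the required tokenizer files that are absent from model_dir.

    Hypothetical helper for illustration only.
    """
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "output_yt/s" is the directory used in the question above.
    missing = missing_tokenizer_files("output_yt/s")
    if missing:
        print("Missing tokenizer files:", ", ".join(missing))
    else:
        print("Both tokenizer files are present.")
```

If the list is non-empty, the checkpoint was probably not saved with a tokenizer in that format, so run_generation.py cannot load it as written and a loading path such as the one in rest.py is the place to look.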

piegu commented 4 years ago

Thanks Mikhail. It helped :-)