Can't load config for 'gpt2'

kaiisongit commented 4 years ago

I have only just tried loading GPT-2; I haven't tried to score a sentence yet. Here is the code:

import torch
from lm_scorer.models.auto import AutoLMScorer as LMScorer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
batch_size = 1
scorer = LMScorer.from_pretrained('gpt2', device=device, batch_size=batch_size)

However, when I run it, I get this error:

Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\transformers\configuration_utils.py", line 239, in get_config_dict
    local_files_only=local_files_only,
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\transformers\file_utils.py", line 267, in cached_path
    raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file gpt2\config.json not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scorer.py", line 6, in <module>
    scorer = LMScorer.from_pretrained('gpt2', device=device, batch_size=batch_size)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\lm_scorer\models\auto.py", line 24, in from_pretrained
    return model_class(model_name, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\lm_scorer\models\abc\base.py", line 11, in __init__
    self._build(model_name, kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\lm_scorer\models\gpt2.py", line 19, in _build
    model_name, use_fast=True, add_special_tokens=False
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\transformers\tokenization_auto.py", line 195, in from_pretrained
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\transformers\configuration_auto.py", line 196, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\transformers\configuration_utils.py", line 252, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'gpt2'. Make sure that:

- 'gpt2' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'gpt2' is the correct path to a directory containing a config.json file
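The first traceback complains about "file gpt2\config.json not found", which is a Windows path built from 'gpt2', not a download URL. In the transformers versions from that time, get_config_dict appears to treat its argument as a local path whenever a directory with that name exists. One plausible cause is therefore a folder called gpt2 in the working directory that shadows the hub identifier. This is an assumption and is not confirmed anywhere in this thread. The sketch below checks for that case. The helper name shadows_hub_model is hypothetical.

```python
import os


def shadows_hub_model(model_name: str) -> bool:
    """Hypothetical helper: True if a local directory named `model_name`
    exists but contains no config.json. A loader that checks local paths
    first would then look there instead of fetching the hub model."""
    config_path = os.path.join(model_name, "config.json")
    return os.path.isdir(model_name) and not os.path.isfile(config_path)


if __name__ == "__main__":
    # Run this from the same directory you run scorer.py from
    if shadows_hub_model("gpt2"):
        print("A local 'gpt2' folder without config.json may be shadowing the model id.")
    else:
        print("No shadowing 'gpt2' folder found in", os.getcwd())
```

If the check fires, renaming or moving that folder should let 'gpt2' resolve as a model identifier again. You could also point from_pretrained at a directory that really does contain config.json.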

Thank you for your work! This seems to be exactly what I need, if only I could get it to work!

katreparitosh commented 4 years ago

Hello @kaiisongit

Did you manage to find a solution to this issue? I am facing the same problem.

It would be great if you could share it.

Regards, Paritosh

kaiisongit commented 4 years ago

Hi, I didn't find a solution, but I managed to cobble together this code, which does what I need. score is a function that takes a sentence as a string and returns its perplexity (the exponential of the model's loss). Lower numbers are better.

import math
import torch
from pytorch_pretrained_bert import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
model.eval()  # disable dropout so scores are deterministic
model.to('cuda')
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')

def score(sentence):
    """Return the perplexity of `sentence` under OpenAI GPT (lower is better)."""
    tokenize_input = tokenizer.tokenize(sentence)
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenize_input)
    tokens_tensor = torch.tensor([indexed_tokens])
    tokens_tensor = tokens_tensor.to('cuda')
    with torch.no_grad():  # inference only, so skip gradient tracking
        # With lm_labels set, the model returns the mean cross-entropy loss
        loss = model(tokens_tensor, lm_labels=tokens_tensor)
    return math.exp(loss.item())
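score() returns math.exp(loss), where loss is the mean per-token cross-entropy. The result is therefore a perplexity, not the raw loss. You can check the arithmetic without loading a model if you already have a natural-log probability for each token. The helper and the numbers below are illustrative only.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from natural-log token probabilities.

    The mean negative log-likelihood is the cross-entropy loss a language
    model reports. Exponentiating it gives the perplexity (lower is better).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every token probability 0.25 has perplexity (very close to) 4
print(perplexity([math.log(0.25)] * 3))
```

The value is averaged over tokens, so it can be compared between sentences of different lengths. It is only comparable between scores from the same model and tokenizer.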

kaiisongit commented 4 years ago

If you aren't using CUDA, you can remove these two lines: model.to('cuda') and tokens_tensor = tokens_tensor.to('cuda').

katreparitosh commented 4 years ago

Hello @kaiisongit,

Thanks for sharing your code.

However, I am working with a transformers-based model, and I started getting the same error during deployment, so I was looking for a solution.

Anyway, thanks for the prompt reply.

caroarriaga commented 2 years ago

I'm getting a similar error:

ModuleNotFoundError                       Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in _get_module(self, module_name)

ModuleNotFoundError: No module named 'transformers.models.gpt2.modeling_gpt2'

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in _get_module(self, module_name)

RuntimeError: Failed to import transformers.models.gpt2.modeling_gpt2 because of the following error (look up to see its traceback): No module named 'transformers.models.gpt2.modeling_gpt2'
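The RuntimeError wraps a failure to import transformers.models.gpt2.modeling_gpt2. A broken or mismatched transformers install is one possible cause, but that is an assumption and nothing in this thread establishes it. A first step is to import that submodule directly in the same environment (for example, the same Colab runtime). Importing it directly should show the original exception rather than the wrapped RuntimeError. The helper name import_error_for is hypothetical and uses only the standard library.

```python
import importlib
from typing import Optional


def import_error_for(module_name: str) -> Optional[str]:
    """Hypothetical helper: try to import `module_name` directly and return
    the exception type and message, or None if the import succeeds."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or any error raised while the module initialises
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    err = import_error_for("transformers.models.gpt2.modeling_gpt2")
    print(err or "transformers.models.gpt2.modeling_gpt2 imports cleanly")
```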

Was a solution provided?

simonepri / lm-scorer, issue #11: Can't load config for 'gpt2'