hendrycks / math

The MATH Dataset (NeurIPS 2021)
MIT License
879 stars 85 forks

Which version of the transformers lib is being used? #4

Closed crazysal closed 3 years ago

crazysal commented 3 years ago

Hi, thanks for the updates. I'm still unable to run the evaluation script.


Traceback (most recent call last):
  File "eval_math_gpt.py", line 373, in <module>
    run_eval(args)
  File "eval_math_gpt.py", line 174, in run_eval
    output_ids = model.generate(
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'GPT2LMHeadModel' object has no attribute 'generate'
crazysal commented 3 years ago

Other, similar errors:

(env2020) sahmed9@hal:~/reps/math/modeling$ python eval_math_gpt.py --arch=gpt2 --math-dataroot=../data/MATH/test/*/*.json 
2021-07-15 15:21:32.964776: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
{'arch': 'gpt2',
 'load': None,
 'math_dataroot': '../data/MATH/test/*/*.json',
 'math_mode': 'gpt2-eval',
 'num_beams': 20,
 'peek_fraction': 1.0,
 'tokenizer_merges_file': None,
 'workers': 4}
MATHDataset: Loaded 5000 samples.
  0%|                                                                                                              | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "eval_math_gpt.py", line 373, in <module>
    run_eval(args)
  File "eval_math_gpt.py", line 161, in run_eval
    for i, batch in enumerate(tqdm(dataloader)):
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 219, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/sahmed9/reps/math/modeling/dataset/base_math_dataset.py", line 85, in __getitem__
    curr_sample, fname = self.get_random_sample()
  File "/home/sahmed9/reps/math/modeling/dataset/base_math_dataset.py", line 160, in get_random_sample
    random_sample = self.clean_sample((q, a))
  File "/home/sahmed9/reps/math/modeling/dataset/MATH.py", line 405, in clean_filter_sample_gpt_eval
    question_ids = torch.LongTensor(self.tokenizer.encode("\nQUESTION:\n" + question, verbose=False))
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 782, in encode
    encoded_inputs = self.encode_plus(text,
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 877, in encode_plus
    first_ids = get_input_ids(text)
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 869, in get_input_ids
    return self.convert_tokens_to_ids(self.tokenize(text, **kwargs))
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 700, in tokenize
    tokenized_text = split_on_tokens(added_tokens, text)
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 695, in split_on_tokens
    return list(itertools.chain.from_iterable((self._tokenize(token, **kwargs) if token not \
  File "/home/sahmed9/anaconda3/envs/env2020/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 695, in <genexpr>
    return list(itertools.chain.from_iterable((self._tokenize(token, **kwargs) if token not \
TypeError: _tokenize() got an unexpected keyword argument 'verbose'
crazysal commented 3 years ago
(env2020) sahmed9@hal:~/reps/math/modeling$ python eval_math_gpt.py --arch=gpt2 --math-dataroot=../data/MATH/test/*/*.json 
2021-07-15 15:23:23.429014: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "eval_math_gpt.py", line 359, in <module>
    parser.add_argument('--arch', default='gpt2', choices=transformers.GPT2_PRETRAINED_MODEL_ARCHIVE_LIST)
AttributeError: module 'transformers' has no attribute 'GPT2_PRETRAINED_MODEL_ARCHIVE_LIST'
hendrycks commented 3 years ago

I believe we used version 4.2.2. Hugging Face has a tendency to break functionality across versions.

crazysal commented 3 years ago

Updating to the latest version worked. Thanks!