[Train] Provide a list of models for people to choose from in the HF transformer example

Description

I tried two Hugging Face transformer models, `gpt2` and `finbert`, with the HF transformer example, and neither worked.
We should provide a list of models that run out of the box for people to try.
We should also add a warning that other transformer models may need code changes to work.
The output from both runs follows.
(base)  ray@g-784b96e5cffee0001:~/default$ /home/ray/anaconda3/bin/python /home/ray/default/test-9.py --model_name_or_path gpt2 --task_name cola
2023-06-26 15:12:52.363657: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-26 15:12:52.528325: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-06-26 15:12:53.352464: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-06-26 15:12:53.352612: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-06-26 15:12:53.352629: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
comet_ml is installed but `COMET_API_KEY` is not set.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 725.87it/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 665/665 [00:00<00:00, 3.61MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 0.99M/0.99M [00:00<00:00, 15.5MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 446k/446k [00:00<00:00, 9.27MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.29M/1.29M [00:00<00:00, 19.4MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 523M/523M [00:04<00:00, 115MB/s]
loading weights file https://huggingface.co/gpt2/resolve/main/pytorch_model.bin from cache at /home/ray/.cache/huggingface/transformers/752929ace039baa8ef70fe21cdf9ab9445773d20e733cf693d667982e210837e.323c769945a351daa25546176f8208b3004b6f563438a7603e7932bae9025925
All model checkpoint weights were used when initializing GPT2ForSequenceClassification.

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Running tokenizer on dataset: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 56.10ba/s]
Running tokenizer on dataset: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 117.81ba/s]
Running tokenizer on dataset: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 118.41ba/s]
[15:13:04] INFO     Sample 4314 of the training set: {'input_ids': [1026, 3088, 284, 6290, 13], 'attention_mask': [1, 1, 1, 1, 1], 'labels': 0}.                                         test-9.py:428
           INFO     Sample 5772 of the training set: {'input_ids': [23865, 15063, 48241, 351, 257, 24556, 290, 2269, 585, 494, 750, 523, 1165, 13], 'attention_mask': [1, 1, 1, 1, 1, 1, test-9.py:428
                    1, 1, 1, 1, 1, 1, 1, 1], 'labels': 1}.                                                                                                                                            
           INFO     Sample 5763 of the training set: {'input_ids': [42493, 33577, 2630, 257, 3734, 3348, 319, 36079, 313, 721, 13], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], test-9.py:428
                    'labels': 1}.                                                                                                                                                                     
/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/accelerator.py:499: FutureWarning: The `use_fp16` property is deprecated and will be removed in version 1.0 of Accelerate use `Accelerator.mixed_precision == 'fp16'` instead.
  warnings.warn(
/home/ray/anaconda3/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
           INFO     ***** Running training *****                                                                                                                                         test-9.py:519
           INFO       Num examples = 8551                                                                                                                                                test-9.py:520
           INFO       Num Epochs = 3                                                                                                                                                     test-9.py:521
           INFO       Instantaneous batch size per device = 8                                                                                                                            test-9.py:522
           INFO       Total train batch size (w. parallel, distributed & accumulation) = 8                                                                                               test-9.py:526
           INFO       Gradient Accumulation steps = 1                                                                                                                                    test-9.py:530
           INFO       Total optimization steps = 3207                                                                                                                                    test-9.py:531
  0%|                                                                                                                                                                        | 0/3207 [00:00<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ /home/ray/default/test-9.py:629 in <module>                                                      │
│                                                                                                  │
│   626                                                                                            │
│   627                                                                                            │
│   628 if __name__ == "__main__":                                                                 │
│ ❱ 629 │   main()                                                                                 │
│ /home/ray/default/test-9.py:625 in main                                                          │
│                                                                                                  │
│   622 │                                                                                          │
│   623 │   else:                                                                                  │
│   624 │   │   # Run training locally.                                                            │
│ ❱ 625 │   │   train_func(config)                                                                 │
│   626                                                                                            │
│   627                                                                                            │
│   628 if __name__ == "__main__":                                                                 │
│                                                                                                  │
│ /home/ray/default/test-9.py:540 in train_func                                                    │
│                                                                                                  │
│   537 │                                                                                          │
│   538 │   for epoch in range(args.num_train_epochs):                                             │
│   539 │   │   model.train()                                                                      │
│ ❱ 540 │   │   for step, batch in enumerate(train_dataloader):                                    │
│   541 │   │   │   outputs = model(**batch)                                                       │
│   542 │   │   │   loss = outputs.loss                                                            │
│   543 │   │   │   loss = loss / args.gradient_accumulation_steps                                 │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/accelerate/data_loader.py:377 in __iter__        │
│                                                                                                  │
│   374 │   │   dataloader_iter = super().__iter__()                                               │
│   375 │   │   # We iterate one batch ahead to check when we are at the end                       │
│   376 │   │   try:                                                                               │
│ ❱ 377 │   │   │   current_batch = next(dataloader_iter)                                          │
│   378 │   │   except StopIteration:                                                              │
│   379 │   │   │   yield                                                                          │
│   380                                                                                            │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py:628 in __next__   │
│                                                                                                  │
│    625 │   │   │   if self._sampler_iter is None:                                                │
│    626 │   │   │   │   # TODO(https://github.com/pytorch/pytorch/issues/76750)                   │
│    627 │   │   │   │   self._reset()  # type: ignore[call-arg]                                   │
│ ❱  628 │   │   │   data = self._next_data()                                                      │
│    629 │   │   │   self._num_yielded += 1                                                        │
│    630 │   │   │   if self._dataset_kind == _DatasetKind.Iterable and \                          │
│    631 │   │   │   │   │   self._IterableDataset_len_called is not None and \                    │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py:671 in _next_data │
│                                                                                                  │
│    668 │                                                                                         │
│    669 │   def _next_data(self):                                                                 │
│    670 │   │   index = self._next_index()  # may raise StopIteration                             │
│ ❱  671 │   │   data = self._dataset_fetcher.fetch(index)  # may raise StopIteration              │
│    672 │   │   if self._pin_memory:                                                              │
│    673 │   │   │   data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)            │
│    674 │   │   return data                                                                       │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py:61 in fetch     │
│                                                                                                  │
│   58 │   │   │   │   data = [self.dataset[idx] for idx in possibly_batched_index]                │
│   59 │   │   else:                                                                               │
│   60 │   │   │   data = self.dataset[possibly_batched_index]                                     │
│ ❱ 61 │   │   return self.collate_fn(data)                                                        │
│   62                                                                                             │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/data/data_collator.py:247 in        │
│ __call__                                                                                         │
│                                                                                                  │
│    244 │   return_tensors: str = "pt"                                                            │
│    245 │                                                                                         │
│    246 │   def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:                 │
│ ❱  247 │   │   batch = self.tokenizer.pad(                                                       │
│    248 │   │   │   features,                                                                     │
│    249 │   │   │   padding=self.padding,                                                         │
│    250 │   │   │   max_length=self.max_length,                                                   │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2836 in  │
│ pad                                                                                              │
│                                                                                                  │
│   2833 │   │   │   │   encoded_inputs[key] = to_py_obj(value)                                    │
│   2834 │   │                                                                                     │
│   2835 │   │   # Convert padding_strategy in PaddingStrategy                                     │
│ ❱ 2836 │   │   padding_strategy, _, max_length, _ = self._get_padding_truncation_strategies(     │
│   2837 │   │   │   padding=padding, max_length=max_length, verbose=verbose                       │
│   2838 │   │   )                                                                                 │
│   2839                                                                                           │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2372 in  │
│ _get_padding_truncation_strategies                                                               │
│                                                                                                  │
│   2369 │   │                                                                                     │
│   2370 │   │   # Test if we have a padding token                                                 │
│   2371 │   │   if padding_strategy != PaddingStrategy.DO_NOT_PAD and (not self.pad_token or sel  │
│ ❱ 2372 │   │   │   raise ValueError(                                                             │
│   2373 │   │   │   │   "Asking to pad but the tokenizer does not have a padding token. "         │
│   2374 │   │   │   │   "Please select a token to use as `pad_token` `(tokenizer.pad_token = tok  │
│   2375 │   │   │   │   "or add a new pad token via `tokenizer.add_special_tokens({'pad_token':   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
  0%|                                                                                                                                                                        | 0/3207 [00:00<?, ?it/s]
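
The `gpt2` run fails because its tokenizer has no padding token, so the data collator cannot pad a batch. Below is a minimal sketch of the kind of change the example might need, following the suggestion in the error message itself (reuse the end-of-sequence token as the pad token). The helper name `ensure_pad_token` is hypothetical and not part of the example script or of `transformers`; it only assumes the tokenizer exposes `pad_token`, `eos_token` and `pad_token_id`, and that the model exposes `config.pad_token_id`, as Hugging Face tokenizers and models do.

```python
def ensure_pad_token(tokenizer, model=None):
    """Hypothetical helper: make sure padding is possible before batching."""
    if tokenizer.pad_token is None:
        # Reuse the end-of-sequence token, as the error message suggests.
        tokenizer.pad_token = tokenizer.eos_token
    if model is not None and model.config.pad_token_id is None:
        # Sequence-classification heads read the pad id from the model config.
        model.config.pad_token_id = tokenizer.pad_token_id
    return tokenizer
```

In the example script this would presumably be called once inside `train_func`, after the tokenizer and model have been created and before the dataloader is built. Models whose tokenizers already define a pad token are left unchanged.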
(base)  ray@g-784b96e5cffee0001:~/default$ /home/ray/anaconda3/bin/python /home/ray/default/test-9.py --model_name_or_path finbert --task_name cola
2023-06-26 15:16:59.542883: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-26 15:16:59.708840: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-06-26 15:17:00.545921: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-06-26 15:17:00.546040: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-06-26 15:17:00.546054: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
comet_ml is installed but `COMET_API_KEY` is not set.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 916.12it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/configuration_utils.py:601 in       │
│ _get_config_dict                                                                                 │
│                                                                                                  │
│   598 │   │                                                                                      │
│   599 │   │   try:                                                                               │
│   600 │   │   │   # Load from URL or cache if already cached                                     │
│ ❱ 601 │   │   │   resolved_config_file = cached_path(                                            │
│   602 │   │   │   │   config_file,                                                               │
│   603 │   │   │   │   cache_dir=cache_dir,                                                       │
│   604 │   │   │   │   force_download=force_download,                                             │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/utils/hub.py:282 in cached_path     │
│                                                                                                  │
│    279 │                                                                                         │
│    280 │   if is_remote_url(url_or_filename):                                                    │
│    281 │   │   # URL, so get it from the cache (downloading if necessary)                        │
│ ❱  282 │   │   output_path = get_from_cache(                                                     │
│    283 │   │   │   url_or_filename,                                                              │
│    284 │   │   │   cache_dir=cache_dir,                                                          │
│    285 │   │   │   force_download=force_download,                                                │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/utils/hub.py:545 in get_from_cache  │
│                                                                                                  │
│    542 │   │   │   │   │   │   " to False."                                                      │
│    543 │   │   │   │   │   )                                                                     │
│    544 │   │   │   │   else:                                                                     │
│ ❱  545 │   │   │   │   │   raise ValueError(                                                     │
│    546 │   │   │   │   │   │   "Connection error, and we cannot find the requested files in the  │
│    547 │   │   │   │   │   │   " Please try again or make sure your Internet connection is on."  │
│    548 │   │   │   │   │   )                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ /home/ray/default/test-9.py:629 in <module>                                                      │
│                                                                                                  │
│   626                                                                                            │
│   627                                                                                            │
│   628 if __name__ == "__main__":                                                                 │
│ ❱ 629 │   main()                                                                                 │
│ /home/ray/default/test-9.py:625 in main                                                          │
│                                                                                                  │
│   622 │                                                                                          │
│   623 │   else:                                                                                  │
│   624 │   │   # Run training locally.                                                            │
│ ❱ 625 │   │   train_func(config)                                                                 │
│   626                                                                                            │
│   627                                                                                            │
│   628 if __name__ == "__main__":                                                                 │
│                                                                                                  │
│ /home/ray/default/test-9.py:322 in train_func                                                    │
│                                                                                                  │
│   319 │   #                                                                                      │
│   320 │   # In distributed training, the .from_pretrained methods guarantee that                 │
│   321 │   # only one local process can concurrently download model & vocab.                      │
│ ❱ 322 │   config = AutoConfig.from_pretrained(                                                   │
│   323 │   │   args.model_name_or_path, num_labels=num_labels, finetuning_task=args.task_name     │
│   324 │   )                                                                                      │
│   325 │   tokenizer = AutoTokenizer.from_pretrained(                                             │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py:6 │
│ 80 in from_pretrained                                                                            │
│                                                                                                  │
│   677 │   │   kwargs["_from_auto"] = True                                                        │
│   678 │   │   kwargs["name_or_path"] = pretrained_model_name_or_path                             │
│   679 │   │   trust_remote_code = kwargs.pop("trust_remote_code", False)                         │
│ ❱ 680 │   │   config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path,   │
│   681 │   │   if "auto_map" in config_dict and "AutoConfig" in config_dict["auto_map"]:          │
│   682 │   │   │   if not trust_remote_code:                                                      │
│   683 │   │   │   │   raise ValueError(                                                          │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/configuration_utils.py:553 in       │
│ get_config_dict                                                                                  │
│                                                                                                  │
│   550 │   │   """                                                                                │
│   551 │   │   original_kwargs = copy.deepcopy(kwargs)                                            │
│   552 │   │   # Get config dict associated with the base config file                             │
│ ❱ 553 │   │   config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwar   │
│   554 │   │                                                                                      │
│   555 │   │   # That config file may point us toward another config file to use.                 │
│   556 │   │   if "configuration_files" in config_dict:                                           │
│                                                                                                  │
│ /home/ray/anaconda3/lib/python3.8/site-packages/transformers/configuration_utils.py:634 in       │
│ _get_config_dict                                                                                 │
│                                                                                                  │
│   631 │   │   │   │   f"There was a specific connection error when trying to load {pretrained_   │
│   632 │   │   │   )                                                                              │
│   633 │   │   except ValueError:                                                                 │
│ ❱ 634 │   │   │   raise EnvironmentError(                                                        │
│   635 │   │   │   │   f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load thi   │
│   636 │   │   │   │   f"files and it looks like {pretrained_model_name_or_path} is not the pat   │
│   637 │   │   │   │   f"{configuration_file} file.\nCheckout your internet connection or see h   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like finbert is not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
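
The `finbert` run fails before training starts: the name is neither found on the Hub nor a local directory containing `config.json`. One possible cause, stated here as an assumption rather than a diagnosis, is that a bare name such as `finbert` needs an organization prefix to be a valid Hub id. The sketch below is a hypothetical pre-flight check (the function `classify_model_arg` does not exist in the example) that reports how a `--model_name_or_path` value would be interpreted, so the example could fail early with a clearer message.

```python
import os


def classify_model_arg(model_name_or_path):
    """Hypothetical pre-flight check for --model_name_or_path."""
    if os.path.isdir(model_name_or_path):
        config_path = os.path.join(model_name_or_path, "config.json")
        if os.path.isfile(config_path):
            return "local checkpoint"
        return "local directory without config.json"
    # Anything else would be treated as a model id to look up on the Hub.
    return "hub id"
```

A value classified as `"hub id"` still has to exist on the Hub; this check only separates local-path mistakes from Hub lookup failures.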
Link

No response
ray-project / ray

[Train] Provide a list of models for people to choose from in the HF transformer example #36837