mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

GPT-J errors #1826

Open howudodat opened 3 weeks ago

howudodat commented 3 weeks ago

command:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 \
   --model=gptj-99 \
   --implementation=reference \
   --framework=pytorch \
   --category=edge \
   --scenario=Offline \
   --execution_mode=test \
   --device=cpu  \
   --docker --quiet \
   --test_query_count=50

error:

Encoding Samples
Finished constructing QSL.
Loading PyTorch model...
Loading checkpoint shards:   0%|                                                                                    | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/cmuser/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 575, in load_state_dict
    return torch.load(
  File "/home/cmuser/.local/lib/python3.10/site-packages/torch/serialization.py", line 1087, in load
    overall_storage = torch.UntypedStorage.from_file(os.fspath(f), shared, size)
RuntimeError: unable to mmap 10004248818 bytes from file </home/cmuser/CM/repos/local/cache/5de735f7d99448f8/checkpoint/checkpoint-final/pytorch_model-00001-of-00003.bin>: Cannot allocate memory (12)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cmuser/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 584, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cmuser/CM/repos/local/cache/da5cd8ab8fe54ea3/inference/language/gpt-j/main.py", line 170, in <module>
    main()
  File "/home/cmuser/CM/repos/local/cache/da5cd8ab8fe54ea3/inference/language/gpt-j/main.py", line 109, in main
    sut = get_SUT(
  File "/home/cmuser/CM/repos/local/cache/da5cd8ab8fe54ea3/inference/language/gpt-j/backend_PyTorch.py", line 238, in get_SUT
    return SUT_Offline(model_path, dtype, dataset_path, scenario, max_examples, use_gpu, network, qsl)
  File "/home/cmuser/CM/repos/local/cache/da5cd8ab8fe54ea3/inference/language/gpt-j/backend_PyTorch.py", line 173, in __init__
    SUT_base.__init__(self, model_path, dtype, dataset_path, scenario, max_examples, use_gpu, network, qsl)
  File "/home/cmuser/CM/repos/local/cache/da5cd8ab8fe54ea3/inference/language/gpt-j/backend_PyTorch.py", line 45, in __init__
    self.model = AutoModelForCausalLM.from_pretrained(
  File "/home/cmuser/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/cmuser/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3941, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/cmuser/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4395, in _load_pretrained_model
    state_dict = load_state_dict(shard_file, is_quantized=is_quantized)
  File "/home/cmuser/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 596, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/home/cmuser/CM/repos/local/cache/5de735f7d99448f8/checkpoint/checkpoint-final/pytorch_model-00001-of-00003.bin' at '/home/cmuser/CM/repos/local/cache/5de735f7d99448f8/checkpoint/checkpoint-final/pytorch_model-00001-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Finished destroying SUT.
./run.sh: line 59: 1: command not found
./run.sh: line 65: 1: command not found

CM error: Portable CM script failed (name = benchmark-program, return code = 32512)
howudodat commented 3 weeks ago

OK, the above error went away after refreshing the repos (cm repo pull).
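For reference, the refresh was essentially the following (assuming mlcommons@cm4mlops is the CM repo registered locally; the exact repo name may differ on your setup):

cm pull repo mlcommons@cm4mlops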

However, three times in a row I now hit the same error; it just dies partway through the test.

command:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 \
   --model=gptj-99 \
   --implementation=reference \
   --framework=pytorch \
   --category=edge \
   --scenario=Offline \
   --execution_mode=test \
   --device=cpu  \
   --docker --quiet \
   --test_query_count=50

error:

Constructing QSL
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████| 619/619 [00:00<00:00, 5.62MB/s]
vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 798k/798k [00:00<00:00, 3.41MB/s]
merges.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 2.07MB/s]
added_tokens.json: 100%|█████████████████████████████████████████████████████████████████████████████| 4.04k/4.04k [00:00<00:00, 29.7MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████| 357/357 [00:00<00:00, 3.60MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████| 1.37M/1.37M [00:00<00:00, 5.25MB/s]
/home/cmuser/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Encoding Samples
Finished constructing QSL.
Loading PyTorch model...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 3/3 [02:35<00:00, 51.99s/it]
/home/cmuser/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 285/285 [00:00<00:00, 477959.47it/s]
Running LoadGen test...
Number of Samples in query_samples :  50
 12%|███████████▉                                                                                       | 6/50 [39:35<4:02:43, 330.99s/it]
./run.sh: line 54:   477 Killed                 
 /usr/bin/python3 main.py --model-path=/home/cmuser/CM/repos/local/cache/5de735f7d99448f8/checkpoint/checkpoint-final --dataset-path=/home/cmuser/CM/repos/local/cache/fdb93082bc3a466c/install/cnn_eval.json --scenario Offline --max_examples 50 --mlperf_conf '/home/cmuser/CM/repos/local/cache/861bf247a96946bd/inference/mlperf.conf' --dtype float32 --user_conf '/home/cmuser/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/72f174c5e1f5481ebf2e33a55b03f0d1.conf' 2>&1
./run.sh: line 59: 137: command not found
./run.sh: line 65: 137: command not found
arjunsuresh commented 3 weeks ago

Looks like an OS kill. Do you have sufficient RAM + swap space? A float32 run needs about 75 GB of memory. You can try --precision=bfloat16, which needs roughly 40 GB. If official compliance is not required, --beam_size=2 (the official requirement is --beam_size=4) can reduce the memory requirement further.
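The exit code 137 reported by run.sh (128 + SIGKILL) is consistent with the kernel OOM killer terminating the benchmark. As a sketch only, you could check memory and retry with the lower-memory options mentioned above; this simply appends the two flags to your original invocation and is not a compliant configuration:

# check available RAM and swap on the host
free -h

# rerun with bfloat16 weights and a smaller beam size to cut memory use
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 \
   --model=gptj-99 \
   --implementation=reference \
   --framework=pytorch \
   --category=edge \
   --scenario=Offline \
   --execution_mode=test \
   --device=cpu \
   --precision=bfloat16 \
   --beam_size=2 \
   --docker --quiet \
   --test_query_count=50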