mlcommons / cm4mlops

A collection of portable, reusable, and cross-platform automation recipes (CM scripts) with a human-friendly interface and minimal dependencies, making it easier to build, run, benchmark, and optimize AI, ML, and other applications and systems across diverse and continuously changing models, datasets, software, and hardware (cloud/edge).
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

KeyError while running NVIDIA GPT-J implementation #90

Open anandhu-eng opened 5 days ago

anandhu-eng commented 5 days ago

The command to reproduce:

cm run script --tags=run-mlperf,inference,_find-performance,_full \
    --model=gptj-99 \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=edge \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --docker \
    --quiet \
    --test_query_count=50

Output:

* cm run script "run-mlperf inference _find-performance _full"

  * cm run script "get mlcommons inference src"
       ! load /home/anandhu/CM/repos/local/cache/08f829c532784225/cm-cached-state.json

  * cm run script "get sut description"

    * cm run script "detect os"
           ! cd /home/anandhu
           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py

    * cm run script "detect cpu"

      * cm run script "detect os"
             ! cd /home/anandhu
             ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
             ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
           ! cd /home/anandhu
           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-cpu/customize.py

    * cm run script "get python3"
         ! load /home/anandhu/CM/repos/local/cache/fc0768f669bd4605/cm-cached-state.json

Path to Python: /home/anandhu/CM/repos/local/cache/0011e46a023746ae/berttest/bin/python3
Python version: 3.12.3

    * cm run script "get compiler"
         ! load /home/anandhu/CM/repos/local/cache/4600294f81924f42/cm-cached-state.json

    * cm run script "get cuda-devices"

      * cm run script "get cuda _toolkit"
           ! load /home/anandhu/CM/repos/local/cache/042d9cdee6644854/cm-cached-state.json

ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: no
ENV[CM_CUDA_VERSION]: 12.4
ENV[CM_CUDA_VERSION_STRING]: cu124
ENV[CM_NVCC_BIN_WITH_PATH]: /home/anandhu/CM/repos/local/cache/347380df9f6c468b/install/bin/nvcc
ENV[CUDA_HOME]: /home/anandhu/CM/repos/local/cache/347380df9f6c468b/install

           ! cd /home/anandhu
           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda-devices/run.sh from tmp-run.sh
./tmp-run.sh: line 3: /home/anandhu/CM/repos/local/cache/0011e46a023746ae/berttest/bin/activate: No such file or directory
rm: cannot remove 'a.out': No such file or directory

Checking compiler version ...

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Compiling program ...

Running program ...

/home/anandhu
           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda-devices/customize.py
GPU Device ID: 0
GPU Name: NVIDIA GeForce RTX 4090
GPU compute capability: 8.9
CUDA driver version: 12.2
CUDA runtime version: 12.4
Global memory: 25393692672
Max clock rate: 2520.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor:  1536
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535

    * cm run script "get generic-python-lib _package.dmiparser"
         ! load /home/anandhu/CM/repos/local/cache/3927fb01e4e34fde/cm-cached-state.json

    * cm run script "get cache dir _name.mlperf-inference-sut-descriptions"
         ! load /home/anandhu/CM/repos/local/cache/ad9c97dbf28c462a/cm-cached-state.json
Generating SUT description file for intel_spr_i9
         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-sut-description/customize.py

  * cm run script "install pip-package for-cmind-python _package.tabulate"
       ! load /home/anandhu/CM/repos/local/cache/066fa0e1608f4b34/cm-cached-state.json

  * cm run script "get mlperf inference utils"

    * cm run script "get mlperf inference src"
         ! load /home/anandhu/CM/repos/local/cache/08f829c532784225/cm-cached-state.json
         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/anandhu/CM/repos/local/cache/aa85cd9ada244ffd/inference

Running loadgen scenario: Offline and mode: performance

* cm run script "build dockerfile"

Dockerfile generated at /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile

* cm run script "get docker"

  * cm run script "detect os"
         ! cd /home/anandhu/CM/repos/local/cache/23342dac54164d0b
         ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py

    * /usr/bin/docker
           ! cd /home/anandhu/CM/repos/local/cache/23342dac54164d0b
           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-docker/run.sh from tmp-run.sh
           ! call "detect_version" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-docker/customize.py
    Detected version: 26.1.3

    # Found artifact in /usr/bin/docker
       ! cd /home/anandhu/CM/repos/local/cache/23342dac54164d0b
       ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-docker/run.sh from tmp-run.sh
       ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-docker/customize.py
    Detected version: 26.1.3

* cm run script "get mlperf inference results dir"
     ! load /home/anandhu/CM/repos/local/cache/f885f8230069430f/cm-cached-state.json

* cm run script "get mlperf inference submission dir"
       ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-submission-dir/customize.py

* cm run script "get ml-model gptj _nvidia _fp8"

  * cm run script "get git repo _repo.https://github.com/NVIDIA/TensorRT-LLM.git _sha.0ab9d17a59c284d2de36889832fe9fc7c8697604"

    * cm run script "detect os"
           ! cd /home/anandhu/CM/repos/local/cache/d1922606fb914e26
           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
         ! cd /home/anandhu/CM/repos/local/cache/d1922606fb914e26
         ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-git-repo/run.sh from tmp-run.sh
******************************************************
Current directory: /home/anandhu/CM/repos/local/cache/d1922606fb914e26

Cloning TensorRT-LLM.git from https://github.com/NVIDIA/TensorRT-LLM.git

git clone  --recurse-submodules https://github.com/NVIDIA/TensorRT-LLM.git  repo

Cloning into 'repo'...
remote: Enumerating objects: 19137, done.
remote: Counting objects: 100% (8972/8972), done.
remote: Compressing objects: 100% (2436/2436), done.
remote: Total 19137 (delta 7153), reused 7602 (delta 6503), pack-reused 10165
Receiving objects: 100% (19137/19137), 285.26 MiB | 19.03 MiB/s, done.
Resolving deltas: 100% (13996/13996), done.
Updating files: 100% (2402/2402), done.
Filtering content: 100% (14/14), 212.15 MiB | 32.99 MiB/s, done.
Submodule '3rdparty/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path '3rdparty/NVTX'
Submodule '3rdparty/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path '3rdparty/cutlass'
Submodule '3rdparty/cxxopts' (https://github.com/jarro2783/cxxopts) registered for path '3rdparty/cxxopts'
Submodule '3rdparty/json' (https://github.com/nlohmann/json.git) registered for path '3rdparty/json'
Cloning into '/home/anandhu/CM/repos/local/cache/d1922606fb914e26/repo/3rdparty/NVTX'...
remote: Enumerating objects: 2424, done.
remote: Counting objects: 100% (770/770), done.
remote: Compressing objects: 100% (219/219), done.
remote: Total 2424 (delta 553), reused 638 (delta 508), pack-reused 1654
Receiving objects: 100% (2424/2424), 2.68 MiB | 9.33 MiB/s, done.
Resolving deltas: 100% (1374/1374), done.
Cloning into '/home/anandhu/CM/repos/local/cache/d1922606fb914e26/repo/3rdparty/cutlass'...
remote: Enumerating objects: 26714, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 26714 (delta 5), reused 10 (delta 0), pack-reused 26689
Receiving objects: 100% (26714/26714), 42.66 MiB | 14.04 MiB/s, done.
Resolving deltas: 100% (20054/20054), done.
Cloning into '/home/anandhu/CM/repos/local/cache/d1922606fb914e26/repo/3rdparty/cxxopts'...
remote: Enumerating objects: 1877, done.
remote: Counting objects: 100% (212/212), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 1877 (delta 186), reused 168 (delta 168), pack-reused 1665
Receiving objects: 100% (1877/1877), 691.80 KiB | 3.26 MiB/s, done.
Resolving deltas: 100% (1106/1106), done.
Cloning into '/home/anandhu/CM/repos/local/cache/d1922606fb914e26/repo/3rdparty/json'...
remote: Enumerating objects: 38219, done.
remote: Counting objects: 100% (101/101), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 38219 (delta 50), reused 73 (delta 33), pack-reused 38118
Receiving objects: 100% (38219/38219), 185.18 MiB | 18.15 MiB/s, done.
Resolving deltas: 100% (23471/23471), done.
Submodule path '3rdparty/NVTX': checked out 'a1ceb0677f67371ed29a2b1c022794f077db5fe7'
Submodule path '3rdparty/cutlass': checked out '7d49e6c7e2f8896c47f586706e67e1fb215529dc'
Submodule path '3rdparty/cxxopts': checked out 'eb787304d67ec22f7c3a184ee8b4c481d04357fd'
Submodule path '3rdparty/json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d'

git checkout -b 0ab9d17a59c284d2de36889832fe9fc7c8697604 0ab9d17a59c284d2de36889832fe9fc7c8697604
Updating files: 100% (2713/2713), done.
Filtering content: 100% (4/4), 7.33 MiB | 4.33 MiB/s, done.
M       3rdparty/cutlass
Switched to a new branch '0ab9d17a59c284d2de36889832fe9fc7c8697604'
         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-git-repo/customize.py

CM cache path to the Git repo: /home/anandhu/CM/repos/local/cache/d1922606fb914e26/repo

  * cm run script "get cuda"
       ! load /home/anandhu/CM/repos/local/cache/113e9cb12a914b88/cm-cached-state.json

ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: yes
ENV[CM_CUDA_VERSION]: 12.4
ENV[CM_CUDA_VERSION_STRING]: cu124
ENV[CM_NVCC_BIN_WITH_PATH]: /home/anandhu/CM/repos/local/cache/347380df9f6c468b/install/bin/nvcc
ENV[CUDA_HOME]: /home/anandhu/CM/repos/local/cache/347380df9f6c468b/install

  * cm run script "get nvidia scratch space"
         ! cd /home/anandhu/CM/repos/local/cache/83dfa55dc9de45dc
         ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-nvidia-scratch-space/run.sh from tmp-run.sh
         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-nvidia-scratch-space/customize.py

  * cm run script "get cuda-devices"

    * cm run script "get cuda _toolkit"
         ! load /home/anandhu/CM/repos/local/cache/042d9cdee6644854/cm-cached-state.json

ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: no
ENV[CM_CUDA_VERSION]: 12.4
ENV[CM_CUDA_VERSION_STRING]: cu124
ENV[CM_NVCC_BIN_WITH_PATH]: /home/anandhu/CM/repos/local/cache/347380df9f6c468b/install/bin/nvcc
ENV[CUDA_HOME]: /home/anandhu/CM/repos/local/cache/347380df9f6c468b/install

         ! cd /home/anandhu/CM/repos/local/cache/2e5566ea2fc648d4
         ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda-devices/run.sh from tmp-run.sh
rm: cannot remove 'a.out': No such file or directory

Checking compiler version ...

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Compiling program ...

Running program ...

/home/anandhu/CM/repos/local/cache/2e5566ea2fc648d4
         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda-devices/customize.py
GPU Device ID: 0
GPU Name: NVIDIA GeForce RTX 4090
GPU compute capability: 8.9
CUDA driver version: 12.2
CUDA runtime version: 12.4
Global memory: 25393692672
Max clock rate: 2520.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor:  1536
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535

  * cm run script "get ml-model gpt-j _fp32 _pytorch"
       ! load /home/anandhu/CM/repos/local/cache/54a457e3e708400c/cm-cached-state.json

Path to the ML model: None

  * cm run script "get nvidia inference common-code"

    * cm run script "get mlperf inference results"
         ! load /home/anandhu/CM/repos/local/cache/f885f8230069430f/cm-cached-state.json
         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-nvidia-common-code/customize.py
Traceback (most recent call last):
  File "/home/anandhu/.local/bin/cm", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1490, in _run
    r = customize_code.preprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/run-mlperf-inference-app/customize.py", line 219, in preprocess
    r = cm.access(ii)
        ^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
           ^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 4109, in docker
    return utils.call_internal_module(self, __file__, 'module_misc', 'docker', i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/utils.py", line 1631, in call_internal_module
    return getattr(tmp_module, module_func)(i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module_misc.py", line 1817, in docker
    r = script_automation._run_deps(deps, [], env, {}, {}, {}, {}, '', [], '', False, '', verbose, show_time, ' ', run_state)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 3080, in _run_deps
    r = self.cmind.access(ii)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1380, in _run
    r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 2909, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 3080, in _run_deps
    r = self.cmind.access(ii)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1568, in _run
    r = prepare_and_run_script_with_postprocessing(run_script_input)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 4733, in prepare_and_run_script_with_postprocessing
    rr = run_postprocess(customize_code, customize_common_input, recursion_spaces, env, state, const,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 4785, in run_postprocess
    r = customize_code.postprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-nvidia-common-code/customize.py", line 16, in postprocess
    env['CM_MLPERF_INFERENCE_NVIDIA_CODE_PATH'] = os.path.join(env['CM_MLPERF_INFERENCE_RESULTS_PATH'], "closed", "NVIDIA")
                                                               ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'CM_MLPERF_INFERENCE_RESULTS_PATH'
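From the traceback, get-mlperf-inference-nvidia-common-code/customize.py indexes env['CM_MLPERF_INFERENCE_RESULTS_PATH'] directly, so if the "get mlperf inference results" dependency loads a cache entry that never exported that key, it surfaces as this bare KeyError. A minimal defensive sketch of the postprocess guard (my assumption, following the usual CM convention of returning {'return': 1, 'error': ...}; not an actual fix from the repo):

import os

def postprocess(i):
    env = i['env']

    # Guard against a stale or partial cache entry instead of raising KeyError
    results_path = env.get('CM_MLPERF_INFERENCE_RESULTS_PATH', '')
    if results_path == '':
        return {'return': 1,
                'error': 'CM_MLPERF_INFERENCE_RESULTS_PATH is not set; the '
                         '"get mlperf inference results" dependency may have '
                         'loaded a stale cache entry'}

    env['CM_MLPERF_INFERENCE_NVIDIA_CODE_PATH'] = os.path.join(results_path, "closed", "NVIDIA")

    return {'return': 0}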
arjunsuresh commented 5 days ago

It seems to be working fine for me. Can you try:

cm rm cache --tags=inference,results -f
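(Removing that cache entry should force the "get mlperf inference results" dependency to rerun instead of loading the stale cm-cached-state.json seen in the log, repopulating CM_MLPERF_INFERENCE_RESULTS_PATH. To check what is cached before and after, something like the following should list the matching entries:

cm show cache --tags=inference,results
)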
anandhu-eng commented 1 day ago

Hi @arjunsuresh, it's still there. I have made sure that my code is synced up to date with the main repo.