mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) with a human-friendly interface and minimal dependencies to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware (cloud/edge)
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

mlperf inference gpt-j nvidia implementation fails #101

Closed: mlosab3 closed this issue 1 month ago

mlosab3 commented 1 month ago

I was following this doc to run gpt-j with the NVIDIA implementation, but it fails when running this command:

cm run script --tags=run-mlperf,inference,_find-performance,_full \
   --model=gptj-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda  \
   --docker --quiet \
   --test_query_count=50

FAIL:

Path to the ML model: None

  * cm run script "get nvidia inference common-code"

    * cm run script "get mlperf inference results"
         ! load /home/ubuntu/CM/repos/local/cache/5f3a720e169046ea/cm-cached-state.json
         ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-nvidia-common-code/customize.py
Traceback (most recent call last):
  File "/home/ubuntu/cm/bin/cm", line 33, in <module>
    sys.exit(load_entry_point('cmind==2.3.3', 'console_scripts', 'cm')())
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 1490, in _run
    r = customize_code.preprocess(ii)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/script/run-mlperf-inference-app/customize.py", line 219, in preprocess
    r = cm.access(ii)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 4109, in docker
    return utils.call_internal_module(self, __file__, 'module_misc', 'docker', i)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/utils.py", line 1631, in call_internal_module
    return getattr(tmp_module, module_func)(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module_misc.py", line 1817, in docker
    r = script_automation._run_deps(deps, [], env, {}, {}, {}, {}, '', [], '', False, '', verbose, show_time, ' ', run_state)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 3080, in _run_deps
    r = self.cmind.access(ii)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 1380, in _run
    r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 2909, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 3080, in _run_deps
    r = self.cmind.access(ii)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 1568, in _run
    r = prepare_and_run_script_with_postprocessing(run_script_input)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 4737, in prepare_and_run_script_with_postprocessing
    rr = run_postprocess(customize_code, customize_common_input, recursion_spaces, env, state, const,
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 4789, in run_postprocess
    r = customize_code.postprocess(ii)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-nvidia-common-code/customize.py", line 16, in postprocess
    env['CM_MLPERF_INFERENCE_NVIDIA_CODE_PATH'] = os.path.join(env['CM_MLPERF_INFERENCE_RESULTS_PATH'], "closed", "NVIDIA")
KeyError: 'CM_MLPERF_INFERENCE_RESULTS_PATH'

Any suggestions on how to fix this problem? Thank you.
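
For what it's worth, the "! load .../cm-cached-state.json" line above suggests the "get mlperf inference results" dependency was served from a cached entry that never exported CM_MLPERF_INFERENCE_RESULTS_PATH. I assume the entry could be inspected and, if needed, dropped with something like the following (the --tags filter is only a guess based on the script name in the log):

# inspect the cached entry behind "get mlperf inference results"
cm show cache --tags=get,mlperf,inference,results
# drop it so the next run regenerates it
cm rm cache --tags=get,mlperf,inference,results -f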

arjunsuresh commented 1 month ago

Can you please do cm pull repo and retry the same command?

mlosab3 commented 1 month ago

Still the same error:

(cm) ubuntu:~$ cm pull repo
=======================================================
Alias:    mlcommons@cm4mlops

Local path: /home/ubuntu/CM/repos/mlcommons@cm4mlops

git pull

Already up to date.

CM alias for this repository: mlcommons@cm4mlops
=======================================================

Reindexing all CM artifacts. Can take some time ...
Took 0.6 sec.

and then, re-running the command, the same failure:

Path to the ML model: None

  * cm run script "get nvidia inference common-code"

    * cm run script "get mlperf inference results"
         ! load /home/ubuntu/CM/repos/local/cache/dfe6100df56d428a/cm-cached-state.json
         ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-nvidia-common-code/customize.py
Traceback (most recent call last):
  File "/home/ubuntu/cm/bin/cm", line 33, in <module>
    sys.exit(load_entry_point('cmind==2.3.3', 'console_scripts', 'cm')())
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 1490, in _run
    r = customize_code.preprocess(ii)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/script/run-mlperf-inference-app/customize.py", line 219, in preprocess
    r = cm.access(ii)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 4109, in docker
    return utils.call_internal_module(self, __file__, 'module_misc', 'docker', i)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/utils.py", line 1631, in call_internal_module
    return getattr(tmp_module, module_func)(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module_misc.py", line 1817, in docker
    r = script_automation._run_deps(deps, [], env, {}, {}, {}, {}, '', [], '', False, '', verbose, show_time, ' ', run_state)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 3080, in _run_deps
    r = self.cmind.access(ii)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 1380, in _run
    r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 2909, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 3080, in _run_deps
    r = self.cmind.access(ii)
  File "/home/ubuntu/cm/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 1568, in _run
    r = prepare_and_run_script_with_postprocessing(run_script_input)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 4737, in prepare_and_run_script_with_postprocessing
    rr = run_postprocess(customize_code, customize_common_input, recursion_spaces, env, state, const,
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 4789, in run_postprocess
    r = customize_code.postprocess(ii)
  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-nvidia-common-code/customize.py", line 16, in postprocess
    env['CM_MLPERF_INFERENCE_NVIDIA_CODE_PATH'] = os.path.join(env['CM_MLPERF_INFERENCE_RESULTS_PATH'], "closed", "NVIDIA")
KeyError: 'CM_MLPERF_INFERENCE_RESULTS_PATH'
arjunsuresh commented 1 month ago

Are you on the master branch of the cm4mlops repository? If so, please do

cd $HOME/CM/repos/mlcommons@cm4mlops && git checkout mlperf-inference

mlosab3 commented 1 month ago

Are you on the master branch of the cm4mlops repository? If so, please do

cd $HOME/CM/repos/mlcommons@cm4mlops && git checkout mlperf-inference

No, it seems that I am already on the mlperf-inference branch:

ubuntu@r:~$ cd $HOME/CM/repos/mlcommons@cm4mlops && git checkout mlperf-inference
Already on 'mlperf-inference'
Your branch is up to date with 'origin/mlperf-inference'.
ubuntu@r:~/CM/repos/mlcommons@cm4mlops$ git pull
Already up to date.

I have tried git pull. I then re-ran the cm command, but it still fails with the same error.

arjunsuresh commented 1 month ago

Oh, strangely git pull shows "up to date". Can you please share the output of git log? Also, cm rm cache -f can help if there are stale cache entries.
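
Put together, and assuming the repo path from your logs, the checks would be roughly:

# confirm the latest commit on the checked-out branch
git -C $HOME/CM/repos/mlcommons@cm4mlops log -1
# remove all cached CM entries so they are regenerated on the next run
cm rm cache -f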

mlosab3 commented 1 month ago

Sure. Here are the first few lines of the output of git log:

(cm) ubuntu@r:~/CM/repos/mlcommons@cm4mlops$ git log
commit 29aa072edcb5bc06643d3d53bdaac9dedc735eed (HEAD -> mlperf-inference, origin/mlperf-inference)
Merge: a4067b4be 1c8f6a8e5
Author: Arjun Suresh <arjunsuresh1987@gmail.com>
Date:   Tue Jul 9 18:54:40 2024 +0100

    Merge pull request #102 from GATEOverflow/mlperf-inference

    Merge from go

commit 1c8f6a8e5b2633bd29b782c986efe9d6d142225e
Author: Arjun Suresh <arjunsuresh1987@gmail.com>
Date:   Tue Jul 9 13:27:17 2024 +0000

    Add Intel mlperf inference stable diffusion (WIP)

commit f2aa787efbc99d5c16fa61eca399fd8cc833d825
Author: Arjun Suresh <arjunsuresh1987@gmail.com>
Date:   Mon Jul 8 12:39:32 2024 +0000

    Fixes for mlperf inference intel dlrmv2 (run starts)

I have tried running cm rm cache -f. I then re-ran the same command and still get the same error:

  File "/home/ubuntu/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-nvidia-common-code/customize.py", line 16, in postprocess
    env['CM_MLPERF_INFERENCE_NVIDIA_CODE_PATH'] = os.path.join(env['CM_MLPERF_INFERENCE_RESULTS_PATH'], "closed", "NVIDIA")
KeyError: 'CM_MLPERF_INFERENCE_RESULTS_PATH'
arjunsuresh commented 1 month ago

Can you please try cm pull repo and retry the command once more? This change should hopefully fix the issue for you.

mlosab3 commented 1 month ago

Can you please try cm pull repo and retry the command once more? This change should hopefully fix the issue for you.

That helped; thanks! However, after that, I am hitting another problem (I tried running cm twice and got the same error):

Downloading flatbuffers-24.3.25-py2.py3-none-any.whl (26 kB)
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=124d1054ce4a1a57d830f1c82f92fc4845a0f235bbb719c14003a448c76fcbc3
  Stored in directory: /tmp/pip-ephem-wheel-cache-et58ul6e/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: sentencepiece, flatbuffers, distlib, xxhash, virtualenv, tqdm, safetensors, requests, pyproject_hooks, pynvml, pybind11-stubgen, pyarrow, py, pbr, parameterized, nodeenv, mypy-extensions, lark, janus, identify, humanfriendly, graphviz, dill, coverage, colored, cfgv, stevedore, pytest-forked, pre-commit, onnx-graphsurgeon, nltk, mypy, multiprocess, huggingface-hub, coloredlogs, build, tokenizers, rouge_score, pytest-cov, onnxruntime, diffusers, bandit, accelerate, transformers, datasets, nvidia-ammo, evaluate, optimum
  Attempting uninstall: tqdm
    Found existing installation: tqdm 4.66.1
    Uninstalling tqdm-4.66.1:
      Successfully uninstalled tqdm-4.66.1
  Attempting uninstall: requests
    Found existing installation: requests 2.31.0
    Uninstalling requests-2.31.0:
      Successfully uninstalled requests-2.31.0
  Attempting uninstall: pynvml
    Found existing installation: pynvml 11.4.1
    Uninstalling pynvml-11.4.1:
      Successfully uninstalled pynvml-11.4.1
  Attempting uninstall: pyarrow
    Found existing installation: pyarrow 12.0.1
    Uninstalling pyarrow-12.0.1:
      Successfully uninstalled pyarrow-12.0.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dask-cuda 23.10.0 requires pynvml<11.5,>=11.0.0, but you have pynvml 11.5.0 which is incompatible.
Successfully installed accelerate-0.25.0 bandit-1.7.7 build-1.2.1 cfgv-3.4.0 colored-2.2.4 coloredlogs-15.0.1 coverage-7.5.4 datasets-2.20.0 diffusers-0.15.0 dill-0.3.8 distlib-0.3.8 evaluate-0.4.2 flatbuffers-24.3.25 graphviz-0.20.3 huggingface-hub-0.23.4 humanfriendly-10.0 identify-2.6.0 janus-1.0.0 lark-1.1.9 multiprocess-0.70.16 mypy-1.10.1 mypy-extensions-1.0.0 nltk-3.8.1 nodeenv-1.9.1 nvidia-ammo-0.7.4 onnx-graphsurgeon-0.5.2 onnxruntime-1.16.3 optimum-1.21.2 parameterized-0.9.0 pbr-6.0.0 pre-commit-3.7.1 py-1.11.0 pyarrow-16.1.0 pybind11-stubgen-2.5.1 pynvml-11.5.0 pyproject_hooks-1.1.0 pytest-cov-5.0.0 pytest-forked-1.6.0 requests-2.32.3 rouge_score-0.1.2 safetensors-0.4.3 sentencepiece-0.2.0 stevedore-5.2.0 tokenizers-0.15.2 tqdm-4.66.4 transformers-4.36.1 virtualenv-20.26.3 xxhash-3.4.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 23.3.1 -> 24.1.2
[notice] To update, run: python3 -m pip install --upgrade pip
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- NVTX is disabled
-- Importing batch manager
-- Building PyTorch
-- Building Google tests
-- Building benchmarks
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- CUDA compiler: /usr/local/cuda/bin/nvcc
-- GPU architectures: 89
-- The C compiler identification is GNU 11.4.0
-- The CUDA compiler identification is NVIDIA 12.3.107
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.107")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CUDA library status:
--     version: 12.3.107
--     libraries: /usr/local/cuda/lib64
--     include path: /usr/local/cuda/targets/x86_64-linux/include
-- ========================= Importing and creating target nvinfer ==========================
-- Looking for library nvinfer
-- Library that was found /usr/local/tensorrt/targets/x86_64-linux-gnu/lib/libnvinfer.so
-- ==========================================================================================
-- CUDAToolkit_VERSION 12.3 is greater or equal than 11.0, enable -DENABLE_BF16 flag
-- CUDAToolkit_VERSION 12.3 is greater or equal than 11.8, enable -DENABLE_FP8 flag
-- Found MPI_C: /opt/hpcx/ompi/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /opt/hpcx/ompi/lib/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- COMMON_HEADER_DIRS: /code/tensorrt_llm/cpp;/usr/local/cuda/include
-- Found Python3: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter Development Development.Module Development.Embed
-- USE_CXX11_ABI is set by python Torch to 1
-- TORCH_CUDA_ARCH_LIST: 8.9+PTX
CMake Warning at CMakeLists.txt:295 (message):
  Ignoring environment variable TORCH_CUDA_ARCH_LIST=5.2 6.0 6.1 7.0 7.2 7.5
  8.0 8.6 8.7 9.0+PTX

-- Found Python executable at /usr/bin/python3.10
-- Found Python libraries at /usr/lib/x86_64-linux-gnu
-- Found CUDA: /usr/local/cuda (found version "12.3")
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.107")
-- Caffe2: CUDA detected: 12.3
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 12.3
-- /usr/local/cuda-12.3/targets/x86_64-linux/lib/libnvrtc.so shorthash is e150bf88
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_89,code=compute_89
CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:328 (find_package)

-- Found Torch: /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch.so
-- TORCH_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=1
-- Building for TensorRT version: 9.2.0, library version: 9
-- Using MPI_C_INCLUDE_DIRS: /opt/hpcx/ompi/include;/opt/hpcx/ompi/include/openmpi;/opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include;/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent;/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include
-- Using MPI_C_LIBRARIES: /opt/hpcx/ompi/lib/libmpi.so
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Operating System: ubuntu, 22.04
CMake Error at tensorrt_llm/CMakeLists.txt:101 (message):
  The batch manager library is truncated or incomplete.  This is usually
  caused by using Git LFS (Large File Storage) incorrectly.  Please try
  running command `git lfs install && git lfs pull`.

-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
  File "/code/tensorrt_llm/scripts/build_wheel.py", line 319, in <module>
    main(**vars(args))
  File "/code/tensorrt_llm/scripts/build_wheel.py", line 160, in main
    build_run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake -DCMAKE_BUILD_TYPE="Release" -DBUILD_PYT="ON" -DBUILD_PYBIND="ON" "-DCMAKE_CUDA_ARCHITECTURES=89" -DTRT_LIB_DIR=/usr/local/tensorrt//targets/x86_64-linux-gnu/lib -DTRT_INCLUDE_DIR=/usr/local/tensorrt//include  -S "/code/tensorrt_llm/cpp"' returned non-zero exit status 1.
make: *** [Makefile:102: devel_run] Error 1
make: Leaving directory '/root/CM/repos/local/cache/243ebcfde713428e/repo/docker'

CM error: Portable CM script failed (name = get-ml-model-gptj, return code = 256)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!
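
For reference, the repro run requested by that note would be the original command with --repro appended (the flag comes from the note above; everything else is unchanged):

cm run script --tags=run-mlperf,inference,_find-performance,_full \
   --model=gptj-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda \
   --docker --quiet \
   --test_query_count=50 \
   --repro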
arjunsuresh commented 1 month ago

'git-lfs' seems to be missing. Even though the run happens inside Docker, the model download is done on the host, because the model needs quantization in a separate Docker container. I have added a change to install this in CM. Can you please redo cm pull repo?
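
If anyone wants to install it manually on the host in the meantime, something like the following should work (the apt package name assumes Ubuntu, which matches the logs; the last two commands are the ones the CMake error itself suggests and should be run inside the affected repository checkout):

# install the Git LFS client on the host
sudo apt-get install -y git-lfs
# enable LFS and fetch the large files flagged as truncated by CMake
git lfs install
git lfs pull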

mlosab3 commented 1 month ago

Yes, git-lfs solved the issue. Thank you. However, I have now run out of CUDA memory and can't check any further. I will try to find another GPU:

Calibrating batch 511
Quantization done. Total time used: 387.46 s.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The AMMO optimized model state_dict (including the quantization factors) is saved to /mnt/models/GPTJ-6B/fp8-quantized-ammo/GPTJ-FP8-quantized/ammo_model.0.pth using torch.save for further inspection.
Detailed export error: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacity of 21.95 GiB of which 28.12 MiB is free. Process 8143 has 21.92 GiB memory in use. Of the allocated memory 21.67 GiB is allocated by PyTorch, and 32.42 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/model_config_export.py", line 307, in export_model_config
    for model_config in torch_to_model_config(
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/model_config_export.py", line 185, in torch_to_model_config
    build_decoder_config(layer, model_metadata_config, decoder_type, dtype)
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/layer_utils.py", line 944, in build_decoder_config
    config.mlp = build_mlp_config(layer, decoder_type, dtype)
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/layer_utils.py", line 764, in build_mlp_config
    config.fc = build_linear_config(layer, LINEAR_COLUMN, dtype)
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/layer_utils.py", line 591, in build_linear_config
    weight = torch_weight.type(dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacity of 21.95 GiB of which 28.12 MiB is free. Process 8143 has 21.92 GiB memory in use. Of the allocated memory 21.67 GiB is allocated by PyTorch, and 32.42 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Quantized model exported to /mnt/models/GPTJ-6B/fp8-quantized-ammo/GPTJ-FP8-quantized
Total time used 35.41 s.
make: Leaving directory '/root/CM/repos/local/cache/f50b4df1b7424441/repo/docker'
/home/ubuntu/cm/bin/python3: can't open file '/root/CM/repos/local/cache/55918655d7a54368/repo/closed/NVIDIA/code/gptj/tensorrt/onnx_tune.py': [Errno 2] No such file or directory

CM error: Portable CM script failed (name = get-ml-model-gptj, return code = 256)
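
As a possible stopgap before moving to a larger GPU, the allocator hint mentioned in the OOM message could be tried. The 128 MB value is only an example, and since the quantization runs in a separate Docker container, the variable may need to be set inside that container rather than only on the host:

# reduce fragmentation, as suggested by the OOM message; value is illustrative
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

and then re-run the same cm run script command.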