mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) with a human-friendly interface and minimal dependencies to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware (cloud/edge)
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

CM fails to build DLRMv2 99 #92

Open WarrenSchultz opened 2 days ago

WarrenSchultz commented 2 days ago

I tried running it both with the command that launches it in a Docker container and from within the ResNet50 container.

The end of the log follows:

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made to host 'pypi.ngc.nvidia.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Collecting pyre_extensions
  Downloading pyre_extensions-0.0.30-py3-none-any.whl (12 kB)
Requirement already satisfied: typing-extensions in /home/cmuser/.local/lib/python3.8/site-packages (from pyre_extensions) (4.12.2)
Requirement already satisfied: typing-inspect in /home/cmuser/.local/lib/python3.8/site-packages (from pyre_extensions) (0.9.0)
Requirement already satisfied: mypy-extensions>=0.3.0 in /home/cmuser/.local/lib/python3.8/site-packages (from typing-inspect->pyre_extensions) (1.0.0)
Installing collected packages: pyre-extensions
Successfully installed pyre-extensions-0.0.30
             ! cd /home/cmuser/CM/repos/local/cache/60d83fede2d04cfd
             ! call /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/get-generic-python-lib/run.sh from tmp-run.sh
             ! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/get-generic-python-lib/customize.py
            Detected version: 0.0.30
Traceback (most recent call last):
  File "/home/cmuser/.local/bin/cm", line 8, in <module>
    sys.exit(run())
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1490, in _run
    r = customize_code.preprocess(ii)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/run-mlperf-inference-app/customize.py", line 219, in preprocess
    r = cm.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1553, in _run
    r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta,  env, state, const, const_state, add_deps_recursive,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 2909, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 3080, in _run_deps
    r = self.cmind.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1380, in _run
    r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 2909, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 3080, in _run_deps
    r = self.cmind.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1490, in _run
    r = customize_code.preprocess(ii)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/get-preprocessed-dataset-criteo/customize.py", line 23, in preprocess
    output_dir = env['CM_DATASET_PREPROCESSED_PATH']
KeyError: 'CM_DATASET_PREPROCESSED_PATH'

arjunsuresh commented 2 days ago

The DLRM Docker container needs the Criteo dataset to be preprocessed outside of it. We need to add this option to the documentation page, but if you have the preprocessed data, we can tell you how to use it.

@anandhu-eng we can sync on how to add this option to the documentation page.
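
For context, the traceback above ends at an unguarded lookup, `env['CM_DATASET_PREPROCESSED_PATH']`, in the `preprocess` hook of `get-preprocessed-dataset-criteo/customize.py`. Below is a minimal sketch of how such a hook could report the missing path with a readable message instead of a `KeyError`. It is not the actual script: the function name `preprocess_guarded`, the error text, and the assumption that the hook receives a dict with an `env` entry and returns a dict with a `return` code are illustrative only.

```python
def preprocess_guarded(i):
    """Hypothetical guarded variant of a customize.py preprocess hook."""
    # Assumption: the hook input carries the environment under the 'env' key.
    env = i.get('env', {})

    # .get() avoids the KeyError seen in the log when the variable is unset.
    output_dir = env.get('CM_DATASET_PREPROCESSED_PATH', '')
    if not output_dir:
        # Assumption: a non-zero 'return' plus an 'error' string signals failure.
        return {
            'return': 1,
            'error': 'CM_DATASET_PREPROCESSED_PATH is not set: the Criteo '
                     'dataset must be preprocessed outside the container '
                     'and its location passed in.',
        }

    return {'return': 0, 'output_dir': output_dir}
```

With a check like this, the run would stop with a message pointing at the missing preprocessed dataset rather than a bare stack trace.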

WarrenSchultz commented 2 days ago

Huh, ok. I thought I saw it pulling down the full dataset, but I may have been mistaken. I'm working on a lot in parallel at the moment. :) What's the correct command to do so at this point through CM?

arjunsuresh commented 2 days ago

Currently we only support plugging in the preprocessed data, as the Criteo download stopped working without manual intervention. I believe we can share the preprocessed data with you. Preprocessing is heavy: it needs 6.4 TB of disk space, 600 GB+ of memory, and around 3 days of running. The preprocessed data is less than 300 GB. We can share it by the end of this week; we need to test it for expected accuracy first.
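
For anyone who still wants to try preprocessing locally, here is a rough pre-flight check against the figures quoted above (6.4 TB of disk, 600 GB of memory). This is only a sketch: the helper names `check_resources` and `probe` are hypothetical, decimal units are assumed, and the memory probe relies on `os.sysconf`, which is available on Linux but not on every platform.

```python
import os
import shutil

GB = 10**9  # assumption: decimal gigabytes

def check_resources(free_disk_bytes, total_mem_bytes,
                    need_disk=6400 * GB, need_mem=600 * GB):
    """Return a list of human-readable shortfalls (empty if none)."""
    problems = []
    if free_disk_bytes < need_disk:
        problems.append(
            f"disk: {free_disk_bytes / GB:.0f} GB free, need {need_disk / GB:.0f} GB")
    if total_mem_bytes < need_mem:
        problems.append(
            f"memory: {total_mem_bytes / GB:.0f} GB total, need {need_mem / GB:.0f} GB")
    return problems

def probe(path="."):
    """Read free disk space at `path` and total physical memory (Linux)."""
    free = shutil.disk_usage(path).free
    mem = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return free, mem
```

Calling `check_resources(*probe("/path/to/scratch"))` on the target machine returns an empty list only when both thresholds are met.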

WarrenSchultz commented 2 days ago

Great, thank you.