mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

CM error: no scripts were found with above tags and variations, follow the new docs site #1817

Open wlhtjht opened 1 month ago

wlhtjht commented 1 month ago

(python3-venv) aarch64_sh ~> cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 --model=dlrm_v2-99 --implementation=reference --framework=pytorch --category=datacenter --scenario=Offline --execution_mode=test --device=cpu --quiet --test_query_count=50
INFO:root: cm run script "run-mlperf inference _find-performance _full _r4.1"
INFO:root: cm run script "detect os"
INFO:root: ! cd /home/ubuntu
INFO:root: ! call /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: cm run script "detect cpu"
INFO:root: cm run script "detect os"
INFO:root: ! cd /home/ubuntu
INFO:root: ! call /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: ! cd /home/ubuntu
INFO:root: ! call /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root: cm run script "get python3"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/a30274b4c59046f8/cm-cached-state.json
INFO:root:Path to Python: /home/ubuntu/CM/repos/local/cache/8ff2b68847874923/mlperf/bin/python3
INFO:root:Python version: 3.10.12
INFO:root: cm run script "get mlcommons inference src"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/181aac323a064657/cm-cached-state.json
INFO:root: cm run script "get sut description"
INFO:root: cm run script "detect os"
INFO:root: ! cd /home/ubuntu
INFO:root: ! call /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: cm run script "detect cpu"
INFO:root: cm run script "detect os"
INFO:root: ! cd /home/ubuntu
INFO:root: ! call /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: ! cd /home/ubuntu
INFO:root: ! call /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root: cm run script "get python3"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/a30274b4c59046f8/cm-cached-state.json
INFO:root:Path to Python: /home/ubuntu/CM/repos/local/cache/8ff2b68847874923/mlperf/bin/python3
INFO:root:Python version: 3.10.12
INFO:root: cm run script "get compiler"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/ad4709d27e2746f6/cm-cached-state.json
INFO:root: cm run script "get generic-python-lib _package.dmiparser"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/487bb3df259949b6/cm-cached-state.json
INFO:root: cm run script "get cache dir _name.mlperf-inference-sut-descriptions"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/a1971a4c4e324cc2/cm-cached-state.json
Generating SUT description file for cfe40b4a2122-pytorch
INFO:root: ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-sut-description/customize.py
INFO:root: cm run script "get mlperf inference results dir"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/5a5d8a736e15489b/cm-cached-state.json
INFO:root: cm run script "install pip-package for-cmind-python _package.tabulate"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/ffdaabd53c414be8/cm-cached-state.json
INFO:root: cm run script "get mlperf inference utils"
INFO:root: cm run script "get mlperf inference src"
INFO:root: ! load /home/ubuntu/CM/repos/local/cache/181aac323a064657/cm-cached-state.json
INFO:root: ! call "postprocess" from /home/ubuntu/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/ubuntu/CM/repos/local/cache/0ab0359edada429b/inference

Running loadgen scenario: Offline and mode: performance
INFO:root:* cm run script "app mlperf inference generic _reference _dlrm_v2-99 _pytorch _cpu _test _r4.1_default _offline"

CM error: no scripts were found with above tags and variations

variation tags ['reference', 'dlrm_v2-99', 'pytorch', 'cpu', 'test', 'r4.1_default', 'offline'] are not matching for the found script app-mlperf-inference with variations dict_keys(['cpp', 'mil', 'mlcommons-cpp', 'ctuning-cpp-tflite', 'tflite-cpp', 'reference', 'python', 'nvidia', 'mlcommons-python', 'reference,gptj', 'reference,sdxl', 'reference,dlrm-v2', 'reference,llama2-70b', 'reference,mixtral-8x7b', 'reference,resnet50', 'reference,retinanet', 'reference,bert', 'nvidia-original,r4.1-dev_default', 'nvidia-original,r4.1-devdefault,gptj', 'nvidia-original,r4.1_default', 'nvidia-original,r4.1default,gptj', 'nvidia-original,r4.1-devdefault,llama2-70b', 'nvidia-original,r4.1default,llama2-70b', 'nvidia-original', 'intel', 'intel-original', 'intel-original,gptj', 'redhat', 'qualcomm', 'kilt', 'kilt,qaic,resnet50', 'kilt,qaic,retinanet', 'kilt,qaic,bert-99', 'kilt,qaic,bert-99.9', 'intel-original,resnet50', 'intel-original,retinanet', 'intel-original,bert-99', 'intel-original,bert-99.9', 'intel-original,gptj-99', 'intel-original,gptj-99.9', 'resnet50', 'retinanet', '3d-unet-99', '3d-unet-99.9', '3d-unet', 'sdxl', 'llama2-70b', 'llama2-70b-99', 'llama2-70b-99.9', 'mixtral-8x7b', 'rnnt', 'rnnt,reference', 'gptj-99', 'gptj-99.9', 'gptj', 'gptj', 'bert', 'bert-99', 'bert-99.9', 'dlrm', 'dlrm-v2-99', 'dlrm-v2-99.9', 'dlrm_,nvidia', 'mobilenet', 'efficientnet', 'onnxruntime', 'tensorrt', 'tf', 'pytorch', 'openshift', 'ncnn', 'deepsparse', 'tflite', 'glow', 'tvm-onnx', 'tvm-pytorch', 'tvm-tflite', 'ray', 'cpu', 'cuda,reference', 'cuda', 'rocm', 'qaic', 'tpu', 'fast', 'test', 'valid,retinanet', 'valid', 'quantized', 'fp32', 'float32', 'float16', 'bfloat16', 'int4', 'int8', 'uint8', 'offline', 'multistream', 'singlestream', 'server', 'power', 'batch_size.#', 'r2.1_default', 'r3.0_default', 'r3.1_default', 'r4.0-dev_default', 'r4.0_default', 'r4.1-dev_default', 'r4.1_default']) !

arjunsuresh commented 1 month ago

There is a typo on the docs website: the model name should be dlrm-v2-99 (hyphen, not underscore). However, the dlrm-v2 reference implementation has only been tested on 8x80GB Nvidia GPUs. Would you like to try the Intel or Nvidia implementation instead?
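Because only one separator character differs, the mismatch is easy to overlook in the long list of variations. As a rough illustration (this is not how CM itself resolves variations), a fuzzy lookup against a few of the variation names from the error message surfaces the intended spelling. The helper name suggest_variation and the shortened list are assumptions made for this sketch.

```python
import difflib

# A small subset of the variation names printed in the CM error message.
KNOWN_VARIATIONS = ["dlrm", "dlrm-v2-99", "dlrm-v2-99.9", "bert-99", "gptj-99", "resnet50"]


def suggest_variation(requested, known=KNOWN_VARIATIONS):
    """Return `requested` if it is known, else the closest known name, else None."""
    if requested in known:
        return requested
    # get_close_matches ranks candidates by similarity ratio, best first.
    matches = difflib.get_close_matches(requested, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(suggest_variation("dlrm_v2-99"))
```

With the list above, the misspelled dlrm_v2-99 resolves to dlrm-v2-99.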

wlhtjht commented 1 month ago

I would prefer to run this on the Arm CPU, though I will of course also try the Intel and Nvidia implementations. Changing the command to use "dlrm-v2-99" still does not work, but the Intel implementation does, or at least it gives no CM error. I will try manually. Thank you.

arjunsuresh commented 1 month ago

Oh okay. The reference implementation probably won't work out of the box on CPUs (it failed for us).

arjunsuresh commented 1 month ago

Meanwhile, we are updating the dlrm scripts to use the MLCommons-hosted preprocessed Criteo dataset; this should be ready by tomorrow. Until then, the current scripts are broken, because the Criteo dataset has to be downloaded manually.

howudodat commented 1 month ago

Curious whether the script has been updated. We are eager to test.

I downloaded the weights and data manually, but I must still be missing something:

cm run script --tags=get,ml-model,dlrm,_pytorch,_weight_sharded,_rclone -j
cm run script --tags=get,preprocessed,dataset,criteo,_multihot,_mlc -j
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 \
    --model=dlrm-v2-99 \
    --implementation=reference \
    --framework=pytorch \
    --category=datacenter \
    --scenario=Offline \
    --execution_mode=test \
    --device=cpu \
    --docker \
    --quiet \
    --test_query_count=50

results in: KeyError: 'CM_DATASET_PREPROCESSED_PATH'

arjunsuresh commented 1 month ago

@howudodat yes, the preprocessing script is now working. The reference scripts needed many changes to get it working. Please run:

cm pull repo
cm run script --tags=run-mlperf,inference,_full --model=dlrm-v2-99 --backend=pytorch --quiet --test_query_count=1000 --docker

You can omit --docker from the second command if you want to run on the host machine.

It is still in testing, so there may be issues. The machine will need about 500GB of memory in total; we are testing on 128GB of RAM with 500GB of swap.
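A quick check of RAM plus swap before launching can save a long wait. The sketch below assumes a Linux host that exposes /proc/meminfo; the helper names and the 500 GiB threshold (taken from the rough figure above, not from a measured requirement) are assumptions.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_KiB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def ram_plus_swap_gib(meminfo):
    """Total physical RAM plus swap, in GiB."""
    total_kib = meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)
    return total_kib / (1024 ** 2)


REQUIRED_GIB = 500  # assumption: the rough figure quoted in this thread

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}  # not a Linux host, or /proc is unavailable
    available = ram_plus_swap_gib(info)
    status = "OK" if available >= REQUIRED_GIB else "likely too small"
    print(f"RAM + swap: {available:.0f} GiB ({status})")
```

Swap only prevents out-of-memory failures; a run that relies heavily on it will be slow.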

arjunsuresh commented 1 month ago

If all goes well, you should see:

 ./run_local.sh pytorch dlrm multihot-criteo cpu --scenario Offline    --mlperf_conf '/home/cmuser/CM/repos/local/cache/7aecc037606c4784/inference/mlperf.conf'  --max-ind-range=40000000  --samples-to-aggregate-quantile-file=./tools/dist_quantile.txt  --user_conf '/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/7b425bbfd8e34c99ae1dfdf49a5e8902.conf' 2>&1 ; echo $? > exitstatus | tee '/home/cmuser/CM/repos/local/cache/28c9ba49c42b4bd6/test_results/79971f578f12-reference-cpu-pytorch-v1.13.1-default_config/dlrm-v2-99/offline/performance/run_1/console.out'
+ python python/main.py --profile dlrm-multihot-pytorch --mlperf_conf ../../../mlperf.conf --model dlrm --model-path /home/cmuser/CM/repos/local/cache/1faad90b75984cb0/model_weights --dataset multihot-criteo --dataset-path /home/cmuser/CM/repos/local/cache/6e1079b9161e4ac2/dlrm_preprocessed --output /home/cmuser/CM/repos/local/cache/7aecc037606c4784/inference/recommendation/dlrm_v2/pytorch/output/pytorch-cpu/dlrm --scenario Offline --mlperf_conf /home/cmuser/CM/repos/local/cache/7aecc037606c4784/inference/mlperf.conf --max-ind-range=40000000 --samples-to-aggregate-quantile-file=./tools/dist_quantile.txt --user_conf /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/7b425bbfd8e34c99ae1dfdf49a5e8902.conf
/home/cmuser/.local/lib/python3.10/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at4_ops10zeros_like4callERKNS_6TensorEN3c108optionalINS5_10ScalarTypeEEENS6_INS5_6LayoutEEENS6_INS5_6DeviceEEENS6_IbEENS6_INS5_12MemoryFormatEEE
INFO:main:Namespace(model='dlrm', model_path='/home/cmuser/CM/repos/local/cache/1faad90b75984cb0/model_weights', dataset='multihot-criteo', dataset_path='/home/cmuser/CM/repos/local/cache/6e1079b9161e4ac2/dlrm_preprocessed', profile='dlrm-multihot-pytorch', scenario='Offline', max_ind_range=40000000, max_batchsize=2048, output='/home/cmuser/CM/repos/local/cache/7aecc037606c4784/inference/recommendation/dlrm_v2/pytorch/output/pytorch-cpu/dlrm', inputs=['continuous and categorical features'], outputs=['probability'], backend='pytorch-native', use_gpu=False, threads=32, accuracy=False, find_peak_performance=False, mlperf_conf='/home/cmuser/CM/repos/local/cache/7aecc037606c4784/inference/mlperf.conf', user_conf='/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/7b425bbfd8e34c99ae1dfdf49a5e8902.conf', duration=None, target_qps=None, max_latency=None, count_samples=None, count_queries=None, samples_per_query_multistream=8, samples_per_query_offline=2048, samples_to_aggregate_fix=None, samples_to_aggregate_min=None, samples_to_aggregate_max=None, samples_to_aggregate_quantile_file='./tools/dist_quantile.txt', samples_to_aggregate_trace_file='dlrm_trace_of_aggregated_samples.txt', numpy_rand_seed=123, debug=False)
Using CPU...
Using variable query size: custom distribution (file ./tools/dist_quantile.txt)
Loading model from /home/cmuser/CM/repos/local/cache/1faad90b75984cb0/model_weights
Initializing embeddings...
Initializing model...
Distributing the model...
WARNING:root:Could not determine LOCAL_WORLD_SIZE from environment, falling back to WORLD_SIZE.
INFO:torchrec.distributed.planner.proposers:Skipping grid search proposer as there are too many proposals.
Total proposals to search: 4.50e+15
Max proposals allowed: 10000

INFO:torchrec.distributed.planner.stats:###############################################################################################################################################################################################################
INFO:torchrec.distributed.planner.stats:#                                                                                         --- Planner Statistics ---                                                                                          #
INFO:torchrec.distributed.planner.stats:#                                                                 --- Evaluated 81 proposal(s), found 81 possible plan(s), ran for 0.04s ---                                                                  #
INFO:torchrec.distributed.planner.stats:# ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- #
INFO:torchrec.distributed.planner.stats:#      Rank     HBM (GB)     DDR (GB)     Perf (ms)     Input (MB)     Output (MB)     Shards                                                                                                                 #
INFO:torchrec.distributed.planner.stats:#    ------   ----------   ----------   -----------   ------------   -------------   --------                                                                                                                 #
INFO:torchrec.distributed.planner.stats:#         0     0.0 (0%)   97.7 (76%)         0.774            0.1             6.5     TW: 26                                                                                                                 #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# Input: MB/iteration, Output: MB/iteration, Shards: number of tables                                                                                                                                         #
INFO:torchrec.distributed.planner.stats:# HBM: estimated peak memory usage for shards, dense tensors, and features (KJT)                                                                                                                              #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# Parameter Info:                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:#                                                    FQN     Sharding     Compute Kernel     Perf (ms)     Pooling Factor     Number of Poolings     Output     Features    Emb Dim     Hash Size     Ranks   #
INFO:torchrec.distributed.planner.stats:#                                                  -----   ----------   ----------------   -----------   ----------------   --------------------   --------   ----------   --------   -----------   -------   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_0           TW              fused          0.03                1.0                    1.0     pooled            1        128      40000000         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_1           TW              fused          0.03                1.0                    1.0     pooled            1        128         39060         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_2           TW              fused          0.03                1.0                    1.0     pooled            1        128         17295         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_3           TW              fused          0.03                1.0                    1.0     pooled            1        128          7424         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_4           TW              fused          0.03                1.0                    1.0     pooled            1        128         20265         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_5           TW              fused          0.03                1.0                    1.0     pooled            1        128             3         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_6           TW              fused          0.03                1.0                    1.0     pooled            1        128          7122         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_7           TW              fused          0.03                1.0                    1.0     pooled            1        128          1543         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_8           TW              fused          0.03                1.0                    1.0     pooled            1        128            63         0   #
INFO:torchrec.distributed.planner.stats:#     model.sparse_arch.embedding_bag_collection.t_cat_9           TW              fused          0.03                1.0                    1.0     pooled            1        128      40000000         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_10           TW              fused          0.03                1.0                    1.0     pooled            1        128       3067956         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_11           TW              fused          0.03                1.0                    1.0     pooled            1        128        405282         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_12           TW              fused          0.03                1.0                    1.0     pooled            1        128            10         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_13           TW              fused          0.03                1.0                    1.0     pooled            1        128          2209         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_14           TW              fused          0.03                1.0                    1.0     pooled            1        128         11938         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_15           TW              fused          0.03                1.0                    1.0     pooled            1        128           155         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_16           TW              fused          0.03                1.0                    1.0     pooled            1        128             4         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_17           TW              fused          0.03                1.0                    1.0     pooled            1        128           976         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_18           TW              fused          0.03                1.0                    1.0     pooled            1        128            14         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_19           TW              fused          0.03                1.0                    1.0     pooled            1        128      40000000         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_20           TW              fused          0.03                1.0                    1.0     pooled            1        128      40000000         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_21           TW              fused          0.03                1.0                    1.0     pooled            1        128      40000000         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_22           TW              fused          0.03                1.0                    1.0     pooled            1        128        590152         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_23           TW              fused          0.03                1.0                    1.0     pooled            1        128         12973         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_24           TW              fused          0.03                1.0                    1.0     pooled            1        128           108         0   #
INFO:torchrec.distributed.planner.stats:#    model.sparse_arch.embedding_bag_collection.t_cat_25           TW              fused          0.03                1.0                    1.0     pooled            1        128            36         0   #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# Batch Size: 512                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# Compute Kernels:                                                                                                                                                                                            #
INFO:torchrec.distributed.planner.stats:#    fused: 26                                                                                                                                                                                                #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# Longest Critical Path: 0.774 ms on rank 0                                                                                                                                                                   #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# Peak Memory Pressure: 0.0 GB on rank 0                                                                                                                                                                      #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# Usable Memory:                                                                                                                                                                                              #
INFO:torchrec.distributed.planner.stats:#    HBM: 0.0 GB, DDR: 128.0 GB                                                                                                                                                                               #
INFO:torchrec.distributed.planner.stats:#    Percent of Total HBM: 95%                                                                                                                                                                                #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# Dense Storage (per rank):                                                                                                                                                                                   #
INFO:torchrec.distributed.planner.stats:#    HBM: 0.0 GB, DDR: 0.359 GB                                                                                                                                                                               #
INFO:torchrec.distributed.planner.stats:#                                                                                                                                                                                                             #
INFO:torchrec.distributed.planner.stats:# KJT Storage (per rank):                                                                                                                                                                                     #
INFO:torchrec.distributed.planner.stats:#    HBM: 0.0 GB, DDR: 0.002 GB                                                                                                                                                                               #
INFO:torchrec.distributed.planner.stats:###############################################################################################################################################################################################################
INFO:root:Using fused exact_sgd with optimizer_args=OptimizerArgs(stochastic_rounding=True, gradient_clipping=False, max_gradient=1.0, learning_rate=0.01, eps=1e-08, beta1=0.9, beta2=0.999, weight_decay=0.0, weight_decay_mode=0, eta=0.001, momentum=0.9)
Loading model weights...
INFO:torchsnapshot.scheduler:Set process memory budget to 15829337702 bytes.
INFO:torchsnapshot.scheduler:Rank 0 finished loading. Throughput: 4157.55MB/s

howudodat commented 4 weeks ago

I am testing. One small issue: a prerequisite, unzip, was missing:

/home/peter/CM/repos/mlcommons@cm4mlops/script/get-rclone/install.sh: line 9: unzip: command not found

A simple apt install unzip fixes it, and the run progresses.
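A small preflight check can catch this kind of missing prerequisite before the workflow starts. This is only a sketch: the helper name missing_tools is hypothetical, and the list contains just the one tool reported here, not a full list of what the CM scripts need.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    required = ["unzip"]  # assumption: only the tool reported in this thread
    missing = missing_tools(required)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All listed prerequisites found.")
```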

arjunsuresh commented 4 weeks ago

We'll fix the issue for unzip. Thanks for letting us know. Please let us know if there is any other issue with the run.

howudodat commented 4 weeks ago

I don't have 128GB of memory available, and I am not getting as far as the output shown above:

CMD:  ./run_local.sh pytorch dlrm multihot-criteo cpu --scenario Offline    --mlperf_conf '/home/cmuser/CM/repos/local/cache/a14eb481b8b24a9f/inference/mlperf.conf'  --max-ind-range=40000000  --samples-to-aggregate-quantile-file=./tools/dist_quantile.txt  --user_conf '/home/cmuser/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/d30dabe774784ae9b118c18ee413dc16.conf' 2>&1 ; echo \$? > exitstatus | tee '/home/cmuser/CM/repos/local/cache/d339e8c618184c86/test_results/f151395b64b0-reference-cpu-pytorch-v1.13.1-default_config/dlrm-v2-99/offline/performance/run_1/console.out'

DEBUG:root:    - Running native script "/home/cmuser/CM/repos/mlcommons@cm4mlops/script/benchmark-program/run-ubuntu.sh" from temporal script "tmp-run.sh" in "/home/cmuser" ...
INFO:root:         ! cd /home/cmuser
INFO:root:         ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/benchmark-program/run-ubuntu.sh from tmp-run.sh

 ./run_local.sh pytorch dlrm multihot-criteo cpu --scenario Offline    --mlperf_conf '/home/cmuser/CM/repos/local/cache/a14eb481b8b24a9f/inference/mlperf.conf'  --max-ind-range=40000000  --samples-to-aggregate-quantile-file=./tools/dist_quantile.txt  --user_conf '/home/cmuser/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/d30dabe774784ae9b118c18ee413dc16.conf' 2>&1 ; echo $? > exitstatus | tee '/home/cmuser/CM/repos/local/cache/d339e8c618184c86/test_results/f151395b64b0-reference-cpu-pytorch-v1.13.1-default_config/dlrm-v2-99/offline/performance/run_1/console.out'
+ python python/main.py --profile dlrm-multihot-pytorch --mlperf_conf ../../../mlperf.conf --model dlrm --model-path /home/cmuser/CM/repos/local/cache/2f38d7a46ac74efe/model_weights --dataset multihot-criteo --dataset-path /home/cmuser/CM/repos/local/cache/855ac16cc352468a/dlrm_preprocessed --output /home/cmuser/CM/repos/local/cache/a14eb481b8b24a9f/inference/recommendation/dlrm_v2/pytorch/output/pytorch-cpu/dlrm --scenario Offline --mlperf_conf /home/cmuser/CM/repos/local/cache/a14eb481b8b24a9f/inference/mlperf.conf --max-ind-range=40000000 --samples-to-aggregate-quantile-file=./tools/dist_quantile.txt --user_conf /home/cmuser/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/d30dabe774784ae9b118c18ee413dc16.conf
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppc12zh80
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppc12zh80/_remote_module_non_scriptable.py
./run_local.sh: line 14:   494 Illegal instruction     (core dumped) python python/main.py --profile $profile $common_opt --model $model --model-path $model_path --dataset $dataset --dataset-path $DATA_DIR --output $OUTPUT_DIR $EXTRA_OPS $@
./run.sh: line 59: 132: command not found
./run.sh: line 65: 132: command not found

CM error: Portable CM script failed (name = benchmark-program, return code = 32512)

arjunsuresh commented 4 weeks ago

The "Illegal instruction" crash might point to an issue with the ARM architecture. We have never tried the reference implementation on aarch64.
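The numbers in the log are consistent with this reading. Under the usual shell convention, a process killed by signal N reports exit status 128 + N, so 132 corresponds to signal 4 (SIGILL). Assuming the CM return code is a raw POSIX wait status of the kind os.system returns, 32512 decodes to exit code 127, which shells use for "command not found" (here, run.sh trying to execute "132" as a command). A minimal sketch of both decodings, with hypothetical helper names:

```python
import signal


def signal_from_shell_status(status):
    """Return the signal name for a shell exit status above 128, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None  # not a signal number known on this platform


def exit_code_from_wait_status(wait_status):
    """Extract the exit code from a POSIX wait status of a normally exited process."""
    return (wait_status >> 8) & 0xFF


if __name__ == "__main__":
    print(signal_from_shell_status(132))
    print(exit_code_from_wait_status(32512))
```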

DLRM_v2 is a datacenter-only benchmark, and the model itself is 98GB in size. Running it on a 32GB system will be very hard, even on x86.
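The size can be roughly cross-checked from the planner statistics earlier in this thread: 26 embedding tables, each with embedding dimension 128. The sketch below assumes float32 weights (4 bytes per value), which the log does not state explicitly, and counts only the embedding tables.

```python
# Hash sizes (row counts) of t_cat_0 .. t_cat_25, copied from the planner statistics.
HASH_SIZES = [
    40000000, 39060, 17295, 7424, 20265, 3, 7122, 1543, 63, 40000000,
    3067956, 405282, 10, 2209, 11938, 155, 4, 976, 14, 40000000,
    40000000, 40000000, 590152, 12973, 108, 36,
]
EMB_DIM = 128        # "Emb Dim" column in the planner table
BYTES_PER_VALUE = 4  # assumption: float32 embeddings

total_rows = sum(HASH_SIZES)
total_bytes = total_rows * EMB_DIM * BYTES_PER_VALUE
total_gib = total_bytes / 2 ** 30

if __name__ == "__main__":
    print(f"{total_rows:,} rows -> {total_gib:.1f} GiB of embedding weights")
```

Under these assumptions the tables come to about 97.4 GiB, close to the 97.7 GB of DDR the planner reports for rank 0.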