CUDA version 12.4 not supported for this cm command

mlcommons / ck

Collective Mind (CM) is a small, modular, cross-platform and decentralized workflow automation framework with a human-friendly interface and reusable automation recipes to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data, software and hardware

Apache License 2.0


Running this command from the CM playground gives an error message:

    cm run script --tags=run-mlperf,inference,_performance-only,_short \
      --division=open \
      --category=edge \
      --device=cuda \
      --model=gptj-99 \
      --precision=float32 \
      --implementation=nvidia \
      --backend=tensorrt \
      --scenario=Offline \
      --execution_mode=test \
      --power=no \
      --adr.python.version_min=3.8 \
      --clean \
      --compliance=no \
      --quiet \
      --time

The command requires installing libnccl2=2.18.3, which only supports CUDA 11.0 and 12.0 to 12.2 (not 12.4). I tried changing the version installed by ~/CM/repos/mlcommons@ck/cm-mlops/script/install-nccl-libs/, but later ran into another error:

    Building CXX object caffe2/CMakeFiles/op_registration_test.dir/__/aten/src/ATen/core/op_registration/op_registration_test.cpp.o
    ninja: build stopped: subcommand failed.

I don't know whether this is related to the change I made, but I haven't found a fix for it.
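To make the version mismatch concrete, here is a minimal sketch of a pre-check that decides whether the pinned NCCL release has a build for a given CUDA version. It is not part of CM. The function name `nccl_pin` is hypothetical. The supported list is copied from the issue text. The `-1+cudaX.Y` package suffix is an assumption about how NVIDIA's apt repository names libnccl2 builds. The local CUDA version could be read from `nvcc --version`, but here it is passed in explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check, not part of CM: does the pinned libnccl2 have a build for this CUDA?
# Supported CUDA versions for NCCL 2.18.3 are taken from the issue text (11.0, 12.0-12.2).
# ASSUMPTION: apt package versions follow the "<nccl>-1+cuda<major.minor>" naming.

NCCL_VERSION="2.18.3"
SUPPORTED_CUDA="11.0 12.0 12.1 12.2"

# nccl_pin CUDA_VERSION
# Prints the apt pin string on success; prints an error to stderr and returns 1 otherwise.
nccl_pin() {
  local cuda="$1" v
  for v in $SUPPORTED_CUDA; do
    if [ "$v" = "$cuda" ]; then
      echo "libnccl2=${NCCL_VERSION}-1+cuda${cuda}"
      return 0
    fi
  done
  echo "libnccl2 ${NCCL_VERSION} has no build for CUDA ${cuda}" >&2
  return 1
}
```

Under these assumptions, `nccl_pin 12.2` prints `libnccl2=2.18.3-1+cuda12.2`, while `nccl_pin 12.4` fails. That failure is the same mismatch reported above.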

CUDA version 12.4 not supported for this cm command #1243