Open ayanchak1508 opened 3 days ago
Hi @ayanchak1508, you can just remove the version requirement in this file locally; it should be inside $HOME/repos/mlcommons@cm4mlops/script/
We never had success using a higher version of fbgemm with the available inference implementation. If you can share the exact versions which worked, we can test them.
Thanks for the quick reply! Yes, indeed after changing the version, it seems to be working.
These are the versions (that changed from the default) that work for me:
fbgemm_gpu==0.8.0+cpu
fbgemm_gpu-cpu==0.8.0
torch==2.4.0
torchrec==0.8.0
I have attached the full requirements.txt file in case it is needed: requirements.txt
I sometimes run into a bus error (core dumped) error afterward, but it seems to be more of a memory capacity issue unrelated to the toolchain/benchmark?
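The versions listed above can be compared against whatever is installed in the active environment. The following is only a minimal sketch: the helper names (`installed_version`, `base`, `find_mismatches`) are made up for illustration, the distribution names are copied as written in this thread and may need adjusting to what pip reports locally, and local suffixes such as `+cpu` are deliberately ignored in the comparison.

```python
from importlib import metadata

# Versions reported as working in this thread (names as written there).
PINS = {
    "fbgemm_gpu": "0.8.0+cpu",
    "fbgemm_gpu-cpu": "0.8.0",
    "torch": "2.4.0",
    "torchrec": "0.8.0",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def base(version):
    """Drop a local suffix such as '+cpu' so 2.4.0 and 2.4.0+cpu compare equal."""
    return version.split("+", 1)[0]


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None or base(found) != base(wanted):
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINS)` inside the environment used for the benchmark returns an empty dict when every pin is satisfied; a missing package shows up with `None` as its found version.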
Thanks a lot @ayanchak1508. Let me check that. This issue might help with the bus error.
Yes, with PyTorch 2.4 we could use fbgemm_gpu==0.8.0 and it worked fine. We have removed the version dependency in the CM script now. You can just run cm pull repo and the change should be visible.
Just to add: ulimit=9999 was not enough to run 1000 inputs. I think it will be incredibly hard to do a full run of 204800 inputs using the current reference implementation on CPUs.
Thanks a lot for the quick updates!
I did a fresh, clean setup to see the effects. I have two observations:
1. pip doesn't automatically know where to find fbgemm-gpu for ARM; it needs to be installed via pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cpu/
2. ModuleNotFoundError: No module named 'fbgemm_gpu.split_embedding_configs'. I'm not sure if I'm doing anything wrong, but if I create a new virtual environment and use the requirements file I posted earlier, the benchmark runs without problems. Maybe this is an ARM-specific problem?
Regarding the bus error problem, thank you again for the references. Is there any way to use the debug dataset or limit the max inputs, i.e., deviate from the official submission rules in any way? (Of course I understand it wouldn't count as a valid submission, but I'm just interested in the model performance.)
I guess one possible solution could be to edit the conf file manually, but is there a better way? (Sorry for bringing the bus error into this issue, we can move it to a separate issue if needed)
For 1, maybe the problem is with the .whl file?
"but if I create a new virtual environment and use the requirements file I posted earlier, the benchmark runs without problems."
Is it on the same ARM machine? If so, you can try the venv for CM flow also as follows:
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
For the bus error - what's the available RAM on the system?
Sorry, I should have been more specific. Runs are on a clean and empty docker container (ubuntu:22.04) on an ARM server.
I created two python venvs (in the same container), one for installing packages through the CM-based flow and one for installing packages from the requirements file. Although I didn't use the command you mentioned, I simply created a normal python venv as mentioned here: https://docs.mlcommons.org/inference/install/ and ran the CM commands for the benchmark there. Does the command you mentioned do something more?
For the bus error, the RAM is not too big, it's about 250GB (the docker container has no resource constraints). I remember I faced a similar problem before when I processed the dataset myself some time back, and had to move to a different machine with 512 GB RAM. So, I understand maybe it's not big enough to run the entire dataset, but it should be fine at least for the debug dataset?
Thank you.
Yes, the commands are a bit different. CM is a Python package, and when you use a venv for CM, it gets installed in the venv. Now when you run any workflow using CM, any available Python on the system can be picked by the flow unless we force one using "cm run script --tags=get,python" and doing the appropriate selection. The command I shared is a safer option as long as the name used is new.
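To see which interpreter a given step is actually running under, a small check like the one below can help. It is a sketch rather than anything CM provides: `in_virtualenv` and `describe_interpreter` are hypothetical helper names, and the test relies only on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def describe_interpreter():
    """Summarise the running interpreter for quick comparison across runs."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv(),
    }
```

Printing `describe_interpreter()` from the venv that holds CM and again from the Python that runs the benchmark shows whether the two are the same environment.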
Coming to 256GB, it should be good enough. We have run Dlrmv2 full comfortably on 192GB. It worked even on 64GB, but had to use a lot of swap space.
I believe your problem could be the shm size as docker is used. Are you explicitly setting shm size during docker run? We typically set 32GB shm size for dlrm.
Thank you very much for the clarification!
I did not set the shm size, and the default seems to be 64MB, much smaller than the 32GB you mentioned. I will try it out (both using the command you mentioned and increasing the shm size), and get back to you.
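The shared-memory size visible inside the container can be checked before starting a run. The sketch below assumes that shared memory is mounted at /dev/shm (the usual location on Linux) and uses the 32GB figure mentioned above as its default threshold; the function names are illustrative only.

```python
import shutil

GIB = 1024 ** 3


def shm_is_sufficient(total_bytes, required_gib=32):
    """Return True when the given size meets the required number of GiB."""
    return total_bytes >= required_gib * GIB


def check_shm(path="/dev/shm", required_gib=32):
    """Report the size of the shared-memory mount and whether it is enough.

    Assumes `path` exists; shutil.disk_usage raises an OSError subclass
    if it does not.
    """
    total = shutil.disk_usage(path).total
    return {
        "total_gib": total / GIB,
        "sufficient": shm_is_sufficient(total, required_gib),
    }
```

With Docker's 64MB default, `check_shm()` would report the mount as insufficient, which matches the diagnosis suggested in this thread.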
Thanks once again for all the quick help.
Sure @ayanchak1508. Just a correction to what I said earlier: the 64G system where we had run dlrmv2 was on GPUs and not CPUs. On CPUs we could only do a test run on 192G for 10 inputs.
Update:
ImportError: cannot import name 'DLRM_DCN' from 'torchrec.models.dlrm' (/root/CM/repos/local/cache/b1d060ef5c0c4217/mlperf/lib/python3.10/site-packages/torchrec/models/dlrm.py)
ModuleNotFoundError: No module named 'fbgemm_gpu.split_embedding_configs'
These are the packages it installs in the mlperf venv: current.txt
Doing a diff with the requirements file I posted before, and then manually installing the correct package versions in the mlperf venv solves the problem:
pip install torch==2.4.0 torchrec==0.8.0
pip uninstall fbgemm-gpu
pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cpu/
I am not sure why I had to reinstall the same version of fbgemm-gpu, but otherwise it runs into the ModuleNotFoundError.
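Both failures reported in this update (a missing submodule and a missing class) can be probed without launching the benchmark. The helpers below are a sketch with made-up names; they rely only on the standard `importlib` behaviour, and the module and attribute names in the comment at the end are the ones quoted in the error messages above.

```python
import importlib
import importlib.util


def module_available(name):
    """Return True if `name` can be located, False otherwise.

    find_spec imports the parent package of a dotted name, so a missing
    or broken parent is also reported as unavailable.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


def has_attribute(module_name, attribute):
    """Return True if the module imports cleanly and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


# Checks matching the two errors reported in this thread:
#   module_available("fbgemm_gpu.split_embedding_configs")
#   has_attribute("torchrec.models.dlrm", "DLRM_DCN")
```

Running the two commented checks before and after reinstalling fbgemm-gpu would show whether the reinstall is what makes the submodule appear.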
I was trying to run the DLRMv2 benchmark of MLPerf Inference on an ARM server using the instructions here.
I run into the issue when the tool tries to install torchrec==0.3.2.
torchrec==0.3.2 requires fbgemm-gpu==0.3.2, but fbgemm-gpu only introduced support for ARM starting from v0.5.0: https://download.pytorch.org/whl/cpu/fbgemm-gpu/
I tried two alternate approaches:
fbgemm-gpu (v0.5.0 or above) but the cm tool remains inflexible and keeps trying to search for v0.3.2
Previously, I did run the benchmark without any problems on ARM (without using the cm tool) using newer versions of fbgemm-gpu. (Note that I did need to use fbgemm-gpu-cpu too.)
Command to reproduce the issue:
Error message:
The repro folder and the logfile are present in the attached tarball: cm-repro.tar.gz
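The constraint described in this report (fbgemm-gpu supporting ARM only from v0.5.0) can be expressed as a small pre-install check. This is a sketch under that single assumption taken from the report; `parse_version` and `arm_constraint_ok` are hypothetical names, and the parser handles only plain dotted numeric versions with an optional local suffix such as `+cpu`.

```python
import platform

# Per the report above, ARM support in fbgemm-gpu starts at v0.5.0.
MIN_ARM_FBGEMM = (0, 5, 0)


def parse_version(text):
    """Turn '0.8.0+cpu' into (0, 8, 0); the local suffix is ignored."""
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def arm_constraint_ok(fbgemm_version, machine=None):
    """Return False only when an ARM machine is paired with a pre-0.5.0 version.

    On non-ARM machines the constraint does not apply, so True is returned.
    """
    machine = platform.machine() if machine is None else machine
    if machine.lower() not in ("aarch64", "arm64"):
        return True
    return parse_version(fbgemm_version) >= MIN_ARM_FBGEMM
```

For the pin that triggered this issue, `arm_constraint_ok("0.3.2", machine="aarch64")` returns False, while the 0.8.0 version that later worked passes.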