When I execute this command:
./run_local.sh pytorch dlrm terabyte gpu --scenario Server --max-ind-range=40000000 --samples-to-aggregate-quantile-file=./tools/dist_quantile.txt
I get the following output:
Using 8 GPU(s)...
Reading pre-processed data=/home/user01/file/node05/joyecai/nfs_share/dlrm_raw_dataset/terabyte_processed.npz
Sparse features= 26, Dense features= 13
Using variable query size: custom distribution (file ./tools/dist_quantile.txt)
Traceback (most recent call last):
File "/home/user01/file/node05/joyecai/code/mlcommons/inference/recommendation/dlrm/pytorch/python/main.py", line 619, in <module>
main()
File "/home/user01/file/node05/joyecai/code/mlcommons/inference/recommendation/dlrm/pytorch/python/main.py", line 503, in main
model = backend.load(args.model_path, inputs=args.inputs, outputs=args.outputs)
File "/home/user01/file/node05/joyecai/code/mlcommons/inference/recommendation/dlrm/pytorch/python/backend_pytorch_native.py", line 61, in load
dlrm = dlrm.to(self.device) # .cuda()
File "/home/user01/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/home/user01/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/user01/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/user01/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/home/user01/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.91 GiB (GPU 0; 79.18 GiB total capacity; 70.34 GiB already allocated; 8.00 GiB free; 70.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
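For what it's worth, the size of the failed allocation lines up with a single fp32 embedding table capped at --max-ind-range=40000000. This is a back-of-the-envelope sketch, not something read from the logs: both the embedding dimension (128, the usual DLRM terabyte setting) and the float32 dtype are assumptions on my part.

```python
# Rough memory estimate for one embedding table capped at
# --max-ind-range=40000000 rows.
# Assumptions (not confirmed by the logs above): embedding dim 128,
# float32 weights.
rows = 40_000_000        # --max-ind-range
dim = 128                # assumed embedding dimension
bytes_per_param = 4      # float32
gib = rows * dim * bytes_per_param / 2**30
print(f"{gib:.2f} GiB")  # ~19 GiB, close to the 18.91 GiB the allocator tried to grab
```

Since the failure is one ~19 GiB contiguous allocation landing on GPU 0, which already holds about 70 GiB of other parameters, the allocator's max_split_size_mb hint is unlikely to help here (that setting addresses fragmentation, not a single oversized request). Lowering --max-ind-range, or spreading the embedding tables across the 8 GPUs if the harness supports it, seems like the more direct fix.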