Closed mahidhar96 closed 9 months ago
Hi! Does this work for smaller batch sizes? It could be some problem with trying to allocate too much memory.
Yes, it works for batch_size < 10000. I'm clearing CUDA memory before running the benchmark, and in general, if it were a memory issue, we would get a different error, something like this:
RuntimeError: CUDA out of memory. Tried to allocate 8.20 GiB (GPU 0; 14.56 GiB total capacity; 8.43 GiB already allocated; 5.30 GiB free; 8.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
but the error I'm getting is different from the one above.
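For anyone who does hit the fragmentation-related OOM quoted above: the message itself points at the allocator's max_split_size_mb option. A minimal sketch of setting it via PYTORCH_CUDA_ALLOC_CONF (the value 128 is only an illustrative guess; tune it for your GPU, and note this helps only the OOM case, not the device-side assert):

```python
import os

# Must be set before torch allocates any CUDA memory (ideally before
# importing torch), otherwise the allocator ignores it.
# 128 MB is an illustrative starting point, not a recommended value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```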
Hi, are there any updates on this issue? To clarify, this is the error I'm getting:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
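As the error text suggests, kernel launches are asynchronous by default, so the reported stack trace can point at the wrong call. A minimal sketch of enabling synchronous launches so the assert surfaces at the failing op (the variable must be set before the first CUDA call, ideally before importing torch):

```python
import os

# Force synchronous CUDA kernel launches so a device-side assert is
# reported at the call that actually triggered it. Expect the run to be
# noticeably slower; use this only while debugging.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Equivalently, it can be passed on the command line, e.g. `CUDA_LAUNCH_BLOCKING=1 python test_model.py`.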
I would suggest sticking with smaller batch sizes. My hunch is that something is breaking because you are allocating too much memory, but the only way to know what is really going on is to enable debugging, as the error message suggests.
I keep getting the following error when I try to benchmark Hummingbird for TorchScript (GPU) with larger batch sizes (batch_size > 10000).
Here's my code: https://github.com/asu-cactus/netsdb/blob/41-decisiontree-gpu/model-inference/decisionTree/experiments/test_model.py (line 260)
This is my model: https://github.com/asu-cactus/netsdb/blob/41-decisiontree-gpu/model-inference/decisionTree/experiments/models/higgs_xgboost_500_8.pkl
Here's my debug output: https://github.com/asu-cactus/netsdb/blob/41-decisiontree-gpu/model-inference/decisionTree/experiments/gpu_results/higgs_500_8.txt
Is there a way to solve this? Are there any changes I need to make for larger batch sizes? This problem only occurs for batch_size > 10000 on Higgs.