microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.

Unable to build GPU #634

Closed mahidhar96 closed 9 months ago

mahidhar96 commented 2 years ago

I keep getting the following error when I try to benchmark Hummingbird with the TorchScript (GPU) backend for larger batch sizes (batch_size > 10000).

DATASET: higgs
MODEL: randomforest
FRAMEWORK: HummingbirdTorchScriptGPU
Query Size: 100000
Batch Size: 100000
Trees 500
Depth 8
Time Taken to load higgs as a dataframe is: 5096.105337142944
Time Taken to load sklearn model: 187.28971481323242
Traceback (most recent call last):
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py", line 371, in <module>
    test(args, features, label, sklearnmodel, config, time_consume)
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py", line 109, in test
    test_postprocess(*test_gpu(*argv))
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py", line 269, in test_gpu
    results = run_inference(FRAMEWORK, features, input_size, args.query_size, predict, time_consume)
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/model_helper.py", line 113, in run_inference
    output = predict(query_data)
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py", line 267, in predict
    return model.predict(batch)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/containers/sklearn/pytorch_containers.py", line 291, in predict
    return self._run(f_wrapped, *inputs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/containers/_sklearn_api_containers.py", line 67, in _run
    return function(*inputs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/containers/sklearn/pytorch_containers.py", line 289, in <lambda>
    f_wrapped = lambda x: _torchscript_wrapper(device, f, x, extra_config=self._extra_config)  # noqa: E731
  File "/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/containers/sklearn/pytorch_containers.py", line 252, in _torchscript_wrapper
    return function(*inputs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/containers/sklearn/pytorch_containers.py", line 196, in _predict
    return self.model.forward(*inputs)[0].cpu().numpy().ravel()
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_tree_implementations.py(394): forward
/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py(1118): _slow_forward
/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py(1130): _call_impl
/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/_executor.py(113): forward
/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py(1118): _slow_forward
/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py(1130): _call_impl
/home/ubuntu/.local/lib/python3.9/site-packages/torch/jit/_trace.py(967): trace_module
/home/ubuntu/.local/lib/python3.9/site-packages/torch/jit/_trace.py(750): trace
/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/_topology.py(105): _jit_trace
/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/_topology.py(373): convert
/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/convert.py(111): _convert_sklearn
/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/convert.py(405): _convert_common
/home/ubuntu/.local/lib/python3.9/site-packages/hummingbird/ml/convert.py(444): convert
/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py(264): test_gpu
/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py(109): test
/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py(371): <module>
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Here's my code: https://github.com/asu-cactus/netsdb/blob/41-decisiontree-gpu/model-inference/decisionTree/experiments/test_model.py (line 260)

This is my model: https://github.com/asu-cactus/netsdb/blob/41-decisiontree-gpu/model-inference/decisionTree/experiments/models/higgs_xgboost_500_8.pkl

Here's my debug output: https://github.com/asu-cactus/netsdb/blob/41-decisiontree-gpu/model-inference/decisionTree/experiments/gpu_results/higgs_500_8.txt

Is there a way to solve this? Are there any changes I need to make for larger batch sizes? This problem only occurs for batch_size > 10000 on Higgs.
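For reference, here is a rough sketch of the code path that triggers this; the random data, feature count, and training subset below are just placeholders, not the actual benchmark setup:

```python
# Sketch only: synthetic stand-in for the Higgs benchmark (28 features, 100k-row batch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert

n_rows, n_features = 100_000, 28
X = np.random.rand(n_rows, n_features).astype(np.float32)
y = np.random.randint(2, size=n_rows)

# Small training subset just to get a fitted 500-tree, depth-8 model for illustration.
skl_model = RandomForestClassifier(n_estimators=500, max_depth=8).fit(X[:1000], y[:1000])

# Convert to the TorchScript backend on GPU; a sample input is needed for tracing.
hb_model = convert(skl_model, "torch.jit", X[:100], device="cuda")

# This is where the device-side assert fires once the batch is large (> 10000 rows in my runs).
preds = hb_model.predict(X)
```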

interesaaat commented 2 years ago

Hi! Does this work for smaller batch sizes? It could be a problem with trying to allocate too much memory.

mahidhar96 commented 2 years ago

Yes, it works for batch_size < 10000. I'm clearing CUDA memory before running the benchmark, but if it were a memory issue, we would normally get a different error, something like this:

RuntimeError: CUDA out of memory. Tried to allocate 8.20 GiB (GPU 0; 14.56 GiB total capacity; 8.43 GiB already allocated; 5.30 GiB free; 8.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The device-side assert error I'm getting above is different from this out-of-memory error.
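(As an aside, if it really were fragmentation, the allocator setting that OOM message mentions could be applied before PyTorch initializes CUDA; a sketch, with an illustrative value:)

```python
# Sketch: set the allocator hint from the OOM message before the first CUDA allocation.
# The 512 MB split size here is illustrative, not tuned.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # import (and first CUDA use) must come after setting the variable

torch.cuda.empty_cache()  # e.g. clearing cached blocks between benchmark runs
```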

mahidhar96 commented 2 years ago

Hi, are there any updates on this issue? To clarify, this is the error I'm getting:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

interesaaat commented 2 years ago

I would suggest sticking with smaller batch sizes. My hunch is that something is breaking because you are allocating too much memory, but the only way to know what is really going on is to enable debugging, as the error message suggests.
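As a rough sketch of both suggestions (reusing the hypothetical hb_model and X from the snippet above; the 5000-row chunk size is just an example), you could force synchronous kernel launches to get an accurate stack trace and feed the data in smaller slices:

```python
# Sketch: synchronous launches for debugging, plus chunked inference for large queries.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized (top of the script)

import numpy as np

def predict_in_chunks(model, data, chunk_rows=5000):
    # Run inference on fixed-size slices so no single launch sees the full 100k-row batch.
    outputs = [model.predict(data[i:i + chunk_rows]) for i in range(0, len(data), chunk_rows)]
    return np.concatenate(outputs)

preds = predict_in_chunks(hb_model, X)
```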