groq / mlagility

Machine Learning Agility (MLAgility) benchmark and benchmarking tools
MIT License

Local x86 benchmarking through OnnxRuntime fails for half of the models #131

Open · danielholanda opened this issue 1 year ago

danielholanda commented 1 year ago

Issue:

OnnxRuntime seems to be unable to execute half of our benchmarks. Therefore, it may not be appropriate as a baseline.

For example, simply running benchit models/torch_hub/alexnet.py --device x86 results in the following error:

Models discovered during profiling:

alexnet.py:
        model (executed 1x)
                Model Type:     Pytorch (torch.nn.Module)
                Class:          AlexNet (<class 'torchvision.models.alexnet.AlexNet'>)
                Location:       /home/dhnoronha/.cache/torch/hub/pytorch_vision_v0.13.1/torchvision/models/alexnet.py, line 116
                Parameters:     61,100,840 (116.5 MB)
                Hash:           2891f54c
                Status:         Unknown benchit error: Error: Failure to run model using onnxruntime - 
                Traceback (most recent call last):
                  File "/net/home/dhnoronha/mlagility/src/mlagility/analysis/analysis.py", line 123, in call_benchit
                    perf = benchmark_model(
                  File "/net/home/dhnoronha/mlagility/src/mlagility/api/model_api.py", line 178, in benchmark_model
                    perf = cpu_model.benchmark(backend=backend)
                  File "/net/home/dhnoronha/mlagility/src/mlagility/api/ortmodel.py", line 20, in benchmark
                    benchmark_results = self._execute(repetitions=repetitions, backend=backend)
                  File "/net/home/dhnoronha/mlagility/src/mlagility/api/ortmodel.py", line 71, in _execute
                    devices.execute_cpu_locally(self.state, self.device, repetitions)
                  File "/net/home/dhnoronha/mlagility/src/mlagility/api/devices.py", line 624, in execute_cpu_locally
                    raise BenchmarkException(
                mlagility.api.devices.BenchmarkException: Error: Failure to run model using onnxruntime -

Task:

jeremyfowers commented 1 year ago

It's a segfault. The easiest way to repro is to go into devices.py (https://github.com/groq/mlagility/blob/54a774cbcbded984c213798acdd2aea43f05b7ee/src/mlagility/api/devices.py#L609) and print out the command that is being used in the subprocess. From there you can run the command yourself to see the segfault happen, or open it up in a Python debugger.
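
For illustration, here is a minimal sketch of that debugging step. It assumes the command is held in a list (called cmd below), which is an assumption about the local code in execute_cpu_locally(); adapt it to whatever is actually handed to subprocess:

    # Hedged sketch: `cmd` is a placeholder for the argument list passed to
    # subprocess in execute_cpu_locally(); adjust to match the local code.
    import shlex
    import subprocess

    print("Repro command:", " ".join(shlex.quote(str(c)) for c in cmd))
    result = subprocess.run(cmd)
    # A segfault surfaces as a negative return code (-11 == SIGSEGV on Linux)
    print("Return code:", result.returncode)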

ramkrishna2910 commented 1 year ago

This is due to the FP16 converter issue. I have tested a bunch of these cases: the base FP32 model passes, but the FP16 version that we use for benchmarking fails to run.
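
For context, here is a standalone sketch of that failure mode outside of benchit, assuming the FP16 conversion goes through onnxconverter_common's float16 helper (an assumption; the model paths and the input name are placeholders, not mlagility internals):

    # Hedged sketch: reproduce "FP32 runs, FP16 fails" with plain onnxruntime.
    # File paths and the input name "input" are placeholders.
    import numpy as np
    import onnx
    import onnxruntime as ort
    from onnxconverter_common import float16

    fp32_path = "model_fp32.onnx"   # placeholder: FP32 ONNX export of the model
    fp16_path = "model_fp16.onnx"

    # Convert to FP16 (the converter suspected of producing the broken model)
    onnx.save(float16.convert_float_to_float16(onnx.load(fp32_path)), fp16_path)

    x = np.random.rand(1, 3, 224, 224).astype(np.float32)

    # The FP32 model runs under onnxruntime...
    sess_fp32 = ort.InferenceSession(fp32_path, providers=["CPUExecutionProvider"])
    sess_fp32.run(None, {"input": x})

    # ...while the FP16 model is where the crash is observed
    sess_fp16 = ort.InferenceSession(fp16_path, providers=["CPUExecutionProvider"])
    sess_fp16.run(None, {"input": x.astype(np.float16)})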

jeremyfowers commented 1 year ago

I would love to add an FP32 version of the benchmark and have that as a toggle on the dashboard.

@ramkrishna2910 I guess the failures are not user error then? In the sense that ORT does not support a variety of valid FP16 ONNX files?

ramkrishna2910 commented 1 year ago

Yes, these are not user errors. This is due to some issues in the FP16 converter, and I have created an issue upstream about this as well. @danielholanda and I spoke about adding the FP32 numbers, but we were worried that it would make the dashboard too chaotic, and comparing FP32 on CPU with FP16 on GPU may not be a fair comparison.