[Performance] Binary operators using SSE on AVX systems

Describe the issue

Hi!

I've been building ORT from source (build command below) and noticed that binary operators like Add are executed by the Eigen library. After some debugging, I found that Eigen uses the SSE version of the add intrinsic to execute the operator. I'm running on a system that supports AVX512, so I'd expect AVX512 intrinsics to be used.

Is this the expected behavior? It happens on both Windows 11 and Ubuntu 24.04.1. I also tested on AVX2-only systems, and SSE is still used there.
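For context, Eigen picks its vectorization path at compile time, based on the instruction-set macros the compiler defines, not on what the CPU supports at run time. A minimal sketch for checking which SIMD level a translation unit is compiled for is below. The helper name compile_time_simd_level is hypothetical (it is not part of ORT or Eigen), and whether the default ORT build flags are what limits Eigen to SSE here is an assumption I have not confirmed.

```cpp
#include <string>

// Hypothetical helper (not part of ONNX Runtime or Eigen): reports the widest
// x86 SIMD extension the compiler was allowed to target for this translation
// unit, using the predefined macros set by flags such as -mavx2 or /arch:AVX2.
std::string compile_time_simd_level() {
#if defined(__AVX512F__)
  return "AVX512F";
#elif defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#elif defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  // x86-64 targets always have SSE2; MSVC signals this via _M_X64.
  return "SSE2";
#else
  return "none detected";
#endif
}
```

Printing the result of compile_time_simd_level() from a file compiled with the same flags as the Eigen-based kernels would show whether AVX code generation is enabled for them at all.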

To reproduce

Build with ./build.sh --config Debug --build_shared_lib --parallel, then run the perf test on the mobilenetv3 model with the args -m times -r 10 -I.

This uses an FP32 model, but my guess is that this happens with any data type as long as the Eigen add is used, and it might happen with other binary ops as well.

Urgency

Not urgent, but this is performance we are giving away for free.

Platform

Windows

OS Version

Windows 11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

62f99d8a8d4470520f9204608af47f9162c909e8

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

https://github.com/onnx/models/blob/main/Computer_Vision/mobilenetv3_rw_Opset17_timm/mobilenetv3_rw_Opset17.onnx

Is this a quantized model?

Unknown

microsoft / onnxruntime