Closed: fblissjr closed this issue 4 months ago
@fblissjr We currently do not support Windows and have not yet optimized for it. Please see https://docs.modular.com/engine/get-started#requirements for the supported platforms and OSes.
If you can, please try another machine — we've primarily focused on x86 data center hardware so far, as you can see from our Performance Dashboard. Thank you!
Bug description
Sharing the report for visibility! Great to see this stuff working!
~/mojo/max/examples/performance-showcase$ python3 run.py -m roberta
Doing some one time setup. This takes 5 minutes or so, depending on the model. Get a cup of coffee and we'll see you in a minute!
Done! [100%]
Starting inference throughput comparison
----------------------------------------System Info----------------------------------------
CPU: 13th Gen Intel(R) Core(TM) i9-13900K
Arch: X86_64
Clock speed: 2.9952 GHz
Cores: 30
Running with TensorFlow .......................................................................................... QPS: 46.26
Running with PyTorch .......................................................................................... QPS: 22.21
Running with MAX Engine
Compiling model. Done!
.......................................................................................... QPS: 25.92
====== Speedup Summary ======
MAX Engine vs TensorFlow: Oh, darn that's only 0.56x stock performance.
MAX Engine vs PyTorch: That's about 1.17x faster.
Hold on a tick... We normally see speedups of roughly 2.50x on TensorFlow and 1.20x on PyTorch for roberta on X86_64. Honestly, we would love to hear from you to learn more about the system you're running on! (https://github.com/modularml/max/issues/new/choose)
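For context, the speedup figures in the summary appear to be simple QPS ratios against each baseline framework; a minimal sketch, using the numbers from the run above:

```python
# Speedup = MAX Engine throughput divided by the baseline framework's throughput.
# QPS values below are copied from the benchmark output in this report.
max_qps = 25.92  # MAX Engine
tf_qps = 46.26   # TensorFlow
pt_qps = 22.21   # PyTorch

print(f"MAX Engine vs TensorFlow: {max_qps / tf_qps:.2f}x")  # 0.56x
print(f"MAX Engine vs PyTorch: {max_qps / pt_qps:.2f}x")     # 1.17x
```

So the anomaly here is the TensorFlow ratio: 0.56x on this machine versus the ~2.50x the script says it normally sees on X86_64.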
Steps to reproduce
System information