modularml / max

A collection of sample programs, notebooks, and tools which highlight the power of the MAX Platform
https://www.modular.com

[BUG]: Max Performance Showcase Comparison Results #122

Closed igoforth closed 3 months ago

igoforth commented 3 months ago

Bug description

Performance is not as expected. On this machine, the MAX Engine reports lower roberta throughput than TensorFlow (7.68 QPS vs 9.31 QPS, i.e. 0.82x), instead of the roughly 2.50x speedup the showcase says it normally sees on X86_64.

➜  performance-showcase git:(main) ✗ modular host-info
  Host Information
  ================

  Target Triple: x86_64-unknown-linux
  CPU: tigerlake
  CPU Features: adx, aes, avx, avx2, avx512bitalg, avx512bw, avx512cd, avx512dq, avx512f, avx512ifma, avx512vbmi, avx512vbmi2, avx512vl, avx512vnni, avx512vp2intersect, avx512vpopcntdq, bmi, bmi2, clflushopt, clwb, cmov, crc32, cx16, cx8, evex512, f16c, fma, fsgsbase, fxsr, gfni, invpcid, kl, lzcnt, mmx, movbe, movdir64b, movdiri, pclmul, pku, popcnt, prfchw, rdpid, rdrnd, rdseed, sahf, sgx, sha, shstk, sse, sse2, sse3, sse4.1, sse4.2, ssse3, vaes, vpclmulqdq, widekl, x87, xsave, xsavec, xsaveopt, xsaves
➜  performance-showcase git:(main) ✗ python3 run.py -m roberta
Doing some one time setup. This takes 5 minutes or so, depending on the model.
Get a cup of coffee and we'll see you in a minute!

Done! [100%]

Starting inference throughput comparison

----------------------------------------System Info----------------------------------------
CPU: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
Arch: X86_64
Clock speed: 2.5000 GHz
Cores: 4

Running with TensorFlow
.......................................................................................... QPS: 9.31

Running with PyTorch
.......................................................................................... QPS: 6.32

Running with MAX Engine
Compiling model..
Done!
.......................................................................................... QPS: 7.68

====== Speedup Summary ======

MAX Engine vs TensorFlow: Oh, darn that's only 0.82x stock performance.
MAX Engine vs PyTorch: Oh, darn that's only 0.82x stock performance.

Hold on a tick... We normally see speedups of roughly 2.50x on TensorFlow for roberta on X86_64. Honestly, we would love to hear from you to learn more about the system you're running on! (https://github.com/modularml/max/issues/new/choose)
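
For reference, here is a standalone back-of-the-envelope check of the ratios printed above (not part of the showcase script), dividing the MAX Engine QPS by each framework's QPS. Against TensorFlow this reproduces the reported 0.82x, but against PyTorch it works out to roughly 1.22x, so the second 0.82x line may be computed against a different baseline than the PyTorch run.

# Standalone sanity check of the speedup summary, using the QPS values
# reported above (assumes speedup = MAX Engine QPS / framework QPS).
tf_qps = 9.31      # TensorFlow
pt_qps = 6.32      # PyTorch
max_qps = 7.68     # MAX Engine

print(f"MAX Engine vs TensorFlow: {max_qps / tf_qps:.2f}x")  # 0.82x
print(f"MAX Engine vs PyTorch:    {max_qps / pt_qps:.2f}x")  # 1.22x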

Steps to reproduce

Ubuntu clang version 19.0.0 (++20240318042139+208a9850e6a4-1~exp1~20240318042301.1564)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin

cc (Debian 13.2.0-13) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Python 3.11.8
I use pdm and pyenv to manage my Python environment.
PYTHONPATH=/home/user/.local/share/pdm/venv/lib/python3.11/site-packages/pdm/pep582
PWD=/home/user/.local/src/max/examples/performance-showcase

System information

- What OS did you install MAX on?
Linux kali 6.6.9-amd64 #1 SMP PREEMPT_DYNAMIC Kali 6.6.9-1kali1 (2024-01-08) x86_64 GNU/Linux
- Provide version information for MAX by pasting the output of `max -v`
max 24.1.1 (0ab415f7)
Modular version 24.1.1-0ab415f7-release
- Provide version information for Mojo by pasting the output of `mojo -v`
mojo 24.1.1 (0ab415f7)
- Provide Modular CLI version by pasting the output of `modular -v`
modular 0.5.2 (6b3a04fd)
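
As a side note, the three version lines above can be collected in one go; a minimal sketch (assuming the `max`, `mojo`, and `modular` binaries are on PATH, as in the report):

import subprocess

# Print the version output of each Modular CLI tool referenced above.
for cmd in (["max", "-v"], ["mojo", "-v"], ["modular", "-v"]):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout.strip())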
ehsanmok commented 3 months ago

Thanks for reporting this issue! We're working on it. Please also take a look at this explainer.

ehsanmok commented 3 months ago

Also, please see our FAQ for this.