modularml / max

A collection of sample programs, notebooks, and tools which highlight the power of the MAX Platform
https://www.modular.com
Other

[BUG]: #138

Closed: maresk closed this issue 2 months ago.

maresk commented 2 months ago

Bug description

I'm observing only marginal speedups (about 1.1x) from MAX Engine over TensorFlow and PyTorch on smaller Linux machines running Ubuntu Jammy, consistently across multiple trials. Is there a recommended minimum number of cores for observing significant speedups?

---------------------------------------System Info----------------------------------------
CPU: AMD EPYC 7B12
Arch: X86_64
Clock speed: 2.2500 GHz
Cores: 2

Running with TensorFlow
QPS: 3.49

Running with PyTorch
QPS: 3.45

Running with MAX Engine
Compiling model.
Done!
QPS: 3.89

====== Speedup Summary ======

MAX Engine vs TensorFlow: That's about 1.12x faster.
MAX Engine vs PyTorch: That's about 1.13x faster.

----------------------------------------System Info----------------------------------------
CPU: AMD EPYC 7B12
Arch: X86_64
Clock speed: 2.2500 GHz
Cores: 2

Running with TensorFlow
QPS: 3.41

Running with PyTorch
QPS: 3.46

Running with MAX Engine
Compiling model..
Done!
QPS: 3.80

====== Speedup Summary ======

MAX Engine vs TensorFlow: That's about 1.11x faster.
MAX Engine vs PyTorch: That's about 1.10x faster.
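The speedup summary appears to be the ratio of MAX Engine's QPS to each baseline's QPS. That is an assumption about how the showcase script computes it, not something stated in the output, but recomputing it from the printed QPS values reproduces the reported figures to within rounding. The sketch below does that for both trials; the names `trials` and `speedup` are illustrative only.

```python
# QPS (queries per second) printed by the two benchmark trials above.
trials = [
    {"TensorFlow": 3.49, "PyTorch": 3.45, "MAX Engine": 3.89},
    {"TensorFlow": 3.41, "PyTorch": 3.46, "MAX Engine": 3.80},
]


def speedup(candidate_qps: float, baseline_qps: float) -> float:
    """Return how many times higher the candidate's throughput is than the baseline's."""
    return candidate_qps / baseline_qps


# Recompute "MAX Engine vs <framework>" for each trial from the rounded QPS values.
for i, qps in enumerate(trials, start=1):
    for baseline in ("TensorFlow", "PyTorch"):
        ratio = speedup(qps["MAX Engine"], qps[baseline])
        print(f"Trial {i}: MAX Engine vs {baseline}: {ratio:.2f}x")
```

Because the inputs are the two-decimal QPS values from the log, a recomputed ratio can differ from the reported one in the last digit. For example, 3.89 / 3.49 gives 1.11x, while the first trial reports 1.12x, presumably because the script divides unrounded throughputs.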

~/Sandbox/max/examples/performance-showcase$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7B12
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
Stepping: 0
BogoMIPS: 4499.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save umip rdpid
Virtualization features:
  Hypervisor vendor: KVM
  Virtualization type: full
Caches (sum of all):
  L1d: 32 KiB (1 instance)
  L1i: 32 KiB (1 instance)
  L2: 512 KiB (1 instance)
  L3: 16 MiB (1 instance)
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0,1
Vulnerabilities:
  Gather data sampling: Not affected
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds: Not affected
  Tsx async abort: Not affected
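The benchmark header reports "Cores: 2", but the `lscpu` topology fields above describe 1 socket × 1 core per socket × 2 threads per core, i.e. one physical core exposed as two logical CPUs. That suggests (an assumption, since the script's counting method isn't shown) that the header counts logical CPUs. A minimal sketch that derives both numbers from `lscpu`-style text; `cpu_topology` and `LSCPU_EXCERPT` are hypothetical names used only for illustration:

```python
# Excerpt of the `lscpu` topology fields shown above.
LSCPU_EXCERPT = """\
CPU(s): 2
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
"""


def cpu_topology(lscpu_text: str) -> dict:
    """Parse `Key: value` lines and derive logical CPUs vs physical cores."""
    fields = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "Key: value" separator
            fields[key.strip()] = value.strip()
    logical = int(fields["CPU(s)"])
    # Physical cores = cores per socket x number of sockets (SMT threads excluded).
    physical = int(fields["Core(s) per socket"]) * int(fields["Socket(s)"])
    return {
        "logical_cpus": logical,
        "physical_cores": physical,
        "threads_per_core": int(fields["Thread(s) per core"]),
    }


print(cpu_topology(LSCPU_EXCERPT))
```

On a live Linux machine, the same function could be fed `subprocess.run(["lscpu"], capture_output=True, text=True).stdout`, assuming `lscpu` prints English field names. Python's `os.cpu_count()` likewise reports logical CPUs, not physical cores.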

Steps to reproduce

System information

- What OS did you install MAX on?
  - Ubuntu Jammy
- Provide version information for MAX by pasting the output of `max -v`:
  - max 24.2.0 (c2427bc5)
  - Modular version 24.2.0-c2427bc5-release
- Provide version information for Mojo by pasting the output of `mojo -v`:
  - mojo 24.2.0 (c2427bc5)
- Provide the Modular CLI version by pasting the output of `modular -v`:
  - modular 0.6.0 (04c05243)

ehsanmok commented 2 months ago

Thanks for reporting! Please see this FAQ.