Open lgyStoic opened 2 years ago
On my M1 Mac, diffmpm runs at 14 FPS on the CPU (arm) backend, but on the GPU (Metal) backend it is much slower, at less than 2 FPS. It is also slower on an RTX 3080 (CUDA). Is there a problem with how the compiler optimizes at the IR level?
Also reproduced on my Intel + nvidia GPU workstation.
CPU: i9-11900K
GPU: RTX 3080
With ti.cpu: 13 FPS
With ti.cuda: 10 FPS
Script: examples/diffmpm.py
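For anyone comparing backends, here is a minimal sketch of how the FPS numbers above could be measured so that one-time JIT compilation does not skew them. It is not taken from examples/diffmpm.py: `measure_fps`, `step`, and `sync` are hypothetical names used only for illustration. With Taichi, `step` would presumably wrap one simulation frame and `sync` would be `ti.sync`, so that queued GPU work has finished before the timer is read.

```python
import time


def measure_fps(step, frames=50, warmup=5, sync=None):
    """Return the average frames per second of `step`.

    `step`  : hypothetical callable that advances the simulation by one frame.
    `sync`  : optional callable that blocks until queued device work is done
              (assumption: with Taichi this would be `ti.sync`).
    The first `warmup` calls are not timed, because the first kernel launches
    typically include compilation time.
    """
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # make sure warm-up work is finished before timing starts

    start = time.perf_counter()
    for _ in range(frames):
        step()
    if sync is not None:
        sync()  # wait for asynchronous work before stopping the timer
    elapsed = time.perf_counter() - start

    # Guard against a zero interval on very coarse timers.
    return frames / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without any GPU or extra packages.
    def dummy_step():
        sum(i * i for i in range(10_000))

    print(f"{measure_fps(dummy_step):.1f} FPS")
```

Running the same measurement once per backend (for example after `ti.init(arch=ti.cpu)` and again after `ti.init(arch=ti.cuda)`) should give numbers that are comparable with each other, since compilation and unfinished asynchronous work are excluded from both.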