mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

weird observation #177

Closed · earonesty closed this 1 year ago

earonesty commented 1 year ago

Amazing: I'm running Vicuna 7B in the browser and getting pretty decent performance. For comparison, I decided to spin up a P2 instance and see how a K80 runs Vicuna 7B... and it's slower. What? An expensive Tesla is slower than my AMD Radeon? Yep, Vicuna 7B inference is about 3x faster on my laptop running WebGPU. I double-checked that I'm running the same quantized model, and double-checked that PyTorch is really using the GPU (nvidia-smi shows 99% utilization).
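
For reference, here's roughly how the PyTorch side can be sanity-checked and timed. This is a minimal sketch, not my exact script: the model id, dtype, prompt, and loading path are placeholders assuming a standard transformers checkpoint.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# First, make sure PyTorch actually sees the CUDA device before blaming the card.
assert torch.cuda.is_available(), "CUDA not visible to PyTorch"
print(torch.cuda.get_device_name(0))  # e.g. "Tesla K80"

# Placeholder checkpoint; substitute the quantized Vicuna 7B build you actually tested.
model_id = "lmsys/vicuna-7b-v1.5"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "Explain WebGPU in one paragraph."
inputs = tok(prompt, return_tensors="pt").to("cuda")

# Time only the generation, with explicit syncs so the GPU work is fully counted.
torch.cuda.synchronize()
t0 = time.time()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - t0

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```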

OK, let's compare against the GeForce RTX accelerator in my gaming laptop. I convinced Chrome to see it instead of my AMD... and inference is also slower there.

Why is my built-in, cheapo AMD Radeon doing inference so much faster via WebGPU? Should I run out and buy a beefier Radeon?

earonesty commented 1 year ago

Well, I confirmed this with llama-cpp-python: my built-in Radeon is just better at inference. It also has direct access to all of the system RAM (32 GB), so it can load the whole model into memory, unlike the NVIDIA card, which runs out of VRAM and has to offload work to the CPU. The ability to "borrow" slower system RAM is amazing for inference.
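
To put rough numbers on the offload effect, here's a minimal llama-cpp-python sketch comparing full vs. partial GPU offload. The model path, prompt, and layer counts are placeholders; point it at whatever quantized Vicuna 7B file you have and adjust n_gpu_layers for your card.

```python
import time

from llama_cpp import Llama

# Placeholder path; point this at the quantized Vicuna 7B file you actually have on disk.
MODEL_PATH = "./vicuna-7b.q4_0.gguf"
PROMPT = "Explain WebGPU in one paragraph."

def tokens_per_sec(n_gpu_layers: int) -> float:
    """Load the model with the given number of layers offloaded to the GPU and time a completion."""
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=n_gpu_layers, verbose=False)
    t0 = time.time()
    out = llm(PROMPT, max_tokens=128)
    elapsed = time.time() - t0
    return out["usage"]["completion_tokens"] / elapsed

# All layers on the GPU vs. a partial offload (layer counts are illustrative for a 7B model).
print("full offload   :", tokens_per_sec(35))
print("partial offload:", tokens_per_sec(20))
```

The full-offload case is what the Radeon gets to run, since it can borrow system RAM; the partial-offload case is roughly what the NVIDIA card falls back to once its VRAM runs out.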