TorchChat is slower than gpt-fast #653

Open · malfet opened this issue 4 months ago

malfet commented 4 months ago
Using torch==2.4.0.dev20240502 on an Apple M2 Pro, I get the following numbers for stories110M with the float16 dtype:

| application | speed (eager)  | speed (compile) |
|-------------|----------------|-----------------|
| gpt-fast    | 176 tokens/sec | 99 tokens/sec   |
| torchchat   | 76 tokens/sec  | 33 tokens/sec   |
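For reference, a throughput figure like this is simply tokens generated divided by wall-clock time; the helper below is an illustrative sketch of that measurement (the `generate_fn` argument is a hypothetical stand-in, not the timing code either tool actually uses):

```python
import time

def tokens_per_sec(generate_fn, num_tokens: int) -> float:
    # Illustrative only: generate_fn stands in for a model's decode loop.
    start = time.perf_counter()
    generate_fn(num_tokens)
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed
```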

Commands to reproduce:

% python3 -mpip install --pre torch==2.4.0.dev20240502 --index-url https://download.pytorch.org/whl/nightly/cpu
% git clone https://github.com/pytorch-labs/gpt-fast -b malfet/set-prec-to-float16
% cd gpt-fast
% python3 generate.py --checkpoint_path ~/git/pytorch/torchchat/.model-artifacts/stories110M/stories110M.pt

and for torchchat

% python3 torchchat.py generate stories110M --dtype float16 --device cpu
mikekgfb commented 4 months ago
> python3 torchchat.py generate stories110M --dtype float16

This runs with --device fast, which translates into MPS. You might want to specify --device cpu. Also, if CPU is faster than MPS, we should drop MPS from the devices selected for device "fast".
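A minimal sketch of what a "fast" device alias typically resolves to on Apple silicon; this is illustrative only, not torchchat's actual selection logic:

```python
import torch

def resolve_fast_device() -> str:
    # Illustrative sketch (not torchchat's code): a "fast" alias usually
    # means "best available accelerator, otherwise CPU".
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(resolve_fast_device())  # prints "mps" on an M2 Pro with a recent PyTorch build
```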

malfet commented 4 months ago
> python3 torchchat.py generate stories110M --dtype float16
>
> This runs with --device fast, which translates into MPS. You might want to specify --device cpu. Also, if CPU is faster than MPS, we should drop MPS from the devices selected for device "fast".

No, that was not the case until https://github.com/pytorch/torchchat/pull/694 landed, but let me clarify that.

ezyang commented 4 months ago

Why is the compile tok/sec lower than the eager tok/sec?
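One thing such a comparison has to control for is whether the first call, which includes compilation, is excluded from the timing. Below is a minimal sketch of comparing eager vs. torch.compile throughput with an explicit warm-up; the stand-in model and iteration count are illustrative, not torchchat's generation loop:

```python
import time
import torch

def steps_per_sec(step_fn, x: torch.Tensor, iters: int = 100) -> float:
    # Time `iters` forward steps and report steps (decoded tokens) per second.
    start = time.perf_counter()
    for _ in range(iters):
        step_fn(x)
    return iters / (time.perf_counter() - start)

model = torch.nn.Linear(512, 512)   # hypothetical stand-in for one decode step
x = torch.randn(1, 512)

eager_sps = steps_per_sec(model, x)

compiled = torch.compile(model)
compiled(x)                          # warm-up so compile time is not measured
compiled_sps = steps_per_sec(compiled, x)

print(f"eager: {eager_sps:.1f} steps/s, compiled: {compiled_sps:.1f} steps/s")
```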