TorchChat is slower than gpt-fast #653

Open · malfet opened this issue 4 months ago

malfet commented 4 months ago
Using torch==2.4.0.dev20240502 on an Apple M2 Pro, I get the following numbers for stories110M with the float16 dtype:

| application | speed (eager)  | speed (compile) |
|-------------|----------------|-----------------|
| gpt-fast    | 176 tokens/sec | 99 tokens/sec   |
| torchchat   | 76 tokens/sec  | 33 tokens/sec   |
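For reference, a throughput figure like this is simply tokens generated divided by wall-clock time; the helper below is an illustrative sketch of that measurement (the `generate_fn` argument is a hypothetical stand-in, not the timing code either tool actually uses):

```python
import time

def tokens_per_sec(generate_fn, num_tokens: int) -> float:
    # Illustrative only: generate_fn stands in for a model's decode loop.
    start = time.perf_counter()
    generate_fn(num_tokens)
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed
```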

Commands to reproduce:

% python3 -mpip install --pre torch==2.4.0.dev20240502 --index-url https://download.pytorch.org/whl/nightly/cpu
% git clone https://github.com/pytorch-labs/gpt-fast -b malfet/set-prec-to-float16
% cd gpt-fast
% python3 generate.py --checkpoint_path ~/git/pytorch/torchchat/.model-artifacts/stories110M/stories110M.pt

and for torchchat

% python3 torchchat.py generate stories110M --dtype float16 --device cpu
mikekgfb commented 4 months ago
> python3 torchchat.py generate stories110M --dtype float16

This runs with --device fast, which translates into MPS. You might want to specify --device cpu. Also, if CPU is faster than MPS, we should drop MPS from the devices selected for device "fast".
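A minimal sketch of what a "fast" device alias typically resolves to on Apple silicon; this is illustrative only, not torchchat's actual selection logic:

```python
import torch

def resolve_fast_device() -> str:
    # Illustrative sketch (not torchchat's code): a "fast" alias usually
    # means "best available accelerator, otherwise CPU".
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(resolve_fast_device())  # prints "mps" on an M2 Pro with a recent PyTorch build
```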

malfet commented 4 months ago
> python3 torchchat.py generate stories110M --dtype float16
>
> This runs with --device fast, which translates into MPS. You might want to specify --device cpu. Also, if CPU is faster than MPS, we should drop MPS from the devices selected for device "fast".

No, that was not the case until https://github.com/pytorch/torchchat/pull/694 landed, but let me clarify that.

ezyang commented 4 months ago

Why is the compile tok/sec lower than the eager tok/sec?
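One thing such a comparison has to control for is whether the first call, which includes compilation, is excluded from the timing. Below is a minimal sketch of comparing eager vs. torch.compile throughput with an explicit warm-up; the stand-in model and iteration count are illustrative, not torchchat's generation loop:

```python
import time
import torch

def steps_per_sec(step_fn, x: torch.Tensor, iters: int = 100) -> float:
    # Time `iters` forward steps and report steps (decoded tokens) per second.
    start = time.perf_counter()
    for _ in range(iters):
        step_fn(x)
    return iters / (time.perf_counter() - start)

model = torch.nn.Linear(512, 512)   # hypothetical stand-in for one decode step
x = torch.randn(1, 512)

eager_sps = steps_per_sec(model, x)

compiled = torch.compile(model)
compiled(x)                          # warm-up so compile time is not measured
compiled_sps = steps_per_sec(compiled, x)

print(f"eager: {eager_sps:.1f} steps/s, compiled: {compiled_sps:.1f} steps/s")
```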