pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License
5.34k stars, 484 forks
[WIP] Use DTensor-based tensor parallel #180
Open, kwen2501 opened 2 weeks ago

kwen2501 commented 2 weeks ago
Stack from ghstack (oldest at bottom):
-> #180
Status:
Switched the regular (non-quantized) tensor path to DTensor-based TP.
Results are correct, but there is a performance gap (it seems to perform extra collectives at the beginning; investigating).
TODO: switch the quantized path to DTensor too.
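
For readers unfamiliar with the sharding pattern behind tensor parallel, here is a minimal, dependency-free sketch of why a column-wise sharded layer followed by a row-wise sharded layer should need only one collective (a sum all-reduce) per pair. It is not the code in this PR and does not use DTensor or a real process group. The helper names (`shard_cols`, `shard_rows`, `all_reduce_sum`, `tp_mlp`), the ReLU activation, and the toy weights are all assumptions made for illustration.

```python
# Pure-Python simulation of tensor parallelism over simulated ranks.
# Assumption: a two-layer MLP with an elementwise ReLU; names are hypothetical.

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0, v) for v in row] for row in m]

def shard_cols(w, world_size):
    # Column-wise sharding: each rank keeps a slice of the output features.
    step = len(w[0]) // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]

def shard_rows(w, world_size):
    # Row-wise sharding: each rank keeps a slice of the input features.
    step = len(w) // world_size
    return [w[r * step:(r + 1) * step] for r in range(world_size)]

def all_reduce_sum(partials):
    # Stand-in for the single collective: elementwise sum across ranks.
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)] for i in range(rows)]

def tp_mlp(x, w1, w2, world_size):
    w1_shards = shard_cols(w1, world_size)
    w2_shards = shard_rows(w2, world_size)
    # Each simulated rank works only on its own shards; no communication here.
    partials = [matmul(relu(matmul(x, a)), b) for a, b in zip(w1_shards, w2_shards)]
    # One all-reduce at the end recovers the full output.
    return all_reduce_sum(partials)

x = [[1, -2], [3, 4]]
w1 = [[1, 0, -1, 2], [0, 1, 1, -1]]
w2 = [[1, 2], [0, 1], [-1, 0], [2, 1]]

reference = matmul(relu(matmul(x, w1)), w2)   # unsharded result
sharded = tp_mlp(x, w1, w2, world_size=2)     # simulated 2-way TP
print(sharded == reference)
```

Under these assumptions, any collective beyond that single all-reduce per layer pair would be a candidate explanation for the gap noted in the status above, though the actual cause is still under investigation.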