pytorch-labs / gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
BSD 3-Clause "New" or "Revised" License

Fixing over-padding and GPTQ padding bug #101

Closed · jerryzh168 closed this 7 months ago

jerryzh168 commented 7 months ago

Summary: we don't always need to pad the inner dimension to 1024; `inner_dim` only needs to be divisible by both `groupsize` and `inner_k_tiles * 16`.
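
A minimal sketch of the divisibility-based padding rule described above; the helper name, signature, and example values here are assumptions for illustration, not the exact code from this PR:

```python
import math

# Hypothetical helper mirroring the relaxed padding rule: instead of
# unconditionally padding to a multiple of 1024, round inner_dim up to
# the nearest multiple of lcm(groupsize, inner_k_tiles * 16), which is
# the smallest value divisible by both constraints.
def pad_to_divisible(inner_dim: int, groupsize: int, inner_k_tiles: int) -> int:
    step = math.lcm(groupsize, inner_k_tiles * 16)
    return ((inner_dim + step - 1) // step) * step

# Example: groupsize=32, inner_k_tiles=2 -> lcm(32, 32) = 32.
# An inner_dim of 1100 pads to 1120 here, versus 2048 under
# blanket padding to the next multiple of 1024.
print(pad_to_divisible(1100, groupsize=32, inner_k_tiles=2))  # 1120
```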
