pytorch-labs / gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
BSD 3-Clause "New" or "Revised" License

Fixing over-padding and GPTQ padding bug #101

Closed · jerryzh168 closed this 7 months ago

jerryzh168 commented 7 months ago

Summary: we don't always need to pad the inner dimension to 1024; `inner_dim` only needs to be divisible by both `groupsize` and `inner_k_tiles * 16`.
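
A minimal sketch of the divisibility-based padding rule described above; the helper name, signature, and example values here are assumptions for illustration, not the exact code from this PR:

```python
import math

# Hypothetical helper mirroring the relaxed padding rule: instead of
# unconditionally padding to a multiple of 1024, round inner_dim up to
# the nearest multiple of lcm(groupsize, inner_k_tiles * 16), which is
# the smallest value divisible by both constraints.
def pad_to_divisible(inner_dim: int, groupsize: int, inner_k_tiles: int) -> int:
    step = math.lcm(groupsize, inner_k_tiles * 16)
    return ((inner_dim + step - 1) // step) * step

# Example: groupsize=32, inner_k_tiles=2 -> lcm(32, 32) = 32.
# An inner_dim of 1100 pads to 1120 here, versus 2048 under
# blanket padding to the next multiple of 1024.
print(pad_to_divisible(1100, groupsize=32, inner_k_tiles=2))  # 1120
```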
