neuralmagic / AutoFP8

Apache License 2.0

E5M2 or mix format trial #26

Closed zitgit closed 1 week ago

zitgit commented 1 week ago

@mgoin Hi! Have you tried e5m2 quantization, or a mixed-format quantization such as e4m3 for weights and e5m2 for activations?

zitgit commented 1 week ago

I changed the quantize method to torch_e5m2 to test accuracy. The outputs were completely wrong.

mgoin commented 1 week ago

e5m2 generally loses too much precision. This is why e4m3 with a well-tuned scale is better in most cases.
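The precision gap comes down to mantissa bits: e4m3 spends 3 bits on the mantissa versus e5m2's 2, so at any given exponent its representable values are twice as dense. A minimal pure-Python sketch of round-to-nearest FP8 quantization (an illustrative model only, not bit-exact `torch.float8_e4m3fn`/`torch.float8_e5m2` behavior; no inf/nan/subnormal handling) makes the difference measurable:

```python
import math

def quantize_fp8(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value in a simple sign/exponent/mantissa
    FP8 format (normal numbers only). Illustrative, not bit-exact."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    x = abs(x)
    bias = 2 ** (exp_bits - 1) - 1
    e = math.floor(math.log2(x))
    e = max(min(e, bias), 1 - bias)     # clamp to the normal exponent range
    step = 2.0 ** (e - man_bits)        # spacing of representable values here
    return sign * round(x / step) * step

def mean_rel_err(exp_bits: int, man_bits: int) -> float:
    # average relative error over a mid-range grid of positive values
    vals = [0.1 + 0.01 * i for i in range(200)]
    return sum(abs(quantize_fp8(v, exp_bits, man_bits) - v) / v
               for v in vals) / len(vals)

err_e4m3 = mean_rel_err(4, 3)  # 3 mantissa bits
err_e5m2 = mean_rel_err(5, 2)  # 2 mantissa bits, wider exponent range
print(f"e4m3 mean rel err: {err_e4m3:.4f}")
print(f"e5m2 mean rel err: {err_e5m2:.4f}")
```

Since the e5m2 grid is a subset of the e4m3 grid at each shared exponent, e4m3's rounding error is no larger pointwise and roughly half on average. e5m2's extra exponent bit only buys dynamic range, which a per-tensor scale already provides, so e4m3 plus a well-tuned scale wins on weights and activations alike.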