Closed (zitgit closed this 1 week ago)
@mgoin Hi! Have you tried e5m2 quantization, or a mixed-format scheme such as e4m3 for weights and e5m2 for activations?
I changed the quantization method to torch_e5m2 to test accuracy, and the outputs were totally wrong.
e5m2 generally loses too much precision. This is why e4m3 with a well-tuned scale is better in most cases.
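
To make the precision gap concrete, here is a rough pure-Python sketch that rounds a few values to each format. It is an illustration only, not the code path used by torch_e5m2 or any library: the `quantize` and `mean_rel_error` helpers, the saturating clamp, the per-tensor max-abs scale, and the sample weights are all assumptions made for the example. The format constants follow the usual FP8 layouts (e4m3: 3 mantissa bits, max 448; e5m2: 2 mantissa bits, max 57344).

```python
import math

# (mantissa bits, minimum normal exponent, largest finite value)
E4M3 = (3, -6, 448.0)      # assumes the "fn" variant without infinities
E5M2 = (2, -14, 57344.0)

def quantize(x, fmt):
    """Round x to the nearest value representable in fmt, saturating at max."""
    mant_bits, min_exp, max_val = fmt
    if x == 0.0:
        return 0.0
    a = min(abs(x), max_val)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    # Clamping to min_exp makes small values use the fixed subnormal spacing.
    exp = max(math.frexp(a)[1] - 1, min_exp)
    step = 2.0 ** (exp - mant_bits)  # spacing between neighbours in this binade
    q = min(round(a / step) * step, max_val)  # round() breaks ties to even
    return math.copysign(q, x)

def mean_rel_error(values, fmt, scale=1.0):
    """Mean relative error of quantize(v / scale) * scale against v."""
    errs = [abs(quantize(v / scale, fmt) * scale - v) / abs(v) for v in values]
    return sum(errs) / len(errs)

# Made-up sample weights, only for illustration.
weights = [0.27, -0.11, 0.043, 0.9, -0.58]

# Hypothetical per-tensor scale: map the largest magnitude onto the e4m3 max.
scale = max(abs(w) for w in weights) / E4M3[2]

print("0.27 as e4m3:", quantize(0.27, E4M3))
print("0.27 as e5m2:", quantize(0.27, E5M2))
print("e4m3 with scale, mean rel. error:", mean_rel_error(weights, E4M3, scale))
print("e5m2 without scale, mean rel. error:", mean_rel_error(weights, E5M2))
```

With one fewer mantissa bit, e5m2 has twice the spacing between neighbouring values in every binade, so its worst-case rounding error is twice that of e4m3. The extra exponent bit only helps when values would otherwise fall outside the e4m3 range, which is the problem a scale is meant to solve.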