nod-ai / sharktank

SHARK Inference Modeling and Serving
Apache License 2.0

[quant] When broadcasting the weight of a bmm, broadcast then ext. #88

Closed stellaraccident closed 2 weeks ago

stellaraccident commented 2 weeks ago

The order was previously reversed (ext, then broadcast). Broadcasting first works better with the mixed-precision bmm op, which conceptually performs an internal ext.
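The ordering can be sketched with a small NumPy example. The shapes, dtypes, and use of NumPy are illustrative assumptions, not the actual sharktank implementation; the point is that the batch broadcast is applied to the narrow weight before the widening cast, mirroring the ext a mixed-precision bmm does internally:

```python
import numpy as np

# Hypothetical shapes: a batched int8 activation x and a single shared
# int8 weight w that must be broadcast across the batch dimension.
B, M, K, N = 4, 8, 16, 32
rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=(B, M, K), dtype=np.int8)
w = rng.integers(-128, 128, size=(K, N), dtype=np.int8)

# Broadcast the low-precision weight across the batch dim first...
w_b = np.broadcast_to(w, (B, K, N))
# ...then extend (widen) to the accumulator type, which is the ext a
# mixed-precision bmm conceptually performs internally.
out = np.matmul(x.astype(np.int32), w_b.astype(np.int32))

# The reversed order (ext, then broadcast) is numerically identical;
# the change is about which form the bmm op can absorb.
ref = np.matmul(x.astype(np.int32),
                np.broadcast_to(w.astype(np.int32), (B, K, N)))
assert out.shape == (B, M, N)
assert (out == ref).all()
```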