nod-ai / sharktank

SHARK Inference Modeling and Serving
Apache License 2.0

[punet] Add direct to linalg integer kernels for mmt, conv, pooling sum. #68

Closed stellaraccident closed 3 weeks ago

stellaraccident commented 3 weeks ago

Our quantization scheme relies on having integer kernels for a few key ops. While torch mm/conv ops are technically defined for integer operands, in practice the type matrix is very sparsely implemented across backends. As such, we define IREE-specific ops for these and use them instead.
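For illustration of the kernel semantics involved (not the actual IREE lowering), a quantized integer matmul-transpose typically takes narrow integer operands and accumulates in a wider integer type; the wide accumulator is the usual reason a dedicated integer kernel is needed. A minimal pure-Python sketch, assuming i8-range inputs and i32-style accumulation:

```python
def int_mmt(a, b):
    """Integer matmul-transpose: a is MxK, b is NxK (so the result is
    a @ b.T). Inputs are assumed to be small (e.g. i8-range) integers;
    products are summed into a wide accumulator, mirroring the
    i8 x i8 -> i32 accumulation pattern common in quantized kernels."""
    m, k = len(a), len(a[0])
    n = len(b)
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # wide accumulator
            for p in range(k):
                acc += a[i][p] * b[j][p]
            out[i][j] = acc
    return out
```

Example: `int_mmt([[1, 2], [3, 4]], [[5, 6], [7, 8]])` computes `[[17, 23], [39, 53]]`, i.e. each output element is a dot product of a row of `a` with a row of `b`.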

Unlike mm/conv, torch does not have a dedicated pooling-sum operator; the closest equivalent is avg_pool with a divisor override, which is only defined for floating point. Therefore, we provide a pooling-sum kernel as well.
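The relationship is simple: a pooling sum is an average pool with the divisor forced to 1 (each window contributes its raw sum rather than its mean), which is why torch's FP-only divisor-override path is insufficient for integer inputs. A minimal 1-D sketch of what such a kernel computes, defined directly over integers:

```python
def pooling_sum_1d(x, window, stride):
    """Sliding-window sum pooling over a 1-D integer sequence.
    Equivalent to avg_pool with the divisor overridden to 1, but
    defined directly on integers (no FP average + rescale needed)."""
    out = []
    i = 0
    while i + window <= len(x):
        out.append(sum(x[i:i + window]))  # raw window sum, no division
        i += stride
    return out
```

For example, `pooling_sum_1d([1, 2, 3, 4, 5], window=2, stride=1)` yields `[3, 5, 7, 9]`; an average pool over the same windows would divide each entry by 2.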