usefulsensors / useful-transformers

Efficient Inference of Transformer models
GNU General Public License v3.0

Providing help and FLOSS stack #12

Open phhusson opened 9 months ago

phhusson commented 9 months ago

Hello,

Your project looks cool; I was disappointed to see that Rockchip's NN framework fails to load any useful model.

I've done some reverse engineering (plus reading the datasheet) of the RK3588's NPU (https://github.com/phhusson/rknpu-reverse-engineering/), and I think I may be able to help.

Reading your TODO, you're using RKNN exclusively for matrix (not higher-order tensor?) multiplications. Is that intended? The NPU can also do ReLU, max/min/average pooling, and convolutions.

I see you're waiting on Rockchip for int4 matmul and hoping there is no hardware bug preventing it. If that's the most useful thing you need, I should be able to provide an implementation.

Either way, given your usage, I'll try to write a FLOSS reimplementation of Rockchip's matmul, to get rid of that proprietary blob.
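For what it's worth, a reverse-engineered matmul is easy to validate against a plain CPU reference. Here is a minimal sketch (my own illustration, not code from either project) of the kind of golden-reference routine one would diff NPU outputs against:

```python
# Hypothetical reference check for a FLOSS matmul reimplementation:
# compute the product on the CPU and compare it to the NPU's result.

def matmul_ref(a, b):
    """Multiply an m x k matrix by a k x n matrix (lists of lists of floats)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_ref(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

In practice the comparison would use a tolerance rather than exact equality, since the NPU accumulates in FP16.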

keveman commented 9 months ago

@phhusson This library is indeed using the proprietary binary blob to perform the matrix multiplications. It is unfortunate that Rockchip is keeping the NPU fully closed.

For the transformer models, 8-bit and/or 4-bit matrix multiplication is really all we need. Currently only FP16 matrix multiplies are being used, and I didn't see much performance improvement for the tiny.en model when using int8. Reverse engineering just the matrix multiplies would be quite useful for the community in general.
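To make the FP16-vs-int8 distinction concrete, here is a minimal sketch of one common int8 scheme (symmetric per-tensor quantization; an assumption for illustration, not necessarily what RKNN does): quantize both operands to int8, multiply with int32 accumulation, then rescale the result back to float.

```python
# Illustrative symmetric per-tensor int8 quantized matmul (pure Python).

def quantize_int8(x):
    """Map a float matrix to int8 values with a single symmetric scale."""
    amax = max(abs(v) for row in x for v in row) or 1.0
    scale = amax / 127.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in x]
    return q, scale

def matmul_int8(a, b):
    """Quantize, multiply in integer arithmetic, rescale the accumulators."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    k, n = len(b), len(b[0])
    # products fit in int16, sums in int32; the final rescale is sa * sb
    return [[sum(qa[i][t] * qb[t][j] for t in range(k)) * sa * sb
             for j in range(n)] for i in range(len(a))]

a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0], [0.5, -0.5]]
approx = matmul_int8(a, b)
# exact float result is [[0.0, 0.5], [2.125, -0.125]];
# the quantized result should match to within ~1% of the value range
```

The appeal of int8 is halved memory traffic versus FP16 for the weights, which matters more for large, bandwidth-bound matmuls than for a model as small as tiny.en.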