Closed stellanhaglund closed 1 year ago
Maybe an option could be to replace the parts of the model that make it use this operation. Do you have any idea which torch operation it is?
LirMatMulUnary is tract's matrix multiplier. All affine operations (convolutions, matmuls, ...) are ultimately lowered to this specific operator. It is expected that a neural network spends most of its time doing matrix products. I would actually say the proportion feels low; it is often in the ~90% neighborhood. Looking for a replacement is not the right path.
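For intuition only (a toy NumPy sketch, not tract code; the shapes are made-up transformer-ish sizes), timing one feed-forward-style layer shows the matrix products dwarfing the elementwise work, which is why an operator like LirMatMulUnary dominates a profile:

```python
import time
import numpy as np

def best_time(fn, runs=5):
    """Best wall-clock time of fn() over several runs."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 768)).astype(np.float32)    # token activations
w1 = rng.standard_normal((768, 3072)).astype(np.float32)  # hypothetical FFN up-projection
w2 = rng.standard_normal((3072, 768)).astype(np.float32)  # hypothetical FFN down-projection

t_matmul = best_time(lambda: (x @ w1) @ w2)     # the affine (matmul) part
t_relu = best_time(lambda: np.maximum(x, 0.0))  # an elementwise activation
print(f"matmul: {t_matmul * 1e3:.2f} ms, relu: {t_relu * 1e3:.3f} ms")
```

The matmuls here do on the order of a gigaflop while the activation touches each element once, so the matmul share of total time is overwhelming even with a fast BLAS.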
Can you share a bit more? Are there specific instances of MatMul that are under-performing? `--cost --profile` gives you a cost and speed indication (in GFlops/s).
tract is relatively efficient on M1 and M2, but tract uses a single CPU core to run the inference, while PyTorch and other frameworks may make use of multiple cores (and sometimes a GPU).
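One way to make the benchmark apples-to-apples (a sketch, assuming PyTorch is installed; the `Linear` stand-in model and the shapes are made up for illustration) is to pin PyTorch's intra-op parallelism to a single core before timing:

```python
import time
import torch

# Restrict PyTorch to one CPU core, matching tract's single-core execution.
torch.set_num_threads(1)

# Hypothetical stand-in for the real model: time any module the same way.
model = torch.nn.Linear(768, 768)
x = torch.randn(32, 768)

with torch.no_grad():
    model(x)  # warm-up run
    t0 = time.perf_counter()
    for _ in range(10):
        model(x)
    elapsed = (time.perf_counter() - t0) / 10

print(f"single-core latency: {elapsed * 1e3:.3f} ms")
```

If the single-core PyTorch number moves much closer to tract's, the gap is parallelism rather than the matmul kernels themselves.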
You could also try the main branch. A lot of effort has gone recently into making the optimizer more robust across network architectures. It's not fully baked yet, but it may be interesting.
Okay, I see! I will try that out. I guess I have to go with a smaller model for this to be feasible. Will try this one: https://github.com/chinhsuanwu/mobilevit-pytorch
Closing here, reopen or create a new issue if needed.
Hi, I'm trying to run a custom PyTorch vision transformer model (an audio spectrogram transformer, more specifically). I was able to get everything working, but the problem is that inference on CPU takes about 100 ms in PyTorch, while in tract I get about 450 ms.
I ran it through the tract CLI profiler and got this:
From that, it looks like it could be
LirMatMulUnary
that's slowing it down. Is there anything I can do about this? Right now I'm running it on an M2, but I'm hoping to be able to run it on mobile devices as well.