Open andresovela opened 1 month ago
No worries, more information helps us improve the tools. Yes, that is feasible. We have primarily focused on flash-based workflows, but there are various options to achieve what you are looking for. What is the size of the model? Would it fit within two tiles? We could communicate directly via email, as that might make it easier to share more info about the models. My email is deepakpanickal@xmos.com.
Nice, I'll contact you via email then :)
This can be done by splitting the model yourself and compiling the parts separately with xmos-ai-tools. You would then have to wire them up in the application source code. It's not recommended, though, as you would lose quite a bit of space for code and the tensor arena on both tiles. When splitting a model across multiple tiles, it's better to keep the code and tensor arena on one tile, and the weights on one or more other tiles.
I agree it's a waste of resources, but it may allow the model to run faster than if you had to read the weights from flash. Realistically, I don't think we'll use this option, but it'd be good to have an example, I guess :)
First of all, sorry for spamming you with issues :sweat_smile: I'm just trying to optimize inference performance for a model larger than 512 kB, and I'm exploring all possible options at the moment.
I think one alternative would be to split the large model into two smaller models, run each one on a separate tile, and have the output of the model on tile[0] be the input to the model on tile[1]. Is that technically feasible using channels, or is that a no-go?