Open taegeonum opened 1 month ago
This is a good question. It might be possible; however, phi is a small model, so the impact may not be very noticeable. So far we have not tried multi-stream execution, but updating the compiler to allow manual stream specification could be possible.
❓ General Questions
Hello, in the phi model, the attention and MLP blocks can be executed in parallel because they do not depend on each other. In the following code, `self.mixer` and `self.mlp` can be executed in parallel.
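To make the independence concrete, here is a minimal pure-Python sketch (not the actual phi source). It assumes the block computes `x + mixer(ln(x)) + mlp(ln(x))`; `toy_mixer`, `toy_mlp`, `parallel_block`, and `sequential_block` are hypothetical stand-ins I made up for illustration. Both branches read only the normalized input, so they can be submitted concurrently and summed afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def layer_norm(x, eps=1e-5):
    """Normalize a list of floats to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def toy_mixer(h):
    # Hypothetical stand-in for self.mixer (the attention branch).
    return [2.0 * v for v in h]


def toy_mlp(h):
    # Hypothetical stand-in for self.mlp (the feed-forward branch).
    return [v + 1.0 for v in h]


def sequential_block(x, mixer=toy_mixer, mlp=toy_mlp):
    """Run both branches one after the other on the same normalized input."""
    h = layer_norm(x)
    attn_out = mixer(h)
    mlp_out = mlp(h)  # does not read attn_out, so the order is arbitrary
    return [r + a + m for r, a, m in zip(x, attn_out, mlp_out)]


def parallel_block(x, mixer=toy_mixer, mlp=toy_mlp):
    """Submit both branches concurrently, then add them to the residual."""
    h = layer_norm(x)
    with ThreadPoolExecutor(max_workers=2) as pool:
        attn_future = pool.submit(mixer, h)
        mlp_future = pool.submit(mlp, h)
        attn_out = attn_future.result()
        mlp_out = mlp_future.result()
    return [r + a + m for r, a, m in zip(x, attn_out, mlp_out)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    # Both schedules perform the same arithmetic, so the results match.
    print(parallel_block(x) == sequential_block(x))
```

Threads here only illustrate that neither branch waits on the other. On a GPU, the equivalent would presumably be launching the two branches on separate streams, which is what I am asking about.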
Questions