second-state / WasmEdge-WASINN-examples


Attention layer for transformer architectures? #19

Closed AnonymousAmalgrams closed 1 year ago

AnonymousAmalgrams commented 1 year ago

Hi, I was wondering if you could offer any suggestions on implementing transformer architectures. As far as I can tell, the wasi_nn proposal seems to have originally been intended for neural networks with a fairly fixed structure, whereas newer architectures like U-Nets have become widespread in image classification/segmentation, and transformers are now much more favored for text processing. I am currently trying to run a LLaMA transformer model through WasmEdge, and unfortunately it doesn't seem to be as simple as just loading the .bin model file, unless I have made an unrecognized but egregious error somewhere else. My thinking is that I may have to somehow split out the RNN stacks, run the attention layer separately for each, and then chain the results back in, but I would like to avoid this if possible: I have had no Rust programming experience prior to this project and don't really know the frameworks for doing so, and the model file itself is several GB with dozens of layers, so it could be quite difficult for me to deal with. If it isn't too much trouble, any advice on this would be greatly appreciated.
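For reference, here is a minimal sketch of how a llama-family model can be driven through wasi-nn from Rust when a GGML backend plugin is available in the runtime, following the pattern used by the later examples in this repo with the `wasmedge_wasi_nn` crate. The preload alias `default`, the model filename, the prompt, and the output buffer size are illustrative assumptions, not values from this issue; whether a particular `.bin`/`.gguf` file loads this way depends entirely on the backend installed.

```rust
// Minimal sketch (assumptions noted above): run a llama-family model via
// wasi-nn with the wasmedge_wasi_nn crate and a GGML backend.
// Assumes the model was preloaded by the runtime, e.g.:
//   wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-model.gguf app.wasm
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    // Look up the preloaded graph by its alias instead of passing raw model bytes.
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");

    let mut ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");

    // For the GGML backend, the input tensor is simply the UTF-8 prompt bytes.
    let prompt = "Once upon a time";
    ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())
        .expect("failed to set the input tensor");

    ctx.compute().expect("inference failed");

    // The generated text comes back as bytes in output tensor 0.
    let mut out = vec![0u8; 4096];
    let n = ctx.get_output(0, &mut out).expect("failed to read the output");
    println!("{}", String::from_utf8_lossy(&out[..n]));
}
```

With this kind of setup the backend runs the entire transformer internally, so the guest code never needs to split out attention layers itself; it only passes prompt bytes in and reads generated bytes out.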

AnonymousAmalgrams commented 1 year ago

Never mind, the issue turned out to be something different.