The Qwen2 model is currently the state of the art on the Hugging Face leaderboard. Compared with the LLaMA architecture, the only difference is an added bias term in the QKV dense projections of multi-head attention. Therefore, only a few modifications would be required to support this high-quality model.
Similarly, the Yi model is also a powerful Chinese LLM. Its performance is comparable to that of Qwen2, and it fully adopts the LLaMA architecture.
Therefore, in theory, making keras_nlp compatible with these two models should not take much time. I hope compatibility with them can be added in the future.
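To make the one architectural difference concrete, here is a minimal NumPy sketch (illustrative only, not keras_nlp code; the function name and signature are my own) of a QKV projection where the bias terms are optional. LLaMA-style attention omits the biases; Qwen2-style attention includes them:

```python
import numpy as np

def qkv_projection(x, w_q, w_k, w_v, b_q=None, b_k=None, b_v=None):
    """Project input x into query/key/value tensors.

    LLaMA-style attention uses no bias on these dense projections;
    Qwen2-style attention adds a bias term to each of them.
    """
    q = x @ w_q + (b_q if b_q is not None else 0.0)
    k = x @ w_k + (b_k if b_k is not None else 0.0)
    v = x @ w_v + (b_v if b_v is not None else 0.0)
    return q, k, v

# LLaMA-style call (no biases) vs. Qwen2-style call (with biases):
x = np.ones((1, 4))
w = np.eye(4)
q_llama, _, _ = qkv_projection(x, w, w, w)
q_qwen, _, _ = qkv_projection(x, w, w, w, b_q=np.ones(4), b_k=np.ones(4), b_v=np.ones(4))
```

In a Keras port, this would roughly correspond to toggling `use_bias` on the existing query/key/value dense layers, so most of the LLaMA backbone code could be reused.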
https://huggingface.co/Qwen
https://huggingface.co/01-ai