moonbox3 opened 3 weeks ago
During SK integration tests, we perform 5 Ollama model pulls:

```
ollama pull ${{ vars.OLLAMA_CHAT_MODEL_ID }}
ollama pull ${{ vars.OLLAMA_CHAT_MODEL_ID_IMAGE }}
ollama pull ${{ vars.OLLAMA_CHAT_MODEL_ID_TOOL_CALL }}
ollama pull ${{ vars.OLLAMA_TEXT_MODEL_ID }}
ollama pull ${{ vars.OLLAMA_EMBEDDING_MODEL_ID }}
```
This puts a lot of stress on the network, is prone to failure, and adds latency. For example, here's a failure: https://github.com/microsoft/semantic-kernel/actions/runs/11685739831/job/32539906324#step:10:966.
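Until the pulls are consolidated, wrapping each pull in a retry could reduce this flakiness. A minimal POSIX-sh sketch (the `retry` helper is hypothetical, not something the workflow has today):

```shell
# Hypothetical helper: run a command up to N times, stopping at the first success.
retry() {
  attempts="$1"; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Succeed as soon as the wrapped command does.
    "$@" && return 0
    echo "attempt $i/$attempts failed: $*" >&2
    i=$((i + 1))
  done
  return 1
}

# Example workflow usage (model name illustrative):
# retry 3 ollama pull "${{ vars.OLLAMA_CHAT_MODEL_ID }}"
```

This keeps transient network failures from failing the whole job while still surfacing persistent ones.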
We do need coverage for the AI connectors; however, it may make more sense to deploy a single Azure resource running Ollama with one small chat model that handles all chat-completion-related operations, plus one embedding model, for those tests.
This model may work for the tool call scenario: https://ollama.com/library/smollm2:135m or this small one: https://ollama.com/library/moondream
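If a single small model can cover the chat, image, and tool-call scenarios, the pull step could collapse to something like the sketch below (the step name and model choice are illustrative, not decided):

```yaml
# Sketch of a consolidated pull step (placeholders, not the current workflow).
- name: Pull Ollama models
  run: |
    ollama pull ${{ vars.OLLAMA_CHAT_MODEL_ID }}       # one small chat/tool-call model, e.g. smollm2:135m
    ollama pull ${{ vars.OLLAMA_EMBEDDING_MODEL_ID }}  # one embedding model
```

Two pulls instead of five would cut both the network load and the number of points where the job can fail.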