As I'm planning to add Gemma- and llama.cpp-based LLMs, it is important to introduce an abstraction over all of these LLM providers. With this change, users should need only minimal changes to swap between LLM providers, and the codebase stays simple too.
The PR associated with this issue adds an abstract class `LLMProvider` with the following definition:
```kotlin
import kotlinx.coroutines.flow.Flow

abstract class LLMProvider {
    // Streams the model's response to the given prompt, token by token.
    abstract suspend fun getResponse(prompt: String): Flow<String>
}
```
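As an illustration of the contract, a concrete provider only needs to override `getResponse`. The `FakeEchoProvider` below is hypothetical and not part of the PR; a real Gemma or llama.cpp backend would replace the fake token loop with calls into its inference runtime:

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

// Hypothetical provider used only to illustrate the LLMProvider contract;
// a real backend would run Gemma / llama.cpp inference here instead.
class FakeEchoProvider : LLMProvider() {
    override suspend fun getResponse(prompt: String): Flow<String> = flow {
        // Emit the prompt back word by word to mimic token streaming.
        for (token in prompt.split(" ")) {
            delay(50) // stand-in for per-token inference latency
            emit("$token ")
        }
    }
}
```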
The return type of `Flow<String>` is necessary because on-device LLMs (like Gemma, or those using the llama.cpp framework) perform inference asynchronously and stream text tokens as the response.
Moreover, streaming also reduces the perceived waiting time before the user sees the first part of the response. With synchronous inference, the user has to wait longer, as nothing is displayed until the LLM has produced the complete response.
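To make the latency point concrete, here is a minimal consumer sketch (the prompt string and `FakeEchoProvider` are placeholders from the example above): each collected token can be appended to the UI immediately, so output appears as soon as the first token arrives rather than after the full response is ready.

```kotlin
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val provider: LLMProvider = FakeEchoProvider()
    // Render tokens incrementally as the flow emits them.
    provider.getResponse("Hello on-device LLMs").collect { token ->
        print(token)
    }
    println()
}
```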