Flutter binding of llama.cpp, which uses platform channels.
llama.cpp: Inference of LLaMA models in pure C/C++
```bash
flutter pub add fcllama
```
Please run `pod install` or `pod update` in your iOS project.
You need to install CMake 3.31.0, Android SDK 35, and NDK 28.0.12674087. No additional steps are required.
This is the fastest and recommended way to add HLlama to your project.
```bash
ohpm install hllama
```
Or, you can add it to your project manually.
Edit `oh-package.json5` in your app module and add:

```json5
"dependencies": {
  "hllama": "^0.0.2",
}
```

Then run:

```bash
ohpm install
```
```dart
import 'package:fcllama/fllama.dart';

FCllama.instance()?.initContext("model path", emitLoadProgress: true)
    .then((context) {
  modelContextId = context?["contextId"].toString() ?? "";
  if (modelContextId.isNotEmpty) {
    // A non-empty modelContextId (contextId > 0) means the context was created successfully.
  }
});
```
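The stream example further below reports `completion` events, but starting a generation is not shown in this section. A minimal, hypothetical sketch, assuming fcllama exposes a `completion` method that takes the context id and a map of generation parameters (the method name, the `params` argument, and the parameter keys here are assumptions, not confirmed by this document):

```dart
import 'package:fcllama/fllama.dart';

// Hypothetical call: method name and parameter keys are assumptions.
FCllama.instance()?.completion(
  double.parse(modelContextId),
  params: {
    "prompt": "What can you do?", // assumed key
    "n_predict": 128,             // assumed key: max tokens to generate
  },
);
// Generated tokens then arrive via onTokenStream (see below).
```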
```dart
import 'package:fcllama/fllama.dart';

// pp/tg/pl/nr mirror llama.cpp's bench parameters: prompt tokens,
// generated tokens, parallel sequences, and number of repetitions.
FCllama.instance()?.bench(double.parse(modelContextId), pp: 8, tg: 4, pl: 2, nr: 1)
    .then((res) {
  Get.log("[FCllama] Bench Res $res");
});
```
```dart
import 'package:fcllama/fllama.dart';

FCllama.instance()?.tokenize(double.parse(modelContextId), text: "What can you do?")
    .then((res) {
  Get.log("[FCllama] Tokenize Res $res");
  // Round-trip: turn the tokens back into text.
  FCllama.instance()
      ?.detokenize(double.parse(modelContextId), tokens: res?['tokens'])
      .then((detokenized) {
    Get.log("[FCllama] Detokenize Res $detokenized");
  });
});
```
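As a usage example, `tokenize` can also be used to check a prompt's size before running it. A minimal sketch built only on the call and the `tokens` result key shown above (the helper name is ours):

```dart
import 'package:fcllama/fllama.dart';

/// Returns the number of tokens in [text] for the given context,
/// or null if tokenization returns no result.
Future<int?> promptTokenCount(String modelContextId, String text) async {
  final res = await FCllama.instance()
      ?.tokenize(double.parse(modelContextId), text: text);
  return (res?['tokens'] as List?)?.length;
}
```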
```dart
import 'package:fcllama/fllama.dart';

FCllama.instance()?.onTokenStream?.listen((data) {
  if (data['function'] == "loadProgress") {
    Get.log("[FCllama] loadProgress=${data['result']}");
  } else if (data['function'] == "completion") {
    Get.log("[FCllama] completion=${data['result']}");
    final tempRes = data["result"]["token"];
    // tempRes holds the newly generated token text.
  }
});
```
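Because completion results arrive one token at a time, a common pattern is to append each token to a buffer until generation finishes. A minimal sketch using only the stream fields shown above (the helper name is ours):

```dart
import 'dart:async';

import 'package:fcllama/fllama.dart';

/// Appends each streamed completion token to [buffer].
StreamSubscription? listenForAnswer(StringBuffer buffer) {
  return FCllama.instance()?.onTokenStream?.listen((data) {
    if (data['function'] == "completion") {
      // Build the full response text token by token.
      buffer.write(data["result"]["token"]);
    }
  });
}

// Usage:
// final buffer = StringBuffer();
// final sub = listenForAnswer(buffer);
// ...after generation finishes:
// final answer = buffer.toString();
// sub?.cancel();
```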
```dart
import 'package:fcllama/fllama.dart';

FCllama.instance()?.stopCompletion(contextId: double.parse(modelContextId)); // stop an in-progress completion
FCllama.instance()?.releaseContext(double.parse(modelContextId)); // release a single context
FCllama.instance()?.releaseAllContexts(); // release all contexts
```
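In a Flutter app, these cleanup calls typically belong in a widget's `dispose`. A minimal sketch (the widget and state names are ours) using only the release call shown above:

```dart
import 'package:fcllama/fllama.dart';
import 'package:flutter/widgets.dart';

class ChatPage extends StatefulWidget {
  const ChatPage({super.key});

  @override
  State<ChatPage> createState() => _ChatPageState();
}

class _ChatPageState extends State<ChatPage> {
  String modelContextId = "";

  @override
  void dispose() {
    // Free the native llama.cpp context when this page is destroyed.
    if (modelContextId.isNotEmpty) {
      FCllama.instance()?.releaseContext(double.parse(modelContextId));
    }
    super.dispose();
  }

  @override
  Widget build(BuildContext context) => const SizedBox.shrink();
}
```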
You can see this file for more details.
| System | Min SDK | Arch | Other |
|---|---|---|---|
| Android | 23 | arm64-v8a, x86_64, armeabi-v7a | Supports additional optimizations for certain CPUs |
| iOS | 14 | arm64 | Supports Metal |
| OpenHarmonyOS/HarmonyOS | 12 | arm64-v8a, x86_64 | No additional optimizations for certain CPUs |
You can search HuggingFace for available models (keyword: GGUF). To obtain a GGUF model or quantize one manually, see the Prepare and Quantize section in llama.cpp.
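For example, a GGUF file can be downloaded with the Hugging Face CLI (the repository and file names below are illustrative, not an endorsement of a specific model):

```bash
huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_K_M.gguf --local-dir ./models
```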
Demo screenshots: iOS and Android.
MIT