xuegao-tzx / Fllama

A flutter binding for llama.cpp, which use platform channel.
https://pub.dev/packages/fcllama
MIT License
9 stars 0 forks source link
ai android dart flutter harmonyos ios llamacpp

FCllama

License: MIT pub package

Flutter binding of llama.cpp , which use platform channel .

llama.cpp: Inference of LLaMA model in pure C/C++

Installation

Flutter

flutter pub add fcllama

iOS

Please run pod install or pod update in your iOS project.

Android

You need install cmake 3.31.0、android sdk 35 and ndk 28.0.12674087. No additional operation required .

OpenHarmonyOS/HarmonyOS

This is the fastest and recommended way to add HLlama to your project.

ohpm install hllama

Or, you can add it to your project manually.

"dependencies": {
  "hllama": "^0.0.2",
}
  ohpm install

How to use

Flutter

  1. Initializing Llama
import 'package:fcllama/fllama.dart';

FCllama.instance()?.initContext("model path",emitLoadProgress: true)
        .then((context) {
  modelContextId = context?["contextId"].toString() ?? "";
  if (modelContextId.isNotEmpty) {
    // you can get modelContextId,if modelContextId > 0 is success.
  }
});
  1. Bench model on device
import 'package:fcllama/fllama.dart';

FCllama.instance()?.bench(double.parse(modelContextId),pp:8,tg:4,pl:2,nr: 1).then((res){
  Get.log("[FCllama] Bench Res $res");
});
  1. Tokenize and Detokenize
import 'package:fcllama/fllama.dart';

FCllama.instance()?.tokenize(double.parse(modelContextId), text: "What can you do?").then((res){
  Get.log("[FCllama] Tokenize Res $res");
  FCllama.instance()?.detokenize(double.parse(modelContextId), tokens: res?['tokens']).then((res){
    Get.log("[FCllama] Detokenize Res $res");
  });
});
  1. Streaming monitoring
import 'package:fcllama/fllama.dart';

FCllama.instance()?.onTokenStream?.listen((data) {
  if(data['function']=="loadProgress"){
    Get.log("[FCllama] loadProgress=${data['result']}");
  }else if(data['function']=="completion"){
    Get.log("[FCllama] completion=${data['result']}");
    final tempRes = data["result"]["token"];
    // tempRes is ans
  }
});
  1. Release this or Stop one
import 'package:fcllama/fllama.dart';

FCllama.instance()?.stopCompletion(contextId: double.parse(modelContextId)); // stop one completion
FCllama.instance()?.releaseContext(double.parse(modelContextId)); // release one
FCllama.instance()?.releaseAllContexts(); // release all

OpenHarmonyOS/HarmonyOS

You can see this file

Support System

System Min SDK Arch Other
Android 23 arm64-v8a、x86_64、armeabi-v7a Supports additional optimizations for certain CPUs
iOS 14 arm64 Support Metal
OpenHarmonyOS/HarmonyOS 12 arm64-v8a、x86_64 No additional optimizations for certain CPUs are supported

Obtain the model

You can search HuggingFace for available models (Keyword: GGUF).

For get a GGUF model or quantize manually, see Prepare and Quantize section in llama.cpp.

NOTE

iOS:

Android:

License

MIT