A .NET wrapper for the Hugging Face Tokenizers library.
Package | Description |
---|---|
Tokenizers.DotNet | Core library |
Tokenizers.DotNet.runtime.win | Native bindings for Windows x64 |

- Load a tokenizer file (`.json`) from local storage

1. Install the `Tokenizers.DotNet` package
2. Install the `Tokenizers.DotNet.runtime.win` package too
3. Check the following example code:

using Tokenizers.DotNet;
// Download skt/kogpt2-base-v2/tokenizer.json from the hub
var hubName = "skt/kogpt2-base-v2";
var filePath = "tokenizer.json";
var fileFullPath = await HuggingFace.GetFileFromHub(hubName, filePath, "deps");
Console.WriteLine($"Downloaded {fileFullPath}");
// Create a tokenizer instance
var tokenizer = new Tokenizer(vocabPath: fileFullPath);
var text = "음, 이제 식사도 해볼까요";
Console.WriteLine($"Input text: {text}");
var tokens = tokenizer.Encode(text);
Console.WriteLine($"Encoded: {string.Join(", ", tokens)}");
var decoded = tokenizer.Decode(tokens);
Console.WriteLine($"Decoded: {decoded}");
Console.WriteLine($"Version of Tokenizers.DotNet.runtime.win: {tokenizer.GetVersion()}");
Console.WriteLine("--------------------------------------------------");
// Use another tokenizer
// Download openai-community/gpt2/tokenizer.json from the hub
hubName = "openai-community/gpt2";
filePath = "tokenizer.json";
fileFullPath = await HuggingFace.GetFileFromHub(hubName, filePath, "deps");
// Create a tokenizer instance
var tokenizer2 = new Tokenizer(vocabPath: fileFullPath);
var text2 = "i was nervous before the exam, and i had a fever.";
Console.WriteLine($"Input text: {text2}");
var tokens2 = tokenizer2.Encode(text2);
Console.WriteLine($"Encoded: {string.Join(", ", tokens2)}");
var decoded2 = tokenizer2.Decode(tokens2);
Console.WriteLine($"Decoded: {decoded2}");
Console.WriteLine($"Version of Tokenizers.DotNet.runtime.win: {tokenizer2.GetVersion()}");
Console.ReadKey();
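
Before the example above can compile, the project needs references to both packages. A minimal sketch using the .NET CLI, assuming both packages are published on NuGet under the names listed in the package table and that the commands run from the directory containing the project file:

```shell
# Add the core library to the current project
dotnet add package Tokenizers.DotNet

# Add the native bindings for Windows x64
dotnet add package Tokenizers.DotNet.runtime.win

# Build and run the example
dotnet run
```

The example writes downloaded tokenizer files into the `deps` directory passed to `HuggingFace.GetFileFromHub`, so the first run needs network access.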
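
To build from source, prepare the following:

1. Rust build system (`cargo`)
2. .NET SDK (`dotnet 6.0, 7.0, 8.0`)
3. PowerShell (`7.4.2` or above)

Then run `build_all_clean.ps1`.

- To build `Tokenizers.DotNet.runtime.win` only, run `build_rust.ps1`
- To build `Tokenizers.DotNet` only, run `build_dotnet.ps1`

All build artifacts are placed in the `nuget` directory.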
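
The build scripts can be invoked from any shell through the PowerShell executable. A minimal sketch, assuming PowerShell 7 is installed as `pwsh` and the commands run from the directory that contains the scripts (their location in the repository is an assumption here):

```shell
# Build everything (native bindings and the .NET library)
pwsh -File ./build_all_clean.ps1

# Or build only one side
pwsh -File ./build_rust.ps1     # Tokenizers.DotNet.runtime.win only
pwsh -File ./build_dotnet.ps1   # Tokenizers.DotNet only

# Inspect the resulting artifacts
ls nuget
```

From inside a PowerShell session the scripts can also be run directly, for example `./build_all_clean.ps1`.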