Benchmarked against the original tiktoken (written in Rust) and the Java port, the Dart regex fragment parser is extremely slow (roughly 30x slower).
Even simple regex optimizations don't help, since Dart's RegExp uses JavaScript regex semantics and doesn't support possessive quantifiers (the slowdown most likely comes from backtracking).
A dedicated cl100k parser could help, as was done in https://github.com/knuddelsgmbh/jtokkit/pull/77.
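To illustrate what a dedicated parser can look like, here is a hypothetical sketch in Java (the language of the linked jtokkit PR). It is not the code from that PR, and the class name Cl100kSplitter and its helpers are made up. It scans the input once and, at each position, checks the alternatives of the cl100k_base split pattern in the same order the regex would, so no alternative is ever backtracked into. Assumptions: `\s` is treated as the Unicode White_Space property, `\p{L}` and `\p{N}` are mapped to `Character` general categories, and the contraction check uses `Character.toLowerCase` rather than full Unicode case folding.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical single-pass splitter mirroring the cl100k_base split pattern
 * (?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}
 * | ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+
 * The alternatives are checked in order at each position, without backtracking.
 */
final class Cl100kSplitter {
    private Cl100kSplitter() {}

    // \p{L}: Lu, Ll, Lt, Lm, Lo
    static boolean isLetter(int c) { return Character.isLetter(c); }

    // \p{N}: Nd, Nl, No
    static boolean isNumber(int c) {
        int t = Character.getType(c);
        return t == Character.DECIMAL_DIGIT_NUMBER || t == Character.LETTER_NUMBER
                || t == Character.OTHER_NUMBER;
    }

    // \s in Unicode mode (the White_Space property)
    static boolean isWhitespace(int c) {
        return Character.isSpaceChar(c) || (c >= 0x09 && c <= 0x0D) || c == 0x85;
    }

    static boolean isNewline(int c) { return c == '\r' || c == '\n'; }

    // [^\s\p{L}\p{N}]: punctuation, symbols, and other non-space characters
    static boolean isOther(int c) { return !isWhitespace(c) && !isLetter(c) && !isNumber(c); }

    public static List<String> split(String text) {
        int[] cp = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < cp.length; ) {
            int end = tokenEnd(cp, i);
            out.add(new String(cp, i, end - i));
            i = end;
        }
        return out;
    }

    // Returns the exclusive end of the piece starting at i; always greater than i.
    private static int tokenEnd(int[] cp, int i) {
        int n = cp.length;
        int c = cp[i];
        // 1. Contractions, compared via Character.toLowerCase.
        if (c == '\'' && i + 1 < n) {
            int a = Character.toLowerCase(cp[i + 1]);
            if (a == 's' || a == 't' || a == 'm' || a == 'd') return i + 2;
            if (i + 2 < n) {
                int b = Character.toLowerCase(cp[i + 2]);
                if (((a == 'r' || a == 'v') && b == 'e') || (a == 'l' && b == 'l')) return i + 3;
            }
        }
        // 2. Optional non-letter/non-digit/non-newline prefix, then a run of letters.
        int j = -1;
        if (isLetter(c)) j = i;
        else if (!isNewline(c) && !isNumber(c) && i + 1 < n && isLetter(cp[i + 1])) j = i + 1;
        if (j >= 0) {
            while (j < n && isLetter(cp[j])) j++;
            return j;
        }
        // 3. One to three digits.
        if (isNumber(c)) {
            int k = i + 1;
            while (k < n && k - i < 3 && isNumber(cp[k])) k++;
            return k;
        }
        // 4. Optional space, a run of symbols, then any trailing \r or \n.
        int s = -1;
        if (c == ' ' && i + 1 < n && isOther(cp[i + 1])) s = i + 1;
        else if (isOther(c)) s = i;
        if (s >= 0) {
            while (s < n && isOther(cp[s])) s++;
            while (s < n && isNewline(cp[s])) s++;
            return s;
        }
        // Only whitespace is left; find the end of the whitespace run.
        int e = i;
        while (e < n && isWhitespace(cp[e])) e++;
        // 5. \s*[\r\n]+ : up to and including the last newline in the run.
        for (int k = e - 1; k >= i; k--) {
            if (isNewline(cp[k])) return k + 1;
        }
        // 6. \s+(?!\S) : the whole run at end of text, else all but its last char.
        if (e == n) return e;
        if (e - i >= 2) return e - 1;
        // 7. \s+ : a lone whitespace char before non-whitespace.
        return e;
    }
}
```

For example, `Cl100kSplitter.split("I'm 12345 ok!!\n")` returns the pieces `"I"`, `"'m"`, `" "`, `"123"`, `"45"`, `" ok"`, `"!!\n"`. Each branch decides from at most a one-character lookahead before consuming its run; only the whitespace branches look back over the run they just scanned, which stands in for what the regex engine would do by backtracking.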