tazz4843 / whisper-rs

Rust bindings to https://github.com/ggerganov/whisper.cpp
The Unlicense
695 stars 110 forks

add full_get_token_bytes #171

Open sribich opened 3 months ago

sribich commented 3 months ago

full_get_segment_bytes exists, but not full_get_token_bytes.

For non-English languages, Whisper can split tokens at points that are not valid UTF-8 boundaries, which makes finer-grained parsing impossible with the current text methods.
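To illustrate the problem, here is a minimal sketch of what per-token byte access would make possible. It does not call whisper-rs: the byte chunks are hand-written stand-ins for what a full_get_token_bytes method might return, and group_tokens_utf8 is a hypothetical helper name, not part of the crate. The idea is to accumulate raw token bytes until they decode as valid UTF-8, which keeps track of which tokens contributed to each piece of text.

```rust
/// Hypothetical helper (not part of whisper-rs): merges consecutive raw token
/// byte chunks into the smallest runs that decode as valid UTF-8.
/// Each result is (first_token_index, last_token_index, decoded_text).
fn group_tokens_utf8(tokens: &[Vec<u8>]) -> Vec<(usize, usize, String)> {
    let mut groups = Vec::new();
    let mut pending: Vec<u8> = Vec::new();
    let mut start = 0;

    for (i, tok) in tokens.iter().enumerate() {
        if pending.is_empty() {
            // A new run begins at this token.
            start = i;
        }
        pending.extend_from_slice(tok);

        // Emit the run as soon as the accumulated bytes form valid UTF-8.
        if let Ok(text) = std::str::from_utf8(&pending) {
            groups.push((start, i, text.to_owned()));
            pending.clear();
        }
    }

    // Bytes that never completed a character are decoded lossily
    // rather than silently dropped.
    if !pending.is_empty() {
        let text = String::from_utf8_lossy(&pending).into_owned();
        groups.push((start, tokens.len() - 1, text));
    }

    groups
}
```

With the assumed input [0x61], [0xE6, 0x97], [0xA5], the second chunk on its own is not valid UTF-8 because it holds only the first two bytes of "日". The helper returns one group for token 0 ("a") and one group spanning tokens 1 through 2 ("日"). A text-only accessor cannot express this, since the partial chunk has no valid string representation.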