sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License

Explanation of FluentAPI settings #75

Open drajvver opened 1 year ago

drajvver commented 1 year ago

Hello!

Is there any information on which "With~" method in the fluent API corresponds to which setting/flag in whisper.cpp? I'm mostly interested in the -ml flag, which limits the output length per line.

It looks like WithMaxSegmentLength() should work the same way as -ml, but I don't think it does.

Thanks!

sandrohanea commented 1 year ago

Hello @drajvver, not all the flags in the main example of whisper.cpp have a corresponding With~ fluent API method, but every whisper.cpp whisper_full_params field https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h#L332 has a corresponding fluent API method in Whisper.net.

Some of the arguments are implemented only on the client side (e.g. diarization), but I added an example of this as well: https://github.com/sandrohanea/whisper.net/tree/main/examples/Diarization

For -ml (--max-len), the main example changes multiple whisper_full_params fields: https://github.com/ggerganov/whisper.cpp/blob/master/examples/main/main.cpp#LL776C1-L779C1

The Whisper.net equivalent of that would be:

        .WithTokenTimestamps()
        .WithMaxSegmentLength(15)
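
For context, here is a minimal sketch of how those two calls might sit in a complete program, so the resulting segment boundaries can be inspected directly. The model path "ggml-base.bin" and the input file "audio.wav" are placeholders and not taken from this thread; the input is assumed to be a 16 kHz WAV file, and the value 15 is simply the one from the snippet above.

```csharp
using System;
using System.IO;
using Whisper.net;

// Placeholder model path: point this at any ggml model file you have downloaded.
using var whisperFactory = WhisperFactory.FromPath("ggml-base.bin");

// Token timestamps plus a maximum segment length, as in the snippet above.
await using var processor = whisperFactory.CreateBuilder()
    .WithLanguage("en")
    .WithTokenTimestamps()
    .WithMaxSegmentLength(15)
    .Build();

// Placeholder input: assumed to be a 16 kHz WAV file.
await using var fileStream = File.OpenRead("audio.wav");

// Print one line per returned segment so the split points are visible.
await foreach (var segment in processor.ProcessAsync(fileStream))
{
    Console.WriteLine($"[{segment.Start} --> {segment.End}] {segment.Text}");
}
```

Printing the segments from ProcessAsync, rather than relying only on WithPrintResults(), makes it easier to compare the output line by line against whisper.cpp run with -ml on the same audio and model.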

drajvver commented 1 year ago

So I think that either it does not work as it should, or I'm making some sort of silly mistake. For this video: https://www.youtube.com/shorts/g9IYllmOtUc

And settings:

await using var processor = whisperFactory.CreateBuilder()
    .WithLanguage("en")
    .WithTemperature(0.2f)
    .WithTokenTimestamps()
    .WithMaxSegmentLength(4)
    .WithPrintProgress()
    .WithPrintResults()
    .WithPrintTimestamps()
    .Build();

I get output like this:

[00:00:00.000 --> 00:00:06.140] My friend Julius just moved into his new home and needed to go grab some tools, so he asked me to watch his place.
[00:00:06.140 --> 00:00:13.060] I watched his kitchen and found what I thought was his only ramen stash, but I looked to the left and saw another bag of ramen packages.
[00:00:13.060 --> 00:00:19.700] I started digging through it to see if any of them sounded good. Then I looked to the right and found even more instant ramen in a box.
[00:00:19.700 --> 00:00:28.060] Then I felt the urge to turn around and boom, there's another bag of noodles. I grabbed the super spicy ones and started to quickly make them before Julius got back.
[00:00:28.060 --> 00:00:35.900] I felt like I had spent too much time perusing his ramen stash, so I didn't add much to this. Now was it super spicy as advertised? Eh.
[00:00:35.900 --> 00:00:40.980] It definitely had a pleasant kick and the noodles were nice and chewy, but spice was probably a 3 out of 10.

It's entirely possible that I'm doing something very wrong, but I can't see what that would be.

sandrohanea commented 1 year ago

It does indeed sound like a bug, but I haven't had time to check it yet :(

sandrohanea commented 1 year ago

Hello again @drajvver, I tried to reproduce the bug with the tiny model but couldn't (Whisper.net was returning the same output as whisper.cpp).

Can you please try to create a repro zip (including the model) using Whisper.net 1.4.4?