sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License
547 stars, 84 forks

Traditional Chinese to Simplified Chinese #121

Closed. 777sfdf closed this issue 9 months ago

777sfdf commented 11 months ago

When testing on Windows, I found that the text (subtitles) transcribed from Chinese audio comes out in Traditional Chinese. How can I get Simplified Chinese output instead? If you have time to answer, I would greatly appreciate it. Thank you!

The core code is as follows:

            // Requires: using Whisper.net;
            var segments = new List<SegmentData>();
            var encoderBegins = new List<EncoderBeginData>();
            string ModelFilePath = "ggml-base.bin";
            string txtFilePath = "mqfwn-f344o.wav"; // despite the name, this is the input audio file

            using var factory = WhisperFactory.FromPath(ModelFilePath);

            using var processor = factory.CreateBuilder()
                            .WithLanguage("zh")
                            .WithEncoderBeginHandler((e) =>
                            {
                                encoderBegins.Add(e);
                                return true; // returning true lets processing continue
                            })
                            .WithSegmentEventHandler(segments.Add)
                            .Build();

            using var fileReader = File.OpenRead(txtFilePath);
            await foreach (var result in processor.ProcessAsync(fileReader))
            {
                Console.WriteLine("Speech-to-text conversion succeeded: " + $"{result.Start}->{result.End}: {result.Text}");
            }

sandrohanea commented 11 months ago

Hello, I'm not sure about this and I haven't tested whether it works, but based on the original Whisper discussion, prompting might help in this scenario: https://github.com/openai/whisper/discussions/277

If you can test it and report back here on whether it works, that would be great.

777sfdf commented 10 months ago

Thank you very much for your answer, and sorry for the delay in replying. I have already read that discussion, but since I am not proficient in .NET, I could not work out how to use the initial_prompt option in a project. If you have time to revisit this question and provide an answer, I would greatly appreciate it. Thank you.

sandrohanea commented 10 months ago

To use the initial prompt, call the WithPrompt method on the WhisperProcessorBuilder: https://github.com/sandrohanea/whisper.net/blob/454ad43043e3b5cd920e5e3a1cb309861c21d158/Whisper.net/WhisperProcessorBuilder.cs#L286C36-L286C46
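As a minimal sketch of what that could look like, the snippet below adds a WithPrompt call to the builder chain from the original question. The prompt text is an assumption (a short sentence written in Simplified Chinese, in the spirit of the linked Whisper discussion), and the file names are simply the ones used earlier in this thread; whether a given prompt actually steers the output will depend on the model and the audio.

```csharp
using System;
using System.IO;
using Whisper.net;

// File names taken from the question above; replace with your own model and WAV file.
const string modelFilePath = "ggml-base.bin";
const string audioFilePath = "mqfwn-f344o.wav";

using var factory = WhisperFactory.FromPath(modelFilePath);

using var processor = factory.CreateBuilder()
    .WithLanguage("zh")
    // Assumed prompt text, written in Simplified Chinese
    // ("The following are sentences in Mandarin.").
    .WithPrompt("以下是普通话的句子。")
    .Build();

using var fileStream = File.OpenRead(audioFilePath);
await foreach (var segment in processor.ProcessAsync(fileStream))
{
    Console.WriteLine($"{segment.Start}->{segment.End}: {segment.Text}");
}
```

The prompt acts as a hint to the model, not as a converter, so the output is not guaranteed to be Simplified Chinese for every segment.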

777sfdf commented 10 months ago

Okay, thank you again for your prompt response. I will report back with the results after testing.

sandrohanea commented 10 months ago

Hey @777sfdf ,

Any news about the prompt effectiveness for your use-case?

777sfdf commented 9 months ago

I'm very sorry for not replying sooner. The main reason is that when I tested this a while ago, the results were not very good. Today I ran the test again and finally got good results: the text transcribed from the audio is now in Simplified Chinese.

The code below adds only one method call, WithPrompt. Everything else, including the model and the audio, is unchanged. [image]

The result is as follows: [image]

In addition, I would like to ask about the accuracy ranking of the four ggml models. Is the order tiny < base < small < medium? I hope to receive your reply. Thank you.

sandrohanea commented 9 months ago

Awesome, I'm glad to hear that "WithPrompt" is working as expected in the given scenario.

Also, as a suggestion: if you already know the language, use zh instead of auto. That way language detection is skipped and you'll get transcripts faster.
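The difference comes down to the argument passed to WithLanguage. The sketch below shows the two variants side by side; it assumes the same model file as earlier in the thread and that "auto" is the value that asks the library to detect the language.

```csharp
using Whisper.net;

using var factory = WhisperFactory.FromPath("ggml-base.bin");

// Language fixed up front: no detection step is needed.
using var chineseProcessor = factory.CreateBuilder()
    .WithLanguage("zh")
    .Build();

// Language left to the model to detect (assumed meaning of "auto").
using var autoProcessor = factory.CreateBuilder()
    .WithLanguage("auto")
    .Build();
```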

About the model sizes, they are: tiny < base < small < medium < large