The problem of multilingual mixing in recording

The library identifies a single language at start-up (when "auto" is used) and then uses that language to transcribe the entire file, so it makes sense that the output is uniformly English, even for the parts where other languages are spoken.

One idea to fix it (untested) would be to:

1. Add WithProbabilities on the builder, which gives you the confidence level for each segment.
2. Once you identify a segment with a low confidence level, re-transcribe it by extracting the frames for that segment using its start and end times. Either provide the other language, if you know it is always "Chinese and English", or identify it again using "auto".
3. Replace the segments in the result.
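
The steps above could be sketched roughly as follows. This is a library-agnostic illustration in Python, not the library's API: the `Segment` type, the `retranscribe_low_confidence` helper, the `retranscribe` callback (standing in for a second transcription pass with another language or with "auto"), the 0.5 threshold, and the 16 kHz sample rate are all assumptions made for the example, and like the idea itself it has not been tried against real recordings.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono samples


@dataclass
class Segment:
    """Hypothetical segment shape: times in seconds plus a confidence value."""
    start: float
    end: float
    text: str
    probability: float


def retranscribe_low_confidence(
    samples: Sequence[float],
    segments: List[Segment],
    retranscribe: Callable[[Sequence[float]], List[Segment]],
    threshold: float = 0.5,  # assumption: tune this for your recordings
    sample_rate: int = SAMPLE_RATE,
) -> List[Segment]:
    """Re-run low-confidence segments and splice the new segments into the result."""
    result: List[Segment] = []
    for seg in segments:
        if seg.probability >= threshold:
            result.append(seg)  # confident enough, keep as-is
            continue

        # Extract the frames for this segment from its start and end times.
        first = max(0, int(seg.start * sample_rate))
        last = min(len(samples), int(seg.end * sample_rate))
        redone = retranscribe(samples[first:last])

        if not redone:
            result.append(seg)  # nothing better came back, keep the original
            continue

        # The second pass reports times relative to the extracted slice,
        # so shift them back onto the timeline of the full recording.
        for new in redone:
            result.append(
                Segment(
                    start=new.start + seg.start,
                    end=new.end + seg.start,
                    text=new.text,
                    probability=new.probability,
                )
            )
    return result
```

In a real application the `retranscribe` callback would call a second processor configured with the other language (or with "auto") on the extracted samples. A fixed threshold is only a starting point; comparing the confidence before and after the second pass may be a safer way to decide which version to keep.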

It would probably be interesting to have this functionality in the library in the future, but I cannot promise that I'll have time to implement it.

sandrohanea / whisper.net
