kamranjon closed this issue 1 year ago.
Thank you @kamranjon for opening this issue. Indeed, there was a bug with "efficient decoding" when the language was detected automatically. It is fixed now.
I had not been testing thoroughly with languages like Japanese; I have now added tests to avoid such problems in the future.
About the difference between the efficient and naive modes:
The efficient mode is much trickier to implement, so it is more prone to bugs (though I would say it is quite stable now, and hopefully you found the last remaining issue).
Japanese is a good example. Here is the output for a single "word":
Many words are merged together into one. Here is an example audio file to test with:
https://user-images.githubusercontent.com/3966239/219478733-ad14e548-8895-4995-9f81-02b761293a61.mp4
This does not work well with _transcribe_timestamped_efficient(), but does work well with _transcribe_timestamped_naive().
Based on logging inside should_use_space(), switching to naive seems to fix the issue: when using efficient, the language is detected as en, and the wrong spacing setting is subsequently used.
Could you explain the difference between the two modes (efficient/naive)?
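To illustrate why a wrongly detected "en" produces merged words, here is a minimal Python sketch, assuming word grouping works roughly like this: for space-delimited languages, a token starting with a space opens a new word; for other languages, each token is its own word. All names here (NO_SPACE_LANGUAGES, a stand-in should_use_space, group_tokens_into_words, resolve_language) are hypothetical and are not whisper-timestamped's actual code. The language set is an assumption loosely modeled on the one in openai-whisper's tokenizer.

```python
from typing import List, Optional

# Assumption: languages written without spaces between words (hypothetical list).
NO_SPACE_LANGUAGES = {"zh", "ja", "th", "lo", "my", "yue"}


def should_use_space(language: str) -> bool:
    """Hypothetical stand-in: True if words in `language` are separated by spaces."""
    return language not in NO_SPACE_LANGUAGES


def group_tokens_into_words(tokens: List[str], language: str) -> List[str]:
    """Group decoded subword tokens into words (illustrative only)."""
    if not should_use_space(language):
        # No spaces to split on: treat each non-empty token as its own word.
        return [t for t in tokens if t]
    words: List[str] = []
    for tok in tokens:
        if tok.startswith(" ") or not words:
            words.append(tok.strip())  # a leading space starts a new word
        else:
            words[-1] += tok  # otherwise glue the token onto the current word
    return words


def resolve_language(requested: Optional[str], detected: str) -> str:
    """Prefer the user's language option, else the auto-detected one (never a fixed default)."""
    return requested if requested is not None else detected
```

With Japanese tokens such as ["今日", "は", "いい", "天気"], grouping under "en" yields a single merged word ["今日はいい天気"] because no token starts with a space, while grouping under resolve_language(None, "ja") keeps the four tokens separate. This matches the symptom described above.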