linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

Use silero v3.1 #142

Closed: Jeronymous closed this 9 months ago

Jeronymous commented 10 months ago

This seems to improve VAD (voice activity detection). See the problems spotted in https://github.com/linto-ai/whisper-timestamped/issues/74

[Figure and accompanying notes not preserved]

Jeronymous commented 9 months ago

In the end, I implemented a way to choose the VAD method. The default remains the same (the latest Silero version, i.e. 4.0), but earlier Silero versions can be specified (e.g. "silero:3.1"), and "auditok" can also be used. See this PR, which I'm going to close: https://github.com/linto-ai/whisper-timestamped/pull/78
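The option format described above (a method name, optionally followed by `:version`) can be illustrated with a small parser. This is a hypothetical sketch, not whisper-timestamped's actual code: `parse_vad_spec`, `VadSpec`, and the mapping of `True` to the default (latest) Silero model are assumptions made for illustration only.

```python
from typing import NamedTuple, Optional, Union

# Hypothetical types/helpers for illustration; NOT part of whisper-timestamped's API.
class VadSpec(NamedTuple):
    method: str             # e.g. "silero" or "auditok"
    version: Optional[str]  # e.g. "3.1"; None means "use the default/latest"

KNOWN_METHODS = {"silero", "auditok"}

def parse_vad_spec(spec: Union[bool, str, None]) -> Optional[VadSpec]:
    """Parse a VAD option such as "silero", "silero:3.1" or "auditok"."""
    # False/None: VAD disabled.
    if spec is None or spec is False:
        return None
    # Assumption: True selects the default method (latest Silero).
    if spec is True:
        return VadSpec("silero", None)
    if not isinstance(spec, str):
        raise TypeError(f"VAD spec must be a bool or str, got {type(spec).__name__}")
    # Split "method:version" on the first colon only.
    method, sep, version = spec.partition(":")
    method = method.strip().lower()
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown VAD method: {spec!r}")
    if sep and not version.strip():
        raise ValueError(f"empty version in VAD spec: {spec!r}")
    return VadSpec(method, version.strip() or None)

if __name__ == "__main__":
    for s in ["silero", "silero:3.1", "auditok", True, False]:
        print(repr(s), "->", parse_vad_spec(s))
```

Keeping the version optional lets the default stay on the latest Silero release while still allowing an older one to be pinned explicitly, which matches the behavior described in the comment above.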