Open wjkoh opened 1 month ago
Hi @streamer45, thanks for your awesome package! I found discrepancies between silero-vad-go and the Python package. My input file is a 13-minute-long speech of JFK, and silero-vad-go misses multiple segments that the Python one can detect.
The input file is output.wav.zip.
The following results are from silero-vad-go:
2024/10/20 18:00:25 speech starts at 4.80s
2024/10/20 18:00:25 speech ends at 9.53s
2024/10/20 18:00:25 speech starts at 10.43s
2024/10/20 18:00:25 speech ends at 12.00s
2024/10/20 18:00:25 speech starts at 12.29s
2024/10/20 18:00:25 speech ends at 14.65s
2024/10/20 18:00:25 speech starts at 15.01s
2024/10/20 18:00:25 speech ends at 17.53s
2024/10/20 18:00:25 speech starts at 196.74s
2024/10/20 18:00:25 speech ends at 196.93s
2024/10/20 18:00:25 speech starts at 209.86s
2024/10/20 18:00:25 speech ends at 210.11s
2024/10/20 18:00:25 speech starts at 214.18s
...
The following results are from silero-vad (Python):
[{'start': 4.3, 'end': 9.6}, {'start': 10.1, 'end': 14.6}, {'start': 15.0, 'end': 17.5}, {'start': 18.2, 'end': 18.6}, {'start': 31.8, 'end': 33.4}, {'start': 35.0, 'end': 36.6}, {'start': 37.1, 'end': 39.6}, {'start': 59.7, 'end': 60.3}, {'start': 61.3, 'end': 62.3}, {'start': 64.0, 'end': 64.5}, {'start': 64.8, 'end': 65.4}, {'start': 65.7, 'end': 66.8}, {'start': 67.5, 'end': 69.1}, {'start': 70.4, 'end': 72.1}, {'start': 73.0, 'end': 74.2}, {'start': 74.5, 'end': 76.3}, {'start': 77.1, 'end': 78.0}, {'start': 83.5, 'end': 85.2}, {'start': 86.2, 'end': 87.6}, {'start': 88.4, 'end': 89.8}, {'start': 90.6, 'end': 92.3}, {'start': 92.6, 'end': 94.6}, {'start': 95.1, 'end': 96.0}, {'start': 97.1, 'end': 99.0}, {'start': 99.9, 'end': 101.1}, {'start': 102.0, 'end': 103.4}, {'start': 103.7, 'end': 107.0}, {'start': 107.9, 'end': 108.8}, {'start': 109.2, 'end': 110.9}, {'start': 111.5, 'end': 112.3}, {'start': 112.7, 'end': 113.7}, {'start': 114.9, 'end': 116.7}, {'start': 117.2, 'end': 118.4}, {'start': 119.1, 'end': 121.6}, {'start': 122.0, 'end': 124.0}, {'start': 124.4, 'end': 125.5}, {'start': 135.2, 'end': 136.3}, {'start': 137.6, 'end': 138.6}, {'start': 138.9, 'end': 139.3}, {'start': 140.2, 'end': 140.6}, {'start': 141.1, 'end': 142.4}, {'start': 143.5, 'end': 144.9}, {'start': 145.9, 'end': 146.8}, {'start': 147.7, 'end': 148.6}, {'start': 149.2, 'end': 150.5}, {'start': 152.4, 'end': 153.7}, {'start': 154.8, 'end': 155.9}, {'start': 157.0, 'end': 158.7}, {'start': 158.8, 'end': 160.2}, {'start': 161.0, 'end': 166.2}, {'start': 178.3, 'end': 179.7}, {'start': 181.0, 'end': 182.4}, {'start': 182.8, 'end': 185.2}, {'start': 185.7, 'end': 186.7}, {'start': 187.6, 'end': 189.5}, {'start': 189.7, 'end': 190.1}, {'start': 190.6, 'end': 191.3}, ...
As you can see, silero-vad-go didn't detect any speech between 17.53s and 196.74s, while silero-vad (Python) found many segments in that range. I used the .onnx file at https://github.com/snakers4/silero-vad/blob/fd41da0b1544982e7369c3eb4a64c64be58cfef0/src/silero_vad/data/silero_vad.onnx.
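For anyone who wants to reproduce the comparison rather than eyeball it, here is a rough sketch that parses the Go log lines into segments and lists the Python segments that overlap none of them. The helper names (`parse_go_log`, `missing_segments`) are made up for this illustration and are not part of either package; the sample data is just the first few entries from the outputs above.

```python
import re

# Matches log lines such as "2024/10/20 18:00:25 speech starts at 4.80s".
EVENT_RE = re.compile(r"speech (starts|ends) at (\d+(?:\.\d+)?)s")


def parse_go_log(text):
    """Hypothetical helper: turn the Go log text into (start, end) tuples in seconds."""
    segments = []
    start = None
    for kind, value in EVENT_RE.findall(text):
        t = float(value)
        if kind == "starts":
            start = t
        elif start is not None:
            segments.append((start, t))
            start = None
    return segments


def missing_segments(reference, candidate):
    """Hypothetical helper: reference segments that overlap no candidate segment."""
    missing = []
    for seg in reference:
        overlaps = any(
            seg["start"] < c_end and c_start < seg["end"]
            for c_start, c_end in candidate
        )
        if not overlaps:
            missing.append(seg)
    return missing


# First few entries copied from the two outputs in this issue.
go_log = """\
2024/10/20 18:00:25 speech starts at 4.80s
2024/10/20 18:00:25 speech ends at 9.53s
2024/10/20 18:00:25 speech starts at 10.43s
2024/10/20 18:00:25 speech ends at 12.00s
2024/10/20 18:00:25 speech starts at 12.29s
2024/10/20 18:00:25 speech ends at 14.65s
2024/10/20 18:00:25 speech starts at 15.01s
2024/10/20 18:00:25 speech ends at 17.53s
2024/10/20 18:00:25 speech starts at 196.74s
2024/10/20 18:00:25 speech ends at 196.93s
"""
python_segments = [
    {"start": 4.3, "end": 9.6},
    {"start": 10.1, "end": 14.6},
    {"start": 15.0, "end": 17.5},
    {"start": 18.2, "end": 18.6},
    {"start": 31.8, "end": 33.4},
]

go_segments = parse_go_log(go_log)
for seg in missing_segments(python_segments, go_segments):
    print(f"only in Python output: {seg['start']}s to {seg['end']}s")
```

This only checks for overlap, so small differences in segment boundaries (for example 4.3s vs 4.80s) are not reported as misses.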
@wjkoh Weird, this is the output I get using the provided file:
Silero model is the same. Using onnxruntime-linux-x64-1.18.1.