I have the same issue. It looks like the network structure changed?
5.0 version
4.0 version
No. The current examples in https://github.com/snakers4/silero-vad/tree/master/examples won't work with silero vad v5 as of today (2024.06.29).
I suggest that you have a look at https://github.com/k2-fsa/sherpa-onnx/pull/1064
It supports both silero vad version 4 and 5.
It provides APIs for 10 different programming languages.
It also supports running silero VAD with Android, iOS, Flutter, NodeJS, etc.
When I run inference with the old model, it works fine like this:
output, h, c = session.run(['output', 'hn', 'cn'], {input_name: input_tensor, sr_name: np.array([sample_rate], dtype=np.int64), h_name: h, c_name: c})
With the new model I would assume the call looks like this:
output, s_n = session.run(['output', 'stateN'], {input_name: input_tensor, sr_name: np.array([sample_rate], dtype=np.float32), state_n: stateN})
But I get an error -> input: state Got: 1 Expected: 3 Please fix either the inputs/outputs or the model.
I do send 3 inputs (input_name, sr_name and state_n), and I hard-coded the output names from the model.
I also tried reshaping the state with stateN = s_n.reshape((2, 1, -1)),
but the error is the same.
What am I missing here?
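For what it's worth, the "Got: 1 Expected: 3" part of that message is onnxruntime reporting a rank mismatch: the array bound to the input named state had 1 dimension, while the model declares 3. One possible cause (an assumption, not confirmed in this thread) is that the input names are looked up by position and the order differs between model versions, so a 1-D array such as the sample rate ends up bound to state. A minimal sketch for checking this before calling session.run is below. The helper find_rank_mismatches and the expected ranks are hypothetical and for illustration only.

```python
def find_rank_mismatches(expected_ranks, actual_shapes):
    """Return {input_name: (got_rank, expected_rank)} for every mismatch.

    A missing input is reported with got_rank set to None.
    """
    mismatches = {}
    for name, expected in expected_ranks.items():
        if name not in actual_shapes:
            mismatches[name] = (None, expected)  # input missing from the feed
            continue
        got = len(actual_shapes[name])
        if got != expected:
            mismatches[name] = (got, expected)
    return mismatches


# Hypothetical ranks, based only on the names and shapes quoted in this thread.
# With onnxruntime they could be read from the model itself instead:
#   expected_ranks = {i.name: len(i.shape) for i in session.get_inputs()}
expected_ranks = {"input": 2, "state": 3}

# Shapes of the arrays about to be fed, e.g. {k: v.shape for k, v in feed.items()}.
# Here a 1-D array has accidentally been bound to "state".
bad_feed_shapes = {"input": (1, 1024), "state": (1,)}
good_feed_shapes = {"input": (1, 1024), "state": (2, 1, 128)}

print(find_rank_mismatches(expected_ranks, bad_feed_shapes))
print(find_rank_mismatches(expected_ranks, good_feed_shapes))
```

Printing session.get_inputs() names next to the shapes actually being passed usually makes it obvious which array landed under which name. The dtype of the sample rate also changed from int64 to float32 between the two calls above, which may be worth checking against the model's declared input type.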
What is the shape of input_tensor and stateN? @filtercodes
Thanks for the reply,
I created input_tensor from an audio buffer that was previously converted to float32 using int2float() from the cpp example.
input_tensor = np.expand_dims(audio_float32, axis=0)
It's an audio buffer of 1024 samples.
and
stateN = np.zeros((2, 1, 128), dtype=np.float32)
I couldn't make it work; maybe I made some mistakes that I haven't noticed.
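One more thing that may be worth checking is the window length. As far as I know, silero vad v5 expects fixed-size windows (512 samples at 16 kHz), so a 1024-sample buffer would need to be split before inference. Treat that number as an assumption and verify it against the model's documentation. A rough sketch of the splitting step, where split_into_windows is a hypothetical helper that works on lists and NumPy arrays alike:

```python
def split_into_windows(samples, window_size=512):
    """Yield consecutive, non-overlapping windows of exactly window_size samples.

    A trailing remainder shorter than window_size is dropped here; padding it
    with zeros would be another reasonable choice.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    full = len(samples) - len(samples) % window_size
    for start in range(0, full, window_size):
        yield samples[start:start + window_size]


buffer = [0.0] * 1024  # stand-in for the 1024-sample float32 audio buffer
windows = list(split_into_windows(buffer))
print(len(windows), len(windows[0]))
```

Each window would then be expanded to shape (1, window_size) and passed to the model in order, feeding the returned state from one call into the next.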