snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

Bug report - Unable to convert model to CoreML or to C #450

Closed: ephemer closed this issue 6 months ago

ephemer commented 6 months ago

🐛 Bug

It's not possible to convert the Silero VAD model to CoreML, or to other formats using other conversion tools.

To Reproduce

Steps to reproduce the behavior:

Create the following script in the root directory of this repo, install coremltools (pip install coremltools), and run it:

import torch
import utils_vad  # from this repo
import coremltools as ct
import numpy as np

# Load the TorchScript model shipped in this repo
model = utils_vad.init_jit_model("files/silero_vad.jit")
model.eval()

# Inputs: one 512-sample audio chunk plus the sampling rate
input_features = [
    ct.TensorType(name="audio", shape=(512,)),
    ct.TensorType(name="sampling_rate", shape=ct.Shape((1,)), dtype=np.int64),
]
output_features = [ct.TensorType(name="output")]

# Conversion fails here with a long list of errors
coreml_model = ct.convert(
    model,
    inputs=input_features,
    outputs=output_features,
    minimum_deployment_target=ct.target.iOS15,
    skip_model_load=True,
)

There are too many errors to list. I tried commenting out the places where exceptions are raised to get to the bottom of it, but I couldn't get even a broken output.
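The repro script declares a fixed 512-sample "audio" input. As a hedged sketch of how a mobile app might feed a model with that input shape, the helper below splits a 16 kHz mono signal into consecutive, non-overlapping 512-sample windows and zero-pads the last one. Only the window size comes from the script; the function name frame_audio and the zero-padding policy are hypothetical choices, not part of silero-vad.

```python
from typing import List, Sequence

WINDOW = 512  # matches the "audio" input shape in the repro script


def frame_audio(samples: Sequence[float], window: int = WINDOW) -> List[List[float]]:
    """Split samples into consecutive, non-overlapping windows.

    The final window is zero-padded so every chunk has exactly `window`
    samples, as a fixed-shape model input requires. (Hypothetical helper.)
    """
    if window <= 0:
        raise ValueError("window must be positive")
    frames = []
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        chunk.extend([0.0] * (window - len(chunk)))  # pad the tail with silence
        frames.append(chunk)
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 16000  # 1 s of silence at 16 kHz
    frames = frame_audio(one_second)
    print(len(frames), len(frames[-1]))  # prints "32 512"
```

Since 16000 / 512 = 31.25, one second of audio yields 31 full windows plus one padded window.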

Expected behavior

What I'm really trying to do is find a way to include silero-vad in a mobile app without having to bundle ONNX. I wasn't able to convert the .jit model to ONNX myself either (I thought I might have more luck converting the resulting ONNX model to another format if that worked). I also tried using this tool to convert the ONNX model to C, but that fails too because the ONNX If operator is not implemented there.

Environment

python collect_env.py
Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.4.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.28.3
Libc version: N/A

Python version: 3.12.3 | packaged by Anaconda, Inc. | (main, Apr 19 2024, 11:44:52) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2 Max

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.0 # it fails just the same with 2.0.0 and 2.1.0 though
[pip3] torchaudio==2.3.0
[pip3] torchvision==0.18.0
[conda] numpy                     1.26.4          py312h7f4fdc5_0
[conda] numpy-base                1.26.4          py312he047099_0
[conda] pytorch                   2.3.0                  py3.12_0    pytorch
[conda] torchaudio                2.3.0                 py312_cpu    pytorch
[conda] torchvision               0.18.0                py312_cpu    pytorch

Additional context

It would be really helpful to be able to modify the original Silero PyTorch model, for example to remove branching or to implement the feature extractor directly in C. Have you considered distributing upcoming versions in a form that makes this possible?
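On implementing the feature extractor directly in C: the exact parameters of Silero's front end aren't given in this thread, so the following is only a generic sketch of a short-time Fourier magnitude (periodic Hann window, naive DFT) in plain Python, not Silero's actual feature extractor. It uses nothing but loops and arithmetic whose control flow depends only on fixed sizes, so it would port line for line to C without anything like an ONNX If. The names hann, dft_magnitude, stft_magnitude and the defaults frame_len=256, hop=128 are hypothetical.

```python
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    # Periodic Hann window (a common STFT choice; assumed, not Silero's spec).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_magnitude(frame: Sequence[float]) -> List[float]:
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = 0.0
        im = 0.0
        for t, x in enumerate(frame):
            angle = 2.0 * math.pi * k * t / n
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        mags.append(math.hypot(re, im))
    return mags


def stft_magnitude(samples: Sequence[float], frame_len: int = 256,
                   hop: int = 128) -> List[List[float]]:
    """Hann-windowed magnitude spectrogram; partial frames at the end are dropped."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitude(frame))
    return frames


if __name__ == "__main__":
    sr = 16000
    chunk = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(512)]
    spec = stft_magnitude(chunk)
    peaks = [max(range(len(f)), key=f.__getitem__) for f in spec]
    print(len(spec), peaks)  # prints "3 [16, 16, 16]"
```

With frame_len=256 at 16 kHz the bin spacing is 16000 / 256 = 62.5 Hz, so a 1 kHz tone peaks in bin 16, and a 512-sample chunk produces three frames of 129 bins each.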

IntendedConsequence commented 6 months ago

@ephemer I just finished writing a C implementation of the v3.1 16kHz model. It's a personal learning project and very much at the proof-of-concept stage; it probably won't build or run anywhere but my machine at the moment. Having said that, if you don't mind the jank, look in my vadc repo, branch c_port_continued. This function tests the full model implementation: https://github.com/IntendedConsequence/vadc/blob/b5c25db328a5fbee27a421a2d892de42bbaa3dd5/test.c#L1424

ephemer commented 6 months ago

@IntendedConsequence thanks for sharing, that's really interesting work! 🙏🏼