juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License
903 stars 76 forks

WeSpeaker model load failed! #223

Closed: SheenChi closed this issue 6 months ago

SheenChi commented 7 months ago

Hello! I want to test the WeSpeaker model's performance with diart's stream.py. I set the parameter --embedding pretrained/wespeaker/voxceleb_resnet34_LM.onnx, but when I run the demo it fails with this message: _pickle.UnpicklingError: invalid load key, '\x08'.
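
The '\x08' in that message is consistent with an ONNX file being handed to a pickle-based loader. ONNX models are serialized as protocol buffers, and a file whose first field is ir_version (field number 1, varint) typically begins with byte 0x08, which is not a pickle opcode. A minimal sketch of that failure, using two made-up bytes as a stand-in for the real file (the payload is an assumption, not the actual model):

```python
import pickle

# Hypothetical stand-in for the first bytes of an ONNX file: protobuf tag 0x08
# (field 1, varint wire type) followed by a one-byte value.
fake_onnx_header = bytes([0x08, 0x07])


def try_unpickle(data: bytes) -> str:
    """Return 'ok' if data unpickles, otherwise the unpickling error message."""
    try:
        pickle.loads(data)
        return "ok"
    except pickle.UnpicklingError as err:
        return str(err)


message = try_unpickle(fake_onnx_header)
print(message)
```

On CPython this prints an "invalid load key" message like the one reported above, while real pickled data passes through the same function without error.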

I found that the problem is in PyannoteLoader.__call__: it cannot load an ONNX model through Model.from_pretrained, and it does not catch the UnpicklingError exception. Maybe __call__ should look like this:

    # Assumes `from pickle import UnpicklingError` at module level.
    def __call__(self) -> Callable:
        try:
            model = Model.from_pretrained(self.model_info, use_auth_token=self.hf_token)
            specs = getattr(model, "specifications", None)
            if specs is not None and specs.powerset:
                model = PowersetAdapter(model)
            return model
        except (HTTPError, ModuleNotFoundError, UnpicklingError):
            # Not loadable as a pyannote checkpoint (e.g. an ONNX file):
            # fall through to the speaker embedding wrapper below.
            pass
        return PretrainedSpeakerEmbedding(self.model_info, use_auth_token=self.hf_token)
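
The control flow proposed above is a loader fallback: try the primary loader, and switch to a second one only when a known, expected failure occurs. Here is a minimal standalone sketch of that pattern. The names load_with_fallback, checkpoint_loader and onnx_loader are hypothetical stand-ins, not diart or pyannote functions.

```python
from pickle import UnpicklingError
from typing import Callable, Tuple, Type


def load_with_fallback(
    source: str,
    primary: Callable[[str], object],
    fallback: Callable[[str], object],
    expected: Tuple[Type[BaseException], ...],
) -> object:
    """Try `primary`; if it raises one of `expected`, use `fallback` instead."""
    try:
        return primary(source)
    except expected:
        # Only the listed failures trigger the fallback; anything else propagates.
        return fallback(source)


# Hypothetical stand-ins for the two loaders discussed above.
def checkpoint_loader(path: str) -> object:
    if path.endswith(".onnx"):
        raise UnpicklingError("invalid load key, '\\x08'.")
    return f"checkpoint:{path}"


def onnx_loader(path: str) -> object:
    return f"onnx:{path}"


result = load_with_fallback(
    "voxceleb_resnet34_LM.onnx",
    checkpoint_loader,
    onnx_loader,
    expected=(UnpicklingError, ModuleNotFoundError),
)
print(result)
```

Catching only named exception types keeps unrelated failures visible. The fallback can still fail if it also cannot read the file, so this only moves the error unless the second loader actually supports ONNX.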

juanmc2005 commented 7 months ago

Hi @SheenChi, are you able to load the model using PretrainedSpeakerEmbedding? I don't think pyannote can load arbitrary ONNX models. In that case, you may want to implement a custom loader (as shown here) to use WeSpeakerPretrainedSpeakerEmbedding (see here).

Alternatively, you could export a custom version of your model to ONNX that can receive both the waveform and speaker weights to give per-speaker embeddings. That way you could simply use EmbeddingModel.from_pretrained or from_onnx with your new ONNX file.
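
To make the second suggestion concrete: a model that receives both the waveform and speaker weights can be thought of as pooling frame-level features once per speaker, with each speaker's weights deciding which frames count. The sketch below is plain Python rather than an ONNX graph, and every name in it is hypothetical. It uses a weighted mean for simplicity, which is an assumption and not necessarily the pooling WeSpeaker or diart uses.

```python
from typing import List

Vector = List[float]


def weighted_mean(frames: List[Vector], weights: Vector, eps: float = 1e-8) -> Vector:
    """Average frame-level features, weighting each frame by a speaker's activity."""
    if len(frames) != len(weights):
        raise ValueError("need exactly one weight per frame")
    total = sum(weights) + eps  # eps avoids division by zero for a silent speaker
    dim = len(frames[0])
    return [
        sum(w * frame[d] for frame, w in zip(frames, weights)) / total
        for d in range(dim)
    ]


def per_speaker_embeddings(
    frames: List[Vector], speaker_weights: List[Vector]
) -> List[Vector]:
    """Return one pooled vector per speaker (one row of weights per speaker)."""
    return [weighted_mean(frames, weights) for weights in speaker_weights]


# Toy input: 3 frames of 2-dimensional features, 2 speakers.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
speaker_weights = [
    [1.0, 1.0, 0.0],  # speaker A is active in the first two frames
    [0.0, 0.0, 1.0],  # speaker B is active in the last frame
]
embeddings = per_speaker_embeddings(frames, speaker_weights)
print(embeddings)
```

In this toy case speaker A's vector comes out close to [1, 0] and speaker B's close to [0, 1], because each speaker only "sees" the frames where their weight is non-zero.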

SheenChi commented 6 months ago

OK, thank you for your response. I will try it.