w-okada / voice-changer

Realtime Voice Changer
16.01k stars 1.74k forks

[ISSUE] Model requires 6 inputs. Input Feed contains 5 #1345

Closed samolego closed 1 week ago

samolego commented 1 week ago

Voice Changer Version

v. docker

Operating System

Fedora 40



CUDA Version


Read carefully and check the options

Does pre-installed model work?


Model Type

RVC v2

Issue Description

RVC v2 ONNX inference seems to be broken. The ONNX models expect 6 inputs, but only 5 are given. Also, I think v2 has different names than v1 for the inputs?

Here's this repo's code sample for input:

            audio1 = self.model.run(
                ["audio"],
                {
                    "feats": feats.cpu().numpy().astype(np.float32),
                    "p_len": pitch_length.cpu().numpy().astype(np.int64),
                    "pitch": pitch.cpu().numpy().astype(np.int64),
                    "pitchf": pitchf.cpu().numpy().astype(np.float32),
                    "sid": sid.cpu().numpy().astype(np.int64),
                },
            )

However, if you look at the code from RVC-Webui:

        onnx_input = {
            self.model.get_inputs()[0].name: hubert,
            self.model.get_inputs()[1].name: hubert_length,
            self.model.get_inputs()[2].name: pitch,
            self.model.get_inputs()[3].name: pitchf,
            self.model.get_inputs()[4].name: ds,
            self.model.get_inputs()[5].name: rnd,
        }
        return (self.model.run(None, onnx_input)[0] * 32767).astype(np.int16)

Both the output_names and input_feed params differ. The names of the inputs differ as well: [image]

Sadly, I also cannot get RVC-Webui to work with ONNX files.
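One way to see exactly which names are missing is to compare the inputs the ONNX file declares against the keys of the feed dict. This is a minimal sketch, not code from either repository: the helpers `diff_feed` and `model_input_names` are hypothetical, and the six-name list below is an assumption borrowed from the variable names in the RVC-Webui snippet, not read from a real model. `model_input_names` assumes onnxruntime is installed and uses its `InferenceSession.get_inputs()` call.

```python
from typing import Iterable, Mapping


def diff_feed(expected: Iterable[str], feed: Mapping[str, object]) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) input names for a feed dict."""
    expected = list(expected)
    missing = [name for name in expected if name not in feed]
    unexpected = [name for name in feed if name not in expected]
    return missing, unexpected


def model_input_names(model_path: str) -> list[str]:
    """Read the input names an ONNX file declares (needs onnxruntime)."""
    import onnxruntime as ort  # imported lazily so diff_feed works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return [node.name for node in session.get_inputs()]


# Assumed six-input signature, for illustration only. For a real file, use
# model_input_names("path/to/model.onnx") instead of this hard-coded list.
expected_names = ["hubert", "hubert_length", "pitch", "pitchf", "ds", "rnd"]

# Keys this repo's inferencer passes, as quoted above (values omitted).
feed = dict.fromkeys(["feats", "p_len", "pitch", "pitchf", "sid"])

missing, unexpected = diff_feed(expected_names, feed)
print("missing:", missing)
print("unexpected:", unexpected)
```

If the declared names and the feed keys disagree like this, the count mismatch ("requires 6 inputs, feed contains 5") is only part of the problem: the shared inputs would also need to be passed under the names the model declares.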

Application Screenshot

No response

Logs on console

import sys; print('Python %s on %s' % (sys.version, sys.platform)) /home/samoh/Documents/work/fax/voicechanger/.venv/bin/python -X pycache_prefix=/home/samoh/.cache/JetBrains/PyCharmCE2024.2/cpython-cache /home/samoh/.local/share/JetBrains/Toolbox/apps/pycharm-community/plugins/python-ce/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client --port 40503 --file /home/samoh/Documents/work/fax/voicechanger/voice-changer/server/MMVCServerSIO.py Connected to pydev debugger (build 242.21829.153) [INFO] Booting PHASE :main [INFO] PYTHON:3.10.14 (main, Apr 17 2024, 00:00:00) [GCC 14.0.1 20240411 (Red Hat 14.0.1-0)] [INFO] Activating the Voice Changer. [INFO] [Voice Changer] download sample catalog. samples_0004_t.json [INFO] [Voice Changer] download sample catalog. samples_0004_o.json [INFO] [Voice Changer] download sample catalog. samples_0004_d.json [INFO] [Voice Changer] model_dir is already exists. skip download samples. [INFO] Internal_Port:18888 [INFO] protocol: HTTP [INFO] -- ---- -- [INFO] Please open the following URL in your browser. [INFO] http://:/ [INFO] In many cases, it will launch when you access any of the following URLs. [INFO] [INFO] Booting PHASE :__mp_main__ [INFO] The server process is starting up. [INFO] Booting PHASE :MMVCServerSIO [INFO] [Voice Changer] VoiceChangerManager initializing... [INFO] [Voice Changer] model slot is changed -1 -> 4 [INFO] ................RVC [INFO] [Voice Changer] [RVCr2] Creating instance [INFO] VoiceChangerV2 Initialized (GPU_NUM(cuda):0, mps_enabled:False, onnx_device:GPU) [INFO] [Voice Changer][RVC]: update_settings gpu:0 [INFO] [Voice Changer][RVCr2] Initializing... 
[INFO] [Voice Changer][RVCr2] Creating pipeline with params: {'model_dir': 'model_dir', 'content_vec_500': 'pretrain/checkpoint_best_legacy_500.pt', 'content_vec_500_onnx': 'pretrain/content_vec_500.onnx', 'content_vec_500_onnx_on': True, 'hubert_base': 'pretrain/hubert_base.pt', 'hubert_base_jp': 'pretrain/rinna_hubert_base_jp.pt', 'hubert_soft': 'pretrain/hubert/hubert-soft-0d54a1f4.pt', 'nsf_hifigan': 'pretrain/nsf_hifigan/model', 'sample_mode': 'production', 'crepe_onnx_full': 'pretrain/crepe_onnx_full.onnx', 'crepe_onnx_tiny': 'pretrain/crepe_onnx_tiny.onnx', 'rmvpe': 'pretrain/rmvpe.pt', 'rmvpe_onnx': 'pretrain/rmvpe.onnx', 'whisper_tiny': 'pretrain/whisper_tiny.pt'} [INFO] [Voice Changer][RVCr2] Creating pipeline with slotInfo: {'slotIndex': 4, 'voiceChangerType': 'RVC', 'name': 'otrok-vid', 'description': '', 'credit': '', 'termsOfUseUrl': '', 'iconFile': '', 'speakers': {'0': 'target'}, 'modelFile': 'otrok-vid.pth', 'indexFile': 'added_IVF1882_Flat_nprobe_1_otrok-vid_v2.index', 'defaultTune': 12, 'defaultIndexRatio': 0, 'defaultProtect': 0.5, 'isONNX': False, 'modelType': 'pyTorchRVCv2', 'samplingRate': 40000, 'f0': True, 'embChannels': 768, 'embOutputLayer': 12, 'useFinalProj': False, 'deprecated': False, 'embedder': 'hubert_base', 'sampleId': '', 'version': 'v2'} [INFO] [Voice Changer][RVCr2] Creating pipeline with settings: {'gpu': 0, 'dstId': 0, 'f0Detector': 'rmvpe_onnx', 'tran': 12, 'silentThreshold': 1e-05, 'extraConvertSize': 4096, 'indexRatio': 0, 'protect': 0.5, 'rvcQuality': 0, 'silenceFront': 1, 'modelSamplingRate': 48000, 'speakers': {}} gin_channels: 256 self.spk_embed_dim: 109 [Voice Changer] generate new embedder. (no embedder) [Voice Changer] use torch contentvec Not implemented [Voice Changer] Loading index... Try loading... 
model_dir/4/added_IVF1882_Flat_nprobe_1_otrok-vid_v2.index [INFO] GENERATE INFERENCER<voice_changer.RVC.inferencer.RVCInferencerv2.RVCInferencerv2 object at 0x7fcef714f3a0> [INFO] GENERATE EMBEDDER<voice_changer.RVC.embedder.FairseqHubert.FairseqHubert object at 0x7fcef6b930d0> [INFO] GENERATE PITCH EXTRACTOR<voice_changer.RVC.pitchExtractor.RMVPEOnnxPitchExtractor.RMVPEOnnxPitchExtractor object at 0x7fcef6b93250> [INFO] [Voice Changer] [RVC] Initializing... done [INFO] [Voice Changer][RVC]: update_settings f0Detector:crepe_tiny [INFO] [Voice Changer][RVC]: update_settings serverReadChunkSize:192 [INFO] [Voice Changer][RVC]: update_settings extraConvertSize:4096 [INFO] [Voice Changer][RVC]: update_settings modelSlotIndex:1725006104004 [INFO] [Voice Changer] VoiceChangerManager initializing... done. [INFO] [Voice Changer] MMVC_Rest initializing... [INFO] [Voice Changer] MMVC_Rest initializing... done. [INFO] [Voice Changer] MMVC_SocketIOApp initializing... [INFO] [Voice Changer] MMVC_SocketIOApp initializing... done. [2024-09-12 09:02:35] connet sid : rbPHYT2dML_3cKkPAAAB [2024-09-12 09:02:35] connet sid : 0aNkvDIAvg1pywHOAAAD paramDict {'voiceChangerType': 'RVC', 'slot': 5, 'isSampleMode': False, 'sampleId': None, 'files': [{'name': 'otrok-vid.onnx', 'kind': 'rvcModel', 'dir': ''}, {'name': 'added_otrok-vid.index', 'kind': 'rvcIndex', 'dir': ''}], 'params': {}} [INFO] FILE: LoadModelParamFile(name='otrok-vid.onnx', kind='rvcModel', dir='') [INFO] move to upload_dir/otrok-vid.onnx -> model_dir/5/otrok-vid.onnx [INFO] FILE: LoadModelParamFile(name='added_otrok-vid.index', kind='rvcIndex', dir='') [INFO] move to upload_dir/added_otrok-vid.index -> model_dir/5/added_otrok-vid.index RVC:: slotInfo.modelFile otrok-vid.onnx [Voice Changer] setInfoByONNX 'metadata' [Voice Changer] ############## !!!! CAUTION !!!! #################### [Voice Changer] This onnxfie is depricated. Please regenerate onnxfile. [Voice Changer] ############## !!!! CAUTION !!!! 
#################### SlotInfo::: RVCModelSlot(slotIndex=-1, voiceChangerType='RVC', name='otrok-vid', description='', credit='', termsOfUseUrl='', iconFile='', speakers={0: 'target'}, modelFile='otrok-vid.onnx', indexFile='added_otrok-vid.index', defaultTune=0, defaultIndexRatio=0, defaultProtect=0.5, isONNX=True, modelType='onnxRVC', samplingRate=48000, f0=True, embChannels=256, embOutputLayer=9, useFinalProj=True, deprecated=True, embedder='hubert_base', sampleId='', version='v2') [INFO] params, LoadModelParams(voiceChangerType='RVC', slot=5, isSampleMode=False, sampleId=None, files=[LoadModelParamFile(name='otrok-vid.onnx', kind='rvcModel', dir=''), LoadModelParamFile(name='added_otrok-vid.index', kind='rvcIndex', dir='')], params={}) [INFO] [Voice Changer] UPDATE MODEL INFO, {"slot":5,"key":"speakers","val":"{\"0\":\"Otrok ONNX\"}"} SlotInfo::: RVCModelSlot(slotIndex=5, voiceChangerType='RVC', name='otrok-vid', description='', credit='', termsOfUseUrl='', iconFile='', speakers={'0': 'Otrok ONNX'}, modelFile='otrok-vid.onnx', indexFile='added_otrok-vid.index', defaultTune=0, defaultIndexRatio=0, defaultProtect=0.5, isONNX=True, modelType='onnxRVC', samplingRate=48000, f0=True, embChannels=256, embOutputLayer=9, useFinalProj=True, deprecated=True, embedder='hubert_base', sampleId='', version='v2') [INFO] [Voice Changer] UPDATE MODEL INFO, {"slot":5,"key":"speakers","val":"{\"0\":\"Otrok ONNX\"}"} SlotInfo::: RVCModelSlot(slotIndex=5, voiceChangerType='RVC', name='otrok-vid', description='', credit='', termsOfUseUrl='', iconFile='', speakers={'0': 'Otrok ONNX'}, modelFile='otrok-vid.onnx', indexFile='added_otrok-vid.index', defaultTune=0, defaultIndexRatio=0, defaultProtect=0.5, isONNX=True, modelType='onnxRVC', samplingRate=48000, f0=True, embChannels=256, embOutputLayer=9, useFinalProj=True, deprecated=True, embedder='hubert_base', sampleId='', version='v2') [INFO] [Voice Changer] UPDATE MODEL INFO, {"slot":5,"key":"speakers","val":"{\"0\":\"Otrok ONNX\"}"} 
SlotInfo::: RVCModelSlot(slotIndex=5, voiceChangerType='RVC', name='otrok-vid', description='', credit='', termsOfUseUrl='', iconFile='', speakers={'0': 'Otrok ONNX'}, modelFile='otrok-vid.onnx', indexFile='added_otrok-vid.index', defaultTune=0, defaultIndexRatio=0, defaultProtect=0.5, isONNX=True, modelType='onnxRVC', samplingRate=48000, f0=True, embChannels=256, embOutputLayer=9, useFinalProj=True, deprecated=True, embedder='hubert_base', sampleId='', version='v2') [Voice Changer] update configuration: modelSlotIndex 1726124644005 [INFO] [Voice Changer] model slot is changed 4 -> 5 [INFO] ................RVC [INFO] [Voice Changer] [RVCr2] Creating instance [INFO] VoiceChangerV2 Initialized (GPU_NUM(cuda):0, mps_enabled:False, onnx_device:GPU) Pipeline has been deleted [INFO] [Voice Changer][RVC]: update_settings gpu:0 [INFO] [Voice Changer][RVCr2] Initializing... [INFO] [Voice Changer][RVCr2] Creating pipeline with params: {'model_dir': 'model_dir', 'content_vec_500': 'pretrain/checkpoint_best_legacy_500.pt', 'content_vec_500_onnx': 'pretrain/content_vec_500.onnx', 'content_vec_500_onnx_on': True, 'hubert_base': 'pretrain/hubert_base.pt', 'hubert_base_jp': 'pretrain/rinna_hubert_base_jp.pt', 'hubert_soft': 'pretrain/hubert/hubert-soft-0d54a1f4.pt', 'nsf_hifigan': 'pretrain/nsf_hifigan/model', 'sample_mode': 'production', 'crepe_onnx_full': 'pretrain/crepe_onnx_full.onnx', 'crepe_onnx_tiny': 'pretrain/crepe_onnx_tiny.onnx', 'rmvpe': 'pretrain/rmvpe.pt', 'rmvpe_onnx': 'pretrain/rmvpe.onnx', 'whisper_tiny': 'pretrain/whisper_tiny.pt'} [INFO] [Voice Changer][RVCr2] Creating pipeline with slotInfo: {'slotIndex': 5, 'voiceChangerType': 'RVC', 'name': 'otrok-vid', 'description': '', 'credit': '', 'termsOfUseUrl': '', 'iconFile': '', 'speakers': {'0': 'Otrok ONNX'}, 'modelFile': 'otrok-vid.onnx', 'indexFile': 'added_otrok-vid.index', 'defaultTune': 0, 'defaultIndexRatio': 0, 'defaultProtect': 0.5, 'isONNX': True, 'modelType': 'onnxRVC', 'samplingRate': 48000, 'f0': True, 
'embChannels': 256, 'embOutputLayer': 9, 'useFinalProj': True, 'deprecated': True, 'embedder': 'hubert_base', 'sampleId': '', 'version': 'v2'} [INFO] [Voice Changer][RVCr2] Creating pipeline with settings: {'gpu': 0, 'dstId': 0, 'f0Detector': 'rmvpe_onnx', 'tran': 12, 'silentThreshold': 1e-05, 'extraConvertSize': 4096, 'indexRatio': 0, 'protect': 0.5, 'rvcQuality': 0, 'silenceFront': 1, 'modelSamplingRate': 48000, 'speakers': {}} [Voice Changer] generate new embedder. (anyway) [Voice Changer] use torch contentvec Not implemented [Voice Changer] Loading index... Try loading... model_dir/5/added_otrok-vid.index [Voice Changer] update configuration: modelSlotIndex 1726124646005 [INFO] [Voice Changer] model slot is changed 4 -> 5 [INFO] ................RVC [INFO] [Voice Changer] [RVCr2] Creating instance [INFO] VoiceChangerV2 Initialized (GPU_NUM(cuda):0, mps_enabled:False, onnx_device:GPU) [INFO] [Voice Changer][RVC]: update_settings gpu:0 [INFO] [Voice Changer][RVCr2] Initializing... [INFO] [Voice Changer][RVCr2] Creating pipeline with params: {'model_dir': 'model_dir', 'content_vec_500': 'pretrain/checkpoint_best_legacy_500.pt', 'content_vec_500_onnx': 'pretrain/content_vec_500.onnx', 'content_vec_500_onnx_on': True, 'hubert_base': 'pretrain/hubert_base.pt', 'hubert_base_jp': 'pretrain/rinna_hubert_base_jp.pt', 'hubert_soft': 'pretrain/hubert/hubert-soft-0d54a1f4.pt', 'nsf_hifigan': 'pretrain/nsf_hifigan/model', 'sample_mode': 'production', 'crepe_onnx_full': 'pretrain/crepe_onnx_full.onnx', 'crepe_onnx_tiny': 'pretrain/crepe_onnx_tiny.onnx', 'rmvpe': 'pretrain/rmvpe.pt', 'rmvpe_onnx': 'pretrain/rmvpe.onnx', 'whisper_tiny': 'pretrain/whisper_tiny.pt'} [INFO] [Voice Changer][RVCr2] Creating pipeline with slotInfo: {'slotIndex': 5, 'voiceChangerType': 'RVC', 'name': 'otrok-vid', 'description': '', 'credit': '', 'termsOfUseUrl': '', 'iconFile': '', 'speakers': {'0': 'Otrok ONNX'}, 'modelFile': 'otrok-vid.onnx', 'indexFile': 'added_otrok-vid.index', 'defaultTune': 0, 
'defaultIndexRatio': 0, 'defaultProtect': 0.5, 'isONNX': True, 'modelType': 'onnxRVC', 'samplingRate': 48000, 'f0': True, 'embChannels': 256, 'embOutputLayer': 9, 'useFinalProj': True, 'deprecated': True, 'embedder': 'hubert_base', 'sampleId': '', 'version': 'v2'} [INFO] [Voice Changer][RVCr2] Creating pipeline with settings: {'gpu': 0, 'dstId': 0, 'f0Detector': 'rmvpe_onnx', 'tran': 12, 'silentThreshold': 1e-05, 'extraConvertSize': 4096, 'indexRatio': 0, 'protect': 0.5, 'rvcQuality': 0, 'silenceFront': 1, 'modelSamplingRate': 48000, 'speakers': {}} [INFO] GENERATE INFERENCER<voice_changer.RVC.inferencer.OnnxRVCInferencer.OnnxRVCInferencer object at 0x7fcefcf00220> [INFO] GENERATE EMBEDDER<voice_changer.RVC.embedder.FairseqHubert.FairseqHubert object at 0x7fcef6b926e0> [Voice Changer] generate new embedder. (anyway) [Voice Changer] use torch contentvec Not implemented [INFO] GENERATE PITCH EXTRACTOR<voice_changer.RVC.pitchExtractor.RMVPEOnnxPitchExtractor.RMVPEOnnxPitchExtractor object at 0x7fcef71bcc10> [INFO] [Voice Changer] [RVC] Initializing... done Pipeline has been deleted [INFO] [Voice Changer][RVC]: update_settings f0Detector:crepe_tiny [INFO] [Voice Changer][RVC]: update_settings serverReadChunkSize:192 [INFO] [Voice Changer][RVC]: update_settings extraConvertSize:4096 [INFO] [Voice Changer][RVC]: update_settings modelSlotIndex:1726124644005 [Voice Changer] Loading index... Try loading... model_dir/5/added_otrok-vid.index [INFO] GENERATE INFERENCER<voice_changer.RVC.inferencer.OnnxRVCInferencer.OnnxRVCInferencer object at 0x7fcef6e227a0> [INFO] GENERATE EMBEDDER<voice_changer.RVC.embedder.FairseqHubert.FairseqHubert object at 0x7fcef6e22b30> [INFO] GENERATE PITCH EXTRACTOR<voice_changer.RVC.pitchExtractor.RMVPEOnnxPitchExtractor.RMVPEOnnxPitchExtractor object at 0x7fcef9c4b9a0> [INFO] [Voice Changer] [RVC] Initializing... 
done [INFO] [Voice Changer][RVC]: update_settings f0Detector:crepe_tiny [INFO] [Voice Changer][RVC]: update_settings serverReadChunkSize:192 [INFO] [Voice Changer][RVC]: update_settings extraConvertSize:4096 [INFO] [Voice Changer][RVC]: update_settings modelSlotIndex:1726124646005 [Voice Changer] update configuration: modelSlotIndex 1726124653004 [INFO] [Voice Changer] model slot is changed 5 -> 4 [INFO] ................RVC [INFO] [Voice Changer] [RVCr2] Creating instance [INFO] VoiceChangerV2 Initialized (GPU_NUM(cuda):0, mps_enabled:False, onnx_device:GPU) Pipeline has been deleted [INFO] [Voice Changer][RVC]: update_settings gpu:0 [INFO] [Voice Changer][RVCr2] Initializing... [INFO] [Voice Changer][RVCr2] Creating pipeline with params: {'model_dir': 'model_dir', 'content_vec_500': 'pretrain/checkpoint_best_legacy_500.pt', 'content_vec_500_onnx': 'pretrain/content_vec_500.onnx', 'content_vec_500_onnx_on': True, 'hubert_base': 'pretrain/hubert_base.pt', 'hubert_base_jp': 'pretrain/rinna_hubert_base_jp.pt', 'hubert_soft': 'pretrain/hubert/hubert-soft-0d54a1f4.pt', 'nsf_hifigan': 'pretrain/nsf_hifigan/model', 'sample_mode': 'production', 'crepe_onnx_full': 'pretrain/crepe_onnx_full.onnx', 'crepe_onnx_tiny': 'pretrain/crepe_onnx_tiny.onnx', 'rmvpe': 'pretrain/rmvpe.pt', 'rmvpe_onnx': 'pretrain/rmvpe.onnx', 'whisper_tiny': 'pretrain/whisper_tiny.pt'} [INFO] [Voice Changer][RVCr2] Creating pipeline with slotInfo: {'slotIndex': 4, 'voiceChangerType': 'RVC', 'name': 'otrok-vid', 'description': '', 'credit': '', 'termsOfUseUrl': '', 'iconFile': '', 'speakers': {'0': 'target'}, 'modelFile': 'otrok-vid.pth', 'indexFile': 'added_IVF1882_Flat_nprobe_1_otrok-vid_v2.index', 'defaultTune': 12, 'defaultIndexRatio': 0, 'defaultProtect': 0.5, 'isONNX': False, 'modelType': 'pyTorchRVCv2', 'samplingRate': 40000, 'f0': True, 'embChannels': 768, 'embOutputLayer': 12, 'useFinalProj': False, 'deprecated': False, 'embedder': 'hubert_base', 'sampleId': '', 'version': 'v2'} [INFO] [Voice 
Changer][RVCr2] Creating pipeline with settings: {'gpu': 0, 'dstId': 0, 'f0Detector': 'rmvpe_onnx', 'tran': 12, 'silentThreshold': 1e-05, 'extraConvertSize': 4096, 'indexRatio': 0, 'protect': 0.5, 'rvcQuality': 0, 'silenceFront': 1, 'modelSamplingRate': 48000, 'speakers': {}} gin_channels: 256 self.spk_embed_dim: 109 [Voice Changer] generate new embedder. (anyway) [Voice Changer] use torch contentvec Not implemented [Voice Changer] Loading index... Try loading... model_dir/4/added_IVF1882_Flat_nprobe_1_otrok-vid_v2.index [INFO] GENERATE INFERENCER<voice_changer.RVC.inferencer.RVCInferencerv2.RVCInferencerv2 object at 0x7fcef71bd000> [INFO] GENERATE EMBEDDER<voice_changer.RVC.embedder.FairseqHubert.FairseqHubert object at 0x7fcef6e229e0> [INFO] GENERATE PITCH EXTRACTOR<voice_changer.RVC.pitchExtractor.RMVPEOnnxPitchExtractor.RMVPEOnnxPitchExtractor object at 0x7fcefd0e0a90> [INFO] [Voice Changer] [RVC] Initializing... done [INFO] [Voice Changer][RVC]: update_settings f0Detector:crepe_tiny [INFO] [Voice Changer][RVC]: update_settings serverReadChunkSize:192 [INFO] [Voice Changer][RVC]: update_settings extraConvertSize:4096 [INFO] [Voice Changer][RVC]: update_settings modelSlotIndex:1726124653004 [Voice Changer] update configuration: modelSlotIndex 1726124657005 [INFO] [Voice Changer] model slot is changed 4 -> 5 [INFO] ................RVC [INFO] [Voice Changer] [RVCr2] Creating instance [INFO] VoiceChangerV2 Initialized (GPU_NUM(cuda):0, mps_enabled:False, onnx_device:GPU) Pipeline has been deleted [INFO] [Voice Changer][RVC]: update_settings gpu:0 [INFO] [Voice Changer][RVCr2] Initializing... 
[INFO] [Voice Changer][RVCr2] Creating pipeline with params: {'model_dir': 'model_dir', 'content_vec_500': 'pretrain/checkpoint_best_legacy_500.pt', 'content_vec_500_onnx': 'pretrain/content_vec_500.onnx', 'content_vec_500_onnx_on': True, 'hubert_base': 'pretrain/hubert_base.pt', 'hubert_base_jp': 'pretrain/rinna_hubert_base_jp.pt', 'hubert_soft': 'pretrain/hubert/hubert-soft-0d54a1f4.pt', 'nsf_hifigan': 'pretrain/nsf_hifigan/model', 'sample_mode': 'production', 'crepe_onnx_full': 'pretrain/crepe_onnx_full.onnx', 'crepe_onnx_tiny': 'pretrain/crepe_onnx_tiny.onnx', 'rmvpe': 'pretrain/rmvpe.pt', 'rmvpe_onnx': 'pretrain/rmvpe.onnx', 'whisper_tiny': 'pretrain/whisper_tiny.pt'} [INFO] [Voice Changer][RVCr2] Creating pipeline with slotInfo: {'slotIndex': 5, 'voiceChangerType': 'RVC', 'name': 'otrok-vid', 'description': '', 'credit': '', 'termsOfUseUrl': '', 'iconFile': '', 'speakers': {'0': 'Otrok ONNX'}, 'modelFile': 'otrok-vid.onnx', 'indexFile': 'added_otrok-vid.index', 'defaultTune': 0, 'defaultIndexRatio': 0, 'defaultProtect': 0.5, 'isONNX': True, 'modelType': 'onnxRVC', 'samplingRate': 48000, 'f0': True, 'embChannels': 256, 'embOutputLayer': 9, 'useFinalProj': True, 'deprecated': True, 'embedder': 'hubert_base', 'sampleId': '', 'version': 'v2'} [INFO] [Voice Changer][RVCr2] Creating pipeline with settings: {'gpu': 0, 'dstId': 0, 'f0Detector': 'rmvpe_onnx', 'tran': 12, 'silentThreshold': 1e-05, 'extraConvertSize': 4096, 'indexRatio': 0, 'protect': 0.5, 'rvcQuality': 0, 'silenceFront': 1, 'modelSamplingRate': 48000, 'speakers': {}} [Voice Changer] generate new embedder. (anyway) [Voice Changer] use torch contentvec Not implemented [Voice Changer] Loading index... Try loading... 
model_dir/5/added_otrok-vid.index [INFO] GENERATE INFERENCER<voice_changer.RVC.inferencer.OnnxRVCInferencer.OnnxRVCInferencer object at 0x7fcef6fb3f70> [INFO] GENERATE EMBEDDER<voice_changer.RVC.embedder.FairseqHubert.FairseqHubert object at 0x7fcef70964a0> [INFO] GENERATE PITCH EXTRACTOR<voice_changer.RVC.pitchExtractor.RMVPEOnnxPitchExtractor.RMVPEOnnxPitchExtractor object at 0x7fcef6e229e0> [INFO] [Voice Changer] [RVC] Initializing... done [INFO] [Voice Changer][RVC]: update_settings f0Detector:crepe_tiny [INFO] [Voice Changer][RVC]: update_settings serverReadChunkSize:192 [INFO] [Voice Changer][RVC]: update_settings extraConvertSize:4096 [INFO] [Voice Changer][RVC]: update_settings modelSlotIndex:1726124657005 [Voice Changer] update configuration: f0Detector dio [INFO] [Voice Changer][RVC]: update_settings f0Detector:dio [-144 216 27 ... 346 36 170] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [INFO] Generated Strengths: for prev:(4096,), for cur:(4096,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [ -34 188 76 ... -315 -316 -268] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [-141 -378 -39 ... 1035 2183 4018] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [5369 7118 8513 ... 
-822 -465 -359] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [ -365 -606 -797 ... -3406 -3071 -3211] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [-3438 -2825 -3389 ... -588 -327 -682] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [ -565 -77 -575 ... -1344 -1642 -1359] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [-1497 -1529 -1563 ... -283 -532 -270] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [-573 -352 -313 ... -271 -155 -389] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. 
Input Feed contains 5 [Voice Changer] update configuration: tran 1 [INFO] [Voice Changer][RVC]: update_settings tran:1 [Voice Changer] update configuration: tran 1 [INFO] [Voice Changer][RVC]: update_settings tran:1 [Voice Changer] update configuration: tran 2 [INFO] [Voice Changer][RVC]: update_settings tran:2 [Voice Changer] update configuration: tran 4 [INFO] [Voice Changer][RVC]: update_settings tran:4 [Voice Changer] update configuration: tran 5 [INFO] [Voice Changer][RVC]: update_settings tran:5 [Voice Changer] update configuration: tran 8 [Voice Changer] update configuration: tran 6 [INFO] [Voice Changer][RVC]: update_settings tran:6 [INFO] [Voice Changer][RVC]: update_settings tran:8 [Voice Changer] update configuration: tran 9 [Voice Changer] update configuration: tran [INFO] [Voice Changer][RVC]: update_settings tran:9 12 [INFO] [Voice Changer][RVC]: update_settings tran:12 [-200 -306 -328 ... -15 176 -44] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [321 99 157 ... 532 535 651][Voice Changer] update configuration: tran 15 [Voice Changer] update configuration:[Voice Changer] update configuration: [Voice Changer] update configuration: tran 15 tran 16 [INFO]Received data type: <class 'numpy.ndarray'>[Voice Changer] update configuration: tran tran[INFO] 14 [INFO] [Voice Changer][RVC]: update_settings tran:16 [Voice Changer][RVC]: update_settings tran:15 [Voice Changer][RVC]: update_settings tran:15 Received data shape: (24576,)15 [Voice Changer] update configuration: [Voice Changer] Processing audio: 24576/48000hz tran 16 [INFO] [Voice Changer][RVC]: update_settings tran:15 [INFO] [Voice Changer][RVC]: update_settings tran:14 [INFO] [Voice Changer][RVC]: update_settings tran:16 [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! 
Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [Voice Changer] update configuration:[Voice Changer] update configuration: tran [Voice Changer] update configuration: tran [361 371 428 ... 127 257 169]17 tran [Voice Changer] update configuration: Received data type: [INFO] [Voice Changer][RVC]: update_settings tran:17<class 'numpy.ndarray'> Received data shape: 34 [INFO] 19 tran 19 [Voice Changer] update configuration: [Voice Changer][RVC]: update_settings tran:34 [Voice Changer] update configuration: tran 18 tran 19 (24576,) [INFO] [Voice Changer][RVC]: update_settings tran:18 [Voice Changer] Processing audio: 24576/48000hz [INFO] [Voice Changer][RVC]: update_settings tran:19 [INFO] [Voice Changer][RVC]: update_settings tran:19 [INFO] [Voice Changer][RVC]: update_settings tran:19 [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [ 244 324 21 ... -608 -645 -626] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,) [Voice Changer] Processing audio: 24576/48000hz [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [Voice Changer] update configuration: tran 24 [INFO] [Voice Changer] update configuration:[Voice Changer] update configuration: tran [Voice Changer] update configuration: tran [Voice Changer][RVC]: update_settings tran:2420 tran[Voice Changer] update configuration: 24 [Voice Changer] update configuration: tran24 tran [INFO] 25[Voice Changer][RVC]: update_settings tran:24 [INFO] 22 [INFO] [Voice Changer][RVC]: update_settings tran:22 [Voice Changer][RVC]: update_settings tran:20 [-426 -571 -442 ... 
-637 -907 -764] Received data type: <class 'numpy.ndarray'> Received data shape: (24576,)[INFO] [Voice Changer][RVC]: update_settings tran:25 [Voice Changer] Processing audio: 24576/48000hz [INFO] [Voice Changer][RVC]: update_settings tran:24 [WARN] [Voice Changer] VC PROCESSING EXCEPTION!!! Model requires 6 inputs. Input Feed contains 5 Model requires 6 inputs. Input Feed contains 5 [Voice Changer] update configuration: tran 10 [INFO] [Voice Changer][RVC]: update_settings tran:10 [-694 -714 -521 ... 257 466 239] Received data type: <class 'numpy.ndarray'>

w-okada commented 1 week ago

Convert the torch model to ONNX with vcclient.

samolego commented 1 week ago
