no sola buffer and slow down converting

nadare881 commented 1 year ago

When running the Windows build of v.1.5.2.4a, res is around 200 ms with my settings when conversion works, but sometimes buffering seems to break down: res climbs past 10000 ms and the audio becomes choppy.

After reproducing it several times, I found that whenever this happens, the following output appears:

VC PROCESSING!!!! EXCEPTION!!! Out of bounds on buffer access (axis 0)
Traceback (most recent call last):
  File "voice_changer\VoiceChanger.py", line 235, in on_request_sola
  File "voice_changer\RVC\RVC.py", line 361, in inference
  File "voice_changer\RVC\RVC.py", line 346, in _pyTorch_inference
  File "voice_changer\RVC\custom_vc_infer_pipeline.py", line 159, in pipeline
  File "voice_changer\RVC\custom_vc_infer_pipeline.py", line 49, in get_f0
  File "pyworld\pyworld.pyx", line 193, in pyworld.pyworld.harvest
IndexError: Out of bounds on buffer access (axis 0)

[Voice Changer] no sola buffer. (You can ignore this.)
[XXXX:XXXX/XXXXX.XXX:ERROR:gpu_init.cc(523)] Passthrough is not supported, GL is disabled, ANGLE is
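For context on the "no sola buffer" message: SOLA (synchronized overlap-add) generally keeps the tail of the previous converted chunk, searches the new chunk for the offset that best lines up with that tail, and crossfades the two. On the very first chunk there is no previous tail to align against, which would be consistent with the log saying the message can be ignored. The following is a minimal sketch of that general technique, not voice-changer's actual code: the function names (`crossfade_strengths`, `sola_step`) and the `block`/`search` sizes are hypothetical, and only the 1024-sample overlap mirrors `crossFadeOverlapSize` from the startup log later in this thread.

```python
import numpy as np


def crossfade_strengths(size: int) -> tuple[np.ndarray, np.ndarray]:
    """Fade curves of length `size` that always sum to 1 (cos^2 out, sin^2 in)."""
    t = np.arange(size) / size
    fade_out = np.cos(t * 0.5 * np.pi) ** 2  # applied to the previous tail
    fade_in = 1.0 - fade_out                 # applied to the new audio
    return fade_out, fade_in


def sola_step(sola_buffer, converted, block, overlap=1024, search=512):
    """Hypothetical SOLA step.

    sola_buffer: tail kept from the previous chunk (length `overlap`), or None.
    converted: newly converted audio (at least search + block + overlap samples).
    Returns (audio_to_play, new_sola_buffer).
    """
    if len(converted) < search + block + overlap:
        raise ValueError("converted chunk too short for block + overlap + search")

    offset = 0
    if sola_buffer is not None:
        # Pick the offset whose segment correlates best with the previous tail,
        # normalizing by the segment energy so loud segments are not favored.
        best_score = -np.inf
        for candidate in range(search):
            seg = converted[candidate:candidate + overlap]
            score = np.dot(seg, sola_buffer) / (np.linalg.norm(seg) + 1e-8)
            if score > best_score:
                offset, best_score = candidate, score
    # else: first chunk, nothing to align against ("no sola buffer")

    aligned = converted[offset:].astype(np.float64)  # astype returns a copy
    if sola_buffer is not None:
        fade_out, fade_in = crossfade_strengths(overlap)
        aligned[:overlap] = aligned[:overlap] * fade_in + sola_buffer * fade_out

    # Play one block; keep the following `overlap` samples for the next step.
    return aligned[:block], aligned[block:block + overlap].copy()
```

The design point is that each step needs the tail from the previous step, so the first call necessarily runs without one; that by itself is harmless and unrelated to any slowdown.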

This also looks related to the following issue. I'd appreciate it if you could take a look. https://github.com/w-okada/voice-changer/issues/182

Environment:
OS name: Microsoft Windows 11 Pro
Version: 10.0.22621 Build 22621
Processor: 12th Gen Intel(R) Core(TM) i9-12900K
GPU: NVIDIA GeForce RTX 3090

Example log from startup to the error:

RVC initialization:  {'content_vec_500': 'checkpoint_best_legacy_500.pt', 'content_vec_500_onnx': 'checkpoint_best_legacy_500.onnx', 'content_vec_500_onnx_on': 1, 'hubert_base': 'hubert_base.pt', 'hubert_soft': 'hubert-soft-0d54a1f4.pt', 'nsf_hifigan': 'nsf_hifigan/model'}
mps:  False
post_update_settings dstId 1
post_update_settings crossFadeOffsetRate 0.1
post_update_settings crossFadeEndRate 1
post_update_settings crossFadeOverlapSize 1024
post_update_settings framework PyTorch
post_update_settings onnxExecutionProvider CPUExecutionProvider
Onnx is not enabled. Please load model.
onnxExecutionProvider is not mutable variable or unknown variable!
post_update_settings f0Factor 1
post_update_settings f0Detector harvest
f0Factor is not mutable variable or unknown variable!
post_update_settings silentThreshold 0.00001
post_update_settings extraConvertSize 32768
post_update_settings clusterInferRatio 0.1
clusterInferRatio is not mutable variable or unknown variable!
post_update_settings modelSamplingRate 40000
post_update_settings silenceFront 1
post_update_settings inputSampleRate 48000
[Voice Changer] RVC loading... slot: 0
[Voice Changer] Prepare Model of slot: 0
gin_channels: 256 self.spk_embed_dim: 109
Generated Strengths: for prev:(1024,), for cur:(1024,)
VC PROCESSING!!!! EXCEPTION!!! Out of bounds on buffer access (axis 0)
Traceback (most recent call last):
  File "voice_changer\VoiceChanger.py", line 235, in on_request_sola
  File "voice_changer\RVC\RVC.py", line 361, in inference
  File "voice_changer\RVC\RVC.py", line 346, in _pyTorch_inference
  File "voice_changer\RVC\custom_vc_infer_pipeline.py", line 159, in pipeline
  File "voice_changer\RVC\custom_vc_infer_pipeline.py", line 49, in get_f0
  File "pyworld\pyworld.pyx", line 193, in pyworld.pyworld.harvest
IndexError: Out of bounds on buffer access (axis 0)

VC PROCESSING!!!! EXCEPTION!!! Out of bounds on buffer access (axis 0)
Traceback (most recent call last):
  File "voice_changer\VoiceChanger.py", line 235, in on_request_sola
  File "voice_changer\RVC\RVC.py", line 361, in inference
  File "voice_changer\RVC\RVC.py", line 346, in _pyTorch_inference
  File "voice_changer\RVC\custom_vc_infer_pipeline.py", line 159, in pipeline
  File "voice_changer\RVC\custom_vc_infer_pipeline.py", line 49, in get_f0
  File "pyworld\pyworld.pyx", line 193, in pyworld.pyworld.harvest
IndexError: Out of bounds on buffer access (axis 0)

[Voice Changer] no sola buffer. (You can ignore this.)

w-okada commented 1 year ago

Do you know how to reproduce it? I have seen this message a few times myself (though I don't remember any delay), but it rarely shows up in my environment...

nadare881 commented 1 year ago

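In my environment it happens almost every time, even after rebooting the PC, so it's hard to find a way to avoid it. I'll look through the source code myself for anything suspicious.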

Tybost commented 1 year ago

I've had res surge to crazy high numbers, and I recall fixing it either by enabling suppression2 or by changing my USB Sound Blaster G3 DAC/amp from 96 kHz to 48 kHz. Also make sure the Windows default format is set to 48000 Hz under the playback device's advanced settings.

nadare881 commented 1 year ago

Things I tried that didn't help:

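Excluding the app from the security software's scans
Trying various output devices, such as a monitor (48000 Hz)
Switching to an external microphone (48000 Hz), because I couldn't set the sample rate of the mic built into my headphones
Changing the model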

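This is what pre, main, and post looked like. Where is the time going... ~~I'll add timers to the code and measure it~~ With an app like this I can't tell how it works internally, so I don't know where to start...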

w-okada commented 1 year ago

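Thanks for the information. First, some internal background: RVC does a lot inside its pipeline, so I haven't been able to separate preprocessing and postprocessing out of it. As a result, pre and post should only be doing sample-rate conversion and calculating data sizes.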
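As a rough illustration of what "sample-rate conversion and calculating data sizes" involves, here is a sketch assuming the rates from the startup log above (`inputSampleRate` 48000, `modelSamplingRate` 40000). The helper names are hypothetical, and the linear interpolation is only a stand-in: a real pipeline would use a proper band-limited resampler.

```python
import numpy as np


def resampled_length(n_samples: int, src_sr: int, dst_sr: int) -> int:
    """Number of samples the same stretch of audio occupies at dst_sr."""
    return int(round(n_samples * dst_sr / src_sr))


def resample_linear(audio: np.ndarray, src_sr: int, dst_sr: int) -> np.ndarray:
    """Crude linear-interpolation resampler (illustration only, no anti-aliasing filter)."""
    n_out = resampled_length(len(audio), src_sr, dst_sr)
    src_t = np.arange(len(audio)) / src_sr  # timestamps of the input samples
    dst_t = np.arange(n_out) / dst_sr       # timestamps of the output samples
    return np.interp(dst_t, src_t, audio)
```

For example, `resampled_length(4096, 48000, 40000)` is 3413: a 4096-sample chunk captured at 48 kHz shrinks to about 3413 samples at the model's 40 kHz rate. This kind of bookkeeping is cheap compared with inference, which fits the point that pre and post are unlikely to be where the time goes.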

main is essentially the pipeline originally provided by RVC, copied almost verbatim with the parts that seemed unnecessary removed. So the slowdown could be anywhere in the pipeline as a whole.

By the way, did earlier versions (before v.1.5.2.4a) run somewhat faster? Also, is anything running in the background? Harvest runs on the CPU, so CPU load matters as well as GPU load.
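One way to check whether a CPU-bound step such as Harvest keeps up is its real-time factor: processing time divided by the duration of the audio processed. The sketch below is a hypothetical helper, not part of voice-changer; the `clock` parameter exists only so the timing can be checked deterministically.

```python
import time
from typing import Callable

import numpy as np


def real_time_factor(
    process: Callable[[np.ndarray], object],
    audio: np.ndarray,
    sample_rate: int,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Processing time divided by audio duration; above 1.0 means slower than real time."""
    start = clock()
    process(audio)
    elapsed = clock() - start
    duration = len(audio) / sample_rate  # seconds of audio handed to `process`
    return elapsed / duration
```

With pyworld installed, something like `real_time_factor(lambda x: pyworld.harvest(x, 16000), chunk, 16000)` would time Harvest alone, assuming `chunk` is a float64 NumPy array and 16000 Hz is the rate it is called at (an assumption, not taken from voice-changer). A value near or above 1.0 would mean that step by itself cannot keep up.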

nadare881 commented 1 year ago

> By the way, did earlier versions (before v.1.5.2.4a) run somewhat faster?

I haven't used any version before v1.5.2.4, so I can't say...

> Also, is anything running in the background? Harvest runs on the CPU, so CPU load matters as well as GPU load.

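I'm not running any other heavy software, and the machine has plenty of spare capacity, so background load doesn't seem to be a factor. Task Manager also shows headroom on the CPU, GPU, and RAM.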

w-okada commented 1 year ago

I see. So in v.1.5.2.4a, it is sometimes 200 ms and sometimes 460+ ms. How does it behave if you set extra data length to something like 16K or 8K?
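To put the suggested values in perspective, here is the arithmetic for converting sample counts to milliseconds, assuming the counts are at the 48000 Hz `inputSampleRate` from the log (`extraConvertSize` there is 32768). The `samples_per_inference` helper is hypothetical and assumes the extra data is prepended as context to every chunk, so shrinking it shrinks the work done per chunk.

```python
def samples_to_ms(n_samples: int, sample_rate: int = 48000) -> float:
    """Duration in milliseconds of n_samples at sample_rate (48 kHz assumed by default)."""
    return 1000.0 * n_samples / sample_rate


def samples_per_inference(chunk: int, extra: int) -> int:
    """Hypothetical: samples converted per step if `extra` past samples are
    prepended to every new chunk of `chunk` samples as context."""
    return chunk + extra
```

Under those assumptions, 32768 samples is about 683 ms of audio, while 16384 and 8192 are about 341 ms and 171 ms. With a hypothetical 8192-sample chunk, each step would convert 40960 samples at the log's setting but only 16384 with an 8K extra length.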

nadare881 commented 1 year ago

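I found the cause: when buf < res, the audio can't be processed in time and keeps piling up. When tuning, it seems best to start with a large buf and pick an input_chunk comfortably larger than res. This is resolved, so I'm closing the issue.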
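The backlog described above can be reproduced with a small queue simulation. This is a minimal sketch under simplifying assumptions (one chunk of `buf_ms` audio arrives every `buf_ms`, chunks are converted one at a time in order, and each takes a fixed `res_ms`); it is not how voice-changer schedules work internally.

```python
def simulate_latency(buf_ms: float, res_ms: float, n_chunks: int) -> list[float]:
    """Latency (ms) of each chunk, from when it is fully captured until its
    conversion finishes, for a single in-order worker."""
    latencies = []
    busy_until = 0.0  # time at which the worker finishes its current chunk
    for i in range(n_chunks):
        arrival = (i + 1) * buf_ms        # chunk i is complete at this time
        start = max(arrival, busy_until)  # wait if the previous chunk is still converting
        busy_until = start + res_ms
        latencies.append(busy_until - arrival)
    return latencies
```

For example, `simulate_latency(200, 150, 5)` stays at 150 ms for every chunk, while `simulate_latency(200, 250, 5)` gives latencies of 250, 300, 350, 400 and 450 ms: every chunk adds res - buf = 50 ms to the queue, so latency grows without bound. That is why choosing buf with a comfortable margin above res keeps latency flat.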

w-okada / voice-changer, issue #199