11-22 16:43:30 | INFO | subprocess.py:23 | Running: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
11-22 16:43:38 | INFO | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 | INFO | train_ms_jp_extra.py:117 | Loading configuration from config localhost
11-22 16:43:38 | INFO | train_ms_jp_extra.py:117 | Loading configuration from config 10086
11-22 16:43:38 | INFO | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 | INFO | train_ms_jp_extra.py:117 | Loading configuration from config 1
11-22 16:43:38 | INFO | train_ms_jp_extra.py:119 | Loading environment variables
MASTER_ADDR: localhost,
MASTER_PORT: 10086,
WORLD_SIZE: 1,
RANK: 0,
LOCAL_RANK: 0
11-22 16:43:38 | INFO | default_style.py:54 | At least 2 subdirectories are required for generating style vectors with respect to them, found 0.
11-22 16:43:38 | INFO | default_style.py:57 | Generating only neutral style vector instead.
11-22 16:43:39 | INFO | default_style.py:28 | Saved mean style vector to model_assets\Someone-v0.3
11-22 16:43:39 | INFO | default_style.py:36 | Saved style config to model_assets\Someone-v0.3\config.json
11-22 16:43:39 |WARNING | __init__.py:247 | C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\style_bert_vits2\models\utils is not a git repository, therefore hash value comparison will be ignored.
11-22 16:43:39 | INFO | data_utils.py:69 | Init dataset...
100%|█████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 41042.75it/s]
11-22 16:43:39 | INFO | data_utils.py:84 | skipped: 0, total: 124
11-22 16:43:39 | INFO | data_utils.py:348 | Bucket info: [115, 2, 1]
11-22 16:43:39 | INFO | data_utils.py:69 | Init dataset...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<?, ?it/s]
11-22 16:43:39 | INFO | data_utils.py:84 | skipped: 0, total: 6
11-22 16:43:39 | INFO | train_ms_jp_extra.py:274 | Using noise scaled MAS for VITS2
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.weight
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.bias
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: emb_g.weight
11-22 16:43:42 | INFO | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\G_0.safetensors'
11-22 16:43:43 | INFO | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\D_0.safetensors'
11-22 16:43:43 | INFO | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\WD_0.safetensors'
11-22 16:43:43 | INFO | train_ms_jp_extra.py:492 | Loaded the pretrained models.
11-22 16:43:45 | INFO | train_ms_jp_extra.py:540 | Start training.
0%| | 0/11800 [00:00<?, ?it/s]11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
0%| | 0/11800 [00:10<?, ?it/s]
11-22 16:43:56 | ERROR | subprocess.py:33 | Error: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
Some weights of the model checkpoint at ./slm/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at ./slm/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[rank0]: Traceback (most recent call last):
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 1130, in <module>
[rank0]: run()
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 557, in run
[rank0]: train_and_evaluate(
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 695, in train_and_evaluate
[rank0]: for batch_idx, (
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
[rank0]: data = self._next_data()
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
[rank0]: return self._process_data(data)
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
[rank0]: data.reraise()
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\_utils.py", line 705, in reraise
[rank0]: raise exception
[rank0]: UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
[rank0]: data = fetcher.fetch(index) # type: ignore[possibly-undefined]
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
[rank0]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
[rank0]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 199, in __getitem__
[rank0]: return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 97, in get_audio_text_speaker_pair
[rank0]: bert, ja_bert, en_bert, phones, tone, language = self.get_text(
[rank0]: File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 183, in get_text
[rank0]: ja_bert = bert_ori
[rank0]: UnboundLocalError: local variable 'bert_ori' referenced before assignment
11-22 16:43:56 | ERROR | train.py:360 | Train failed.
Solution
Training appears to fail when a wav file's extension contains uppercase letters (e.g. J80.WAV). Because I used very old data from the Windows 95 era, the file names, including the extensions, were all uppercase. This probably needs to be either documented or fixed in the code; a possible workaround is sketched below.
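As a stopgap, the extensions can be normalized to lowercase before preprocessing. A minimal sketch, assuming the source wav files live under Data\Someone-v0.3\raw (a hypothetical path; adjust to your dataset layout):

```python
# Hypothetical workaround: rename files whose extension contains uppercase
# letters (e.g. J80.WAV -> J80.wav) so downstream path handling matches.
from pathlib import Path

dataset_dir = Path(r"Data\Someone-v0.3\raw")  # assumed dataset location

for p in dataset_dir.rglob("*"):
    if p.is_file() and p.suffix != p.suffix.lower():
        p.rename(p.with_suffix(p.suffix.lower()))  # NTFS permits case-only renames
```

After renaming, preprocessing would presumably need to be re-run so the cached feature files are generated under the lowercase names.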
Problem description
Training a model cannot be started. Transcription and preprocessing run fine, but attempting to train produces the errors shown in the log above.
Judging from the source code, the "Bert load Failed" WARNING appears to be the cause.
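For context, here is a paraphrased sketch of the failure pattern, not the repository's actual code: get_text in data_utils.py seems to bind bert_ori only when the cached BERT feature file loads successfully, so a failed torch.load falls through to the UnboundLocalError seen in the traceback:

```python
import torch

def get_text_sketch(bert_path: str):
    # Paraphrased: bert_ori is bound only if the load succeeds.
    try:
        bert_ori = torch.load(bert_path)  # fails with "unpickling stack underflow"
    except Exception as e:
        print("Bert load Failed")
        print(e)
    # If the load failed, bert_ori was never assigned:
    ja_bert = bert_ori  # UnboundLocalError: local variable 'bert_ori' referenced before assignment
    return ja_bert
```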
Training succeeded on the HEAD as of 2024/1/8. This time I updated the code to the latest HEAD and rebuilt the virtual environment. Since that alone still left transcription broken, I also deleted the entire repository once and cloned it again.
Expected behavior
Training starts normally.
Current behavior
Training fails with the log output shown above.
Steps to reproduce
Start training from the GUI.
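For reference, the GUI appears to launch the command recorded at the top of the log (subprocess.py:23), so invoking it directly should reproduce the failure. A hypothetical equivalent:

```python
import subprocess
import sys

# Re-run the training script with the same arguments the GUI used.
subprocess.run([
    sys.executable, "train_ms_jp_extra.py",
    "--config", r"Data\Someone-v0.3\config.json",
    "--model", r"Data\Someone-v0.3",
], check=True)
```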
Version information
I tried the following combinations, and all of them behaved the same. The OS is Windows 11.