litagin02 / Style-Bert-VITS2

Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
GNU Affero General Public License v3.0

Training fails when a wav file's extension contains uppercase letters #185

Open sabipipe opened 3 days ago

sabipipe commented 3 days ago

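Solution

Training appears to fail when a wav file's extension contains uppercase letters (for example, J80.WAV). I was using very old data from the Windows 95 era, so the file names, including the extensions, were all uppercase. This should either be documented or fixed in the code.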
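As a workaround on the dataset side, the extensions can be normalized before transcription and preprocessing. The following is a minimal sketch, not part of Style-Bert-VITS2: the function name `lowercase_wav_extensions` is hypothetical, and it assumes that renaming the files in place is acceptable.

```python
from pathlib import Path


def lowercase_wav_extensions(root: Path) -> list[Path]:
    """Rename files such as J80.WAV to J80.wav under root; return the new paths."""
    renamed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Only touch files whose extension is "wav" in a non-lowercase spelling.
        if path.suffix.lower() != ".wav" or path.suffix == ".wav":
            continue
        target = path.with_suffix(".wav")
        # On a case-sensitive file system a distinct lowercase file may already
        # exist; skip instead of overwriting it.
        if target.exists() and not target.samefile(path):
            print(f"skipped (target exists): {path}")
            continue
        path.rename(target)
        renamed.append(target)
    return renamed
```

The function would be called with the folder that holds the dataset audio (the exact folder is not stated in this issue). Any transcription or preprocessing output that records the old file names would presumably need to be regenerated afterwards.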

Problem description

I cannot start training a model. Transcription and preprocessing work, but attempting to train produces the following error.

UnboundLocalError: local variable 'bert_ori' referenced before assignment

Judging from the source code, the following WARNING appears to be the cause.

11-22 17:39:20 |WARNING | data_utils.py:174 | Bert load Failed
11-22 17:39:20 |WARNING | data_utils.py:175 | unpickling stack underflow
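The combination of a logged warning followed by an `UnboundLocalError` suggests a load error that is caught and logged while a later line still uses the variable the failed load was meant to assign. The sketch below reproduces that pattern under stated assumptions: both function names are hypothetical, and `pickle.loads` stands in for whatever loader the training code uses, to keep the example dependency-free.

```python
import logging
import pickle

logger = logging.getLogger(__name__)


def load_swallowing_errors(blob: bytes):
    """Hypothetical loader mirroring the pattern suggested by the log."""
    try:
        feature = pickle.loads(blob)
    except Exception as e:
        # The failure is only logged, so execution continues ...
        logger.warning("Bert load Failed")
        logger.warning(e)
    # ... and this line raises UnboundLocalError when the load failed,
    # because `feature` was never assigned.
    return feature


def load_failing_fast(blob: bytes):
    """Alternative: re-raise with context so the root cause stays visible."""
    try:
        return pickle.loads(blob)
    except Exception as e:
        raise RuntimeError("could not load feature data") from e
```

With the second variant, the traceback would end at the load failure itself rather than at a later, less informative `UnboundLocalError`.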

Expected behavior

Training starts normally.

Current behavior

11-22 16:43:30 |  INFO  | subprocess.py:23 | Running: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config localhost
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 10086
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 1
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:119 | Loading environment variables
MASTER_ADDR: localhost,
MASTER_PORT: 10086,
WORLD_SIZE: 1,
RANK: 0,
LOCAL_RANK: 0
11-22 16:43:38 |  INFO  | default_style.py:54 | At least 2 subdirectories are required for generating style vectors with respect to them, found 0.
11-22 16:43:38 |  INFO  | default_style.py:57 | Generating only neutral style vector instead.
11-22 16:43:39 |  INFO  | default_style.py:28 | Saved mean style vector to model_assets\Someone-v0.3
11-22 16:43:39 |  INFO  | default_style.py:36 | Saved style config to model_assets\Someone-v0.3\config.json
11-22 16:43:39 |WARNING | __init__.py:247 | C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\style_bert_vits2\models\utils is not a git repository, therefore hash value comparison will be ignored.
11-22 16:43:39 |  INFO  | data_utils.py:69 | Init dataset...
100%|█████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 41042.75it/s]
11-22 16:43:39 |  INFO  | data_utils.py:84 | skipped: 0, total: 124
11-22 16:43:39 |  INFO  | data_utils.py:348 | Bucket info: [115, 2, 1]
11-22 16:43:39 |  INFO  | data_utils.py:69 | Init dataset...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<?, ?it/s]
11-22 16:43:39 |  INFO  | data_utils.py:84 | skipped: 0, total: 6
11-22 16:43:39 |  INFO  | train_ms_jp_extra.py:274 | Using noise scaled MAS for VITS2
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.weight
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.bias
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: emb_g.weight
11-22 16:43:42 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\G_0.safetensors'
11-22 16:43:43 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\D_0.safetensors'
11-22 16:43:43 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\WD_0.safetensors'
11-22 16:43:43 |  INFO  | train_ms_jp_extra.py:492 | Loaded the pretrained models.
11-22 16:43:45 |  INFO  | train_ms_jp_extra.py:540 | Start training.
  0%|                                                                                        | 0/11800 [00:00<?, ?it/s]11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
  0%|                                                                                        | 0/11800 [00:10<?, ?it/s]
11-22 16:43:56 | ERROR  | subprocess.py:33 | Error: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
Some weights of the model checkpoint at ./slm/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at ./slm/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[rank0]: Traceback (most recent call last):
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 1130, in <module>
[rank0]:     run()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 557, in run
[rank0]:     train_and_evaluate(
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 695, in train_and_evaluate
[rank0]:     for batch_idx, (
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
[rank0]:     data = self._next_data()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
[rank0]:     return self._process_data(data)
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
[rank0]:     data.reraise()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\_utils.py", line 705, in reraise
[rank0]:     raise exception
[rank0]: UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
[rank0]:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
[rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
[rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 199, in __getitem__
[rank0]:     return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 97, in get_audio_text_speaker_pair
[rank0]:     bert, ja_bert, en_bert, phones, tone, language = self.get_text(
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 183, in get_text
[rank0]:     ja_bert = bert_ori
[rank0]: UnboundLocalError: local variable 'bert_ori' referenced before assignment

11-22 16:43:56 | ERROR  | train.py:360 | Train failed.

Steps to reproduce

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
git clone https://github.com/litagin02/Style-Bert-VITS2.git
cd Style-Bert-VITS2
uv venv venv
venv\Scripts\activate
uv pip install "torch<2.4" "torchaudio<2.4" --index-url https://download.pytorch.org/whl/cu118
uv pip install -r requirements.txt
python initialize.py
python app.py

Start training from the GUI.

Version information

I tried the following combinations, and all of them behaved the same way. The OS is Windows 11.

| Source code | Python | PyTorch | Notes |
| --- | --- | --- | --- |
| 065a7ffa0a3214516f17ef5288a80d20c4ffb598 | 3.10.11 | 2.3.1+cu121 | |
| 065a7ffa0a3214516f17ef5288a80d20c4ffb598 | 3.10.11 | 2.2.2+cu121 | |
| 2.6.0 | 3.9.13 | 2.3.1+cu121 | numpy raised an error, so it was replaced with numpy==1.26.4 |

sabipipe commented 3 days ago

The problem appears to be that the dataset files have uppercase extensions, so I have revised the description accordingly.
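This issue does not show how the training code maps a wav path to its precomputed BERT feature file. The sketch below assumes, purely for illustration, a case-sensitive string replacement and a hypothetical `.bert.pt` suffix. Under that assumption `J80.WAV` would be left unchanged, so the loader would be handed the audio file itself, which would be consistent with the unpickling warning in the log. Both function names are hypothetical.

```python
from pathlib import Path


def feature_path_case_sensitive(wav_path: str) -> str:
    # Hypothetical derivation: only a lowercase ".wav" is replaced,
    # so "J80.WAV" comes back unchanged.
    return wav_path.replace(".wav", ".bert.pt")


def feature_path_any_case(wav_path: str) -> str:
    # Swap the final extension regardless of how it is capitalized.
    return str(Path(wav_path).with_suffix(".bert.pt"))


print(feature_path_case_sensitive("J80.WAV"))  # J80.WAV
print(feature_path_any_case("J80.WAV"))        # J80.bert.pt
```

If the code does follow this pattern, deriving the path from the parsed extension rather than from a literal substring would be one way to fix it on the code side.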