Training fails when the wav file extension contains uppercase letters

Solution

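Training appears to fail when a wav file's extension contains uppercase letters (e.g. J80.WAV). I was using very old data from the Windows 95 era, so the file names were entirely uppercase, extensions included. This should either be documented or fixed in the code.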
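
As a data-side workaround, the uppercase extensions can be normalized before preprocessing. The following is a minimal sketch, not part of Style-Bert-VITS2: the helper name `lowercase_wav_extensions` is hypothetical, and it assumes the audio files sit somewhere under a single dataset directory. It renames through a temporary name so that the result does not depend on how the file system treats a case-only rename.

```python
from pathlib import Path


def lowercase_wav_extensions(root: Path) -> list[Path]:
    """Rename files such as J80.WAV or b.Wav under root to use '.wav'.

    Hypothetical helper for illustration; returns the new paths.
    """
    renamed = []
    # Materialize the listing first so renames do not disturb the iteration.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix
        # Skip non-wav files and files that are already lowercase.
        if suffix.lower() != ".wav" or suffix == ".wav":
            continue
        target = path.with_suffix(".wav")
        # On a case-sensitive file system a distinct lowercase file may exist;
        # compare against the exact names stored in the directory.
        siblings = {p.name for p in path.parent.iterdir()}
        if target.name in siblings:
            raise FileExistsError(f"{target} already exists")
        # Two-step rename via a temporary name.
        tmp = path.with_name(path.name + ".renaming")
        path.rename(tmp)
        tmp.rename(target)
        renamed.append(target)
    return renamed
```

Running it before transcription and preprocessing should keep any generated file lists consistent with the renamed files.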

Problem description

I cannot start training a model. Transcription and preprocessing work, but when I try to train, the following error occurs.

UnboundLocalError: local variable 'bert_ori' referenced before assignment

Judging from the source code, the following WARNINGs appear to be the cause.

11-22 17:39:20 |WARNING | data_utils.py:174 | Bert load Failed
11-22 17:39:20 |WARNING | data_utils.py:175 | unpickling stack underflow
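
One way the warning and the `UnboundLocalError` could fit together is sketched below. It does not quote the actual `data_utils.py`. It assumes that the cached BERT feature path is derived with a case-sensitive `str.replace(".wav", ".bert.pt")` and that a load failure is only logged. Under that assumption an uppercase `.WAV` path is left unchanged, the loader is handed the audio file itself, and the variable is never assigned. `fake_load`, `get_bert`, and `bert_path_for` are hypothetical names.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def fake_load(path: str):
    """Hypothetical stand-in for the real loader: rejects non-.bert.pt files."""
    if not path.endswith(".bert.pt"):
        raise ValueError("unpickling stack underflow")
    return [0.0]


def get_bert(wav_path: str):
    # Assumed pattern: case-sensitive suffix replacement.
    # "J80.WAV" contains no ".wav", so the path comes back unchanged.
    bert_path = wav_path.replace(".wav", ".bert.pt")
    try:
        bert_ori = fake_load(bert_path)
    except Exception as e:
        # The error is only logged, so bert_ori stays unassigned.
        logger.warning("Bert load Failed")
        logger.warning(e)
    return bert_ori  # raises UnboundLocalError if the load failed


def bert_path_for(wav_path: str) -> Path:
    """Case-insensitive alternative: swap whatever suffix is present."""
    return Path(wav_path).with_suffix(".bert.pt")
```

With `J80.wav` the call returns normally; with `J80.WAV` it raises `UnboundLocalError`, which matches the error reported here. Re-raising in the `except` branch, or deriving the path as in `bert_path_for`, would be two possible directions for a code-side fix.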

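I tried several version combinations, but none of them worked (details below).
Training succeeded with the HEAD as of 2024/1/8. This time I updated the code to the latest HEAD and rebuilt the virtual environment. That alone still left problems with transcription, so I deleted the entire repository and cloned it again.
- ↑ I had made changes to the dataset
I also have an Ubuntu environment, but it runs into issue #168, so I currently have no way to train.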

Expected behavior

Training starts normally.

Current behavior

11-22 16:43:30 |  INFO  | subprocess.py:23 | Running: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config localhost
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 10086
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 0
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:117 | Loading configuration from config 1
11-22 16:43:38 |  INFO  | train_ms_jp_extra.py:119 | Loading environment variables
MASTER_ADDR: localhost,
MASTER_PORT: 10086,
WORLD_SIZE: 1,
RANK: 0,
LOCAL_RANK: 0
11-22 16:43:38 |  INFO  | default_style.py:54 | At least 2 subdirectories are required for generating style vectors with respect to them, found 0.
11-22 16:43:38 |  INFO  | default_style.py:57 | Generating only neutral style vector instead.
11-22 16:43:39 |  INFO  | default_style.py:28 | Saved mean style vector to model_assets\Someone-v0.3
11-22 16:43:39 |  INFO  | default_style.py:36 | Saved style config to model_assets\Someone-v0.3\config.json
11-22 16:43:39 |WARNING | __init__.py:247 | C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\style_bert_vits2\models\utils is not a git repository, therefore hash value comparison will be ignored.
11-22 16:43:39 |  INFO  | data_utils.py:69 | Init dataset...
100%|█████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 41042.75it/s]
11-22 16:43:39 |  INFO  | data_utils.py:84 | skipped: 0, total: 124
11-22 16:43:39 |  INFO  | data_utils.py:348 | Bucket info: [115, 2, 1]
11-22 16:43:39 |  INFO  | data_utils.py:69 | Init dataset...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<?, ?it/s]
11-22 16:43:39 |  INFO  | data_utils.py:84 | skipped: 0, total: 6
11-22 16:43:39 |  INFO  | train_ms_jp_extra.py:274 | Using noise scaled MAS for VITS2
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.weight
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: enc_p.style_proj.bias
11-22 16:43:42 |WARNING | safetensors.py:42 | Missing key: emb_g.weight
11-22 16:43:42 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\G_0.safetensors'
11-22 16:43:43 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\D_0.safetensors'
11-22 16:43:43 |  INFO  | safetensors.py:48 | Loaded 'Data\Someone-v0.3\models\WD_0.safetensors'
11-22 16:43:43 |  INFO  | train_ms_jp_extra.py:492 | Loaded the pretrained models.
11-22 16:43:45 |  INFO  | train_ms_jp_extra.py:540 | Start training.
  0%|                                                                                        | 0/11800 [00:00<?, ?it/s]11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
11-22 16:43:50 |WARNING | data_utils.py:174 | Bert load Failed
11-22 16:43:50 |WARNING | data_utils.py:175 | unpickling stack underflow
  0%|                                                                                        | 0/11800 [00:10<?, ?it/s]
11-22 16:43:56 | ERROR  | subprocess.py:33 | Error: train_ms_jp_extra.py --config Data\Someone-v0.3\config.json --model Data\Someone-v0.3
Some weights of the model checkpoint at ./slm/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at ./slm/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[rank0]: Traceback (most recent call last):
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 1130, in <module>
[rank0]:     run()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 557, in run
[rank0]:     train_and_evaluate(
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\train_ms_jp_extra.py", line 695, in train_and_evaluate
[rank0]:     for batch_idx, (
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
[rank0]:     data = self._next_data()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
[rank0]:     return self._process_data(data)
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
[rank0]:     data.reraise()
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\_utils.py", line 705, in reraise
[rank0]:     raise exception
[rank0]: UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
[rank0]:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
[rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
[rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 199, in __getitem__
[rank0]:     return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 97, in get_audio_text_speaker_pair
[rank0]:     bert, ja_bert, en_bert, phones, tone, language = self.get_text(
[rank0]:   File "C:\Users\hogehoge\ghq\github.com\litagin02\Style-Bert-VITS2\data_utils.py", line 183, in get_text
[rank0]:     ja_bert = bert_ori
[rank0]: UnboundLocalError: local variable 'bert_ori' referenced before assignment

11-22 16:43:56 | ERROR  | train.py:360 | Train failed.

Steps to reproduce

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
git clone https://github.com/litagin02/Style-Bert-VITS2.git
cd Style-Bert-VITS2
uv venv venv
venv\Scripts\activate
uv pip install "torch<2.4" "torchaudio<2.4" --index-url https://download.pytorch.org/whl/cu118
uv pip install -r requirements.txt
python initialize.py
python app.py

Start training from the GUI.

Version information

I tried the following combinations, and all of them behaved the same way. The OS is Windows 11.

Source code	Python	PyTorch	Notes
065a7ffa0a3214516f17ef5288a80d20c4ffb598	3.10.11	2.3.1+cu121
065a7ffa0a3214516f17ef5288a80d20c4ffb598	3.10.11	2.2.2+cu121
2.6.0	3.9.13	2.3.1+cu121	numpy raised an error, so it was replaced with numpy==1.26.4

litagin02 / Style-Bert-VITS2

Training fails when the wav file extension contains uppercase letters #185
