lexkoro / StyleTTS

Training ASR. #1

Open crypticsymmetry opened 1 year ago

crypticsymmetry commented 1 year ago

I am trying to train this repo on the LibriTTS dataset, starting with ASR training.

Question 1: Is the data format still "path|transcription|speaker#"? I also see that your config now uses CSVs; do I have to convert my data to CSV as well?

Question 2: Does this training log look correct? The text it prints doesn't make any sense.

I changed the code to accept a single string for train_data and val_data instead of a list. config.yml:

batch_size: 64
pretrained_model: ""
train_data: "Data/ASR_Train_data_test_kaggle.txt"
val_data: "Data/ASR_Val_data_kaggle.txt"

meldataset.py:

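from pathlib import Path

import torch

# ... (DEFAULT_DICT_PATH, TextCleaner and the rest of the module are elided)
class MelDataset(torch.utils.data.Dataset):
    def __init__(self, data_list, dict_path=DEFAULT_DICT_PATH, sr=22050):
        # Each metadata line is "path|transcription|speaker"; drop the line ending
        # without cutting a character off a final line that has no newline.
        _data_list = [line.rstrip("\r\n").split("|") for line in data_list]
        # Length limits in samples: keep clips between 0.6 s and 10 s.
        self.min_seq_len = int(0.6 * sr)
        self.max_sql_len = int(10.0 * sr)
        self.text_cleaner = TextCleaner(dict_path)
        self.sr = sr

        self.data_list = self._filter(_data_list)

    def _filter(self, data):
        # st_size // 2 approximates the sample count, assuming 16-bit mono PCM
        # (the file header bytes are ignored).
        data_list = [
            (item[0], item[1], item[2])
            for item in data
            if (
                self.max_sql_len
                > (Path(item[0]).stat().st_size // 2)
                > self.min_seq_len
                and len(item[1]) > 5
            )
        ]
        print("data_list length: ", len(data))
        print("filtered data_list length: ", len(data_list))
        return data_list

    def __len__(self):
        return len(self.data_list)
    # ...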

utils.py:

# ...
def get_data_path_list(train_path=None, val_path=None):
    """Read the metadata files and return their lines as two lists."""
    train_list = []
    val_list = []
    if train_path:
        with open(train_path, "r", encoding="utf-8") as f:
            train_list.extend(f.readlines())

    if val_path:
        with open(val_path, "r", encoding="utf-8") as f:
            val_list.extend(f.readlines())

    return train_list, val_list
# ...

Example of the dataset format ("train_data_test.txt"):

LibriTTS/train-clean-100/1088/129236/1088_129236_000019_000008.wav|The lover sees no resemblance except to summer evenings and diamond mornings, to rainbows and the song of birds.|6
LibriTTS/train-clean-100/1088/129236/1088_129236_000020_000003.wav|It is destroyed for the imagination by any attempt to refer it to organization.|6
LibriTTS/train-clean-100/1098/133695/1098_133695_000012_000001.wav|He thought a great deal about her; she was constantly present to his mind. At a time when his thoughts had been a good deal of a burden to him her sudden arrival, which promised nothing and was an open handed gift of fate, had refreshed and quickened them, given them wings and something to fly for.|9
//...
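
To make the "path|transcription|speaker" layout above concrete, here is a minimal sketch of parsing one such line into typed fields. The names `Utterance` and `parse_metadata_line` are hypothetical and not part of the repo; the sketch only assumes exactly three `|`-separated fields with an integer speaker ID, as in the example lines.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    """One metadata entry (hypothetical helper type, not from the repo)."""
    wav_path: str
    text: str
    speaker_id: int


def parse_metadata_line(line: str) -> Utterance:
    """Split one 'path|transcription|speaker' line into typed fields."""
    fields = line.rstrip("\r\n").split("|")
    if len(fields) != 3:
        # A stray '|' inside the transcription would land here.
        raise ValueError(
            f"expected 3 '|'-separated fields, got {len(fields)}: {line!r}"
        )
    wav_path, text, speaker = fields
    return Utterance(wav_path, text, int(speaker))


if __name__ == "__main__":
    example = "LibriTTS/train-clean-100/1088/129236/1088_129236_000020_000003.wav|It is destroyed for the imagination by any attempt to refer it to organization.|6\n"
    print(parse_metadata_line(example))
```

Running a check like this over the whole file before training catches malformed lines early instead of partway through an epoch.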

Training logs:

data_list length:  29493
filtered data_list length:  23397
speaker_samples_weight tensor([0.0027, 0.0027, 0.0027,  ..., 0.0019, 0.0019, 0.0180])
/opt/conda/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
data_list length:  2276
filtered data_list length:  1680
/opt/conda/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
  0%|                                                   | 0/365 [00:00<?, ?it/s]/opt/conda/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
A
"
W
R
"
W
O
W
R
"
D
M
"
S
;
"
A
"
M
"
;
S
W
H
O
M
E
O
"
;
"
"
"
I
"
A
"
I
T
A
B
T
T
Y
I
I
A
"
"
M
"
"
T
"
D
A
"
P
"
I
I
"
S
S
J
S
M
O
S
A
I
T
B
A
I
T
I
O
I
H
M
C
"
I
H
I
"
I
I
T
I
"
"
I
I
H
"
T
T
A
A
L
L
E
;
I
I
O
I
H
I
F
"
O
D
P
F
"
W
"
A
O
W
G
I
"
F
V
M
T
C
I
W
B
;
B
T
S
I
I
"
I
F
I
B
C
"
T
C
"
I
"
F
C
"
C
"
F
C
T
T
A
G
L
I
T
T
E
R
I
G
I
G
H
T
F
L
O
W
E
R
T
H
E
U
S
E
O
F
A
A
M
E
I
M
B
W
I
L
;
A
T
I
I
M
;
I
O
J
V
T
"
W
D
G
"
"
W
;
O
I
T
I
"
F
M
P
"
"
I
I
"
S
"
L
S
;
O
;
H
;
P
H
A
"
S
S
S
"
I
I
"
T
T
S
I
"
T
M
"
S
O
;
O
M
B
I
"
I
T
A
H
A
T
W
T
"
U
J
W
Y
"
H
"
"
M
S
O
I
D
;
B
B
"
A
T
;
;
S
"
I
"
C
"
B
A
D
"
B
A
M
F
A
"
B
W
I
"
R
"
G
B
S
P
I
T
"
I
G
B
"
;
"
"
T
"
H
"
"
S
"
I
L
J
R
I
A
H
"
"
H
H
W
I
"
P
P
K
F
A
"
H
S
"
I
J
H
F
H
S
;
T
F
Y
"
W
"
P
S
T
I
C
W
B
O
S
T
C
P
;
C
I
H
Y
"
"
I
"
B
D
"
S
"
"
T
"
J
H
F
S
A
A
"
"
"
T
W
"
I
I
"
U
"
H
"
"
I
I
K
"
S
B
I
"
Y
"
Q
"
T
"
T
;
"
I
I
"
T
I
"
"
"
I
B
A
T
"
O
I
"
I
"
S
B
;
F
"
P
A
R
I
S
S
I
"
"
K
"
"
"
T
M
H
"
"
I
I
"
R
I
S
"
D
"
O
I
"
E
"
S
"
"
T
E
S
P
T
M
T
B
"
I
"
"
Y
"
"
W
M
"
"
I
S
"
P
"
"
S
P
"
"
L
P
"
J
T
H
I
I
T
B
"
D
;
"
B
"
I
"
B
"
H
"
Y
E
I
"
W
"
Q
T
R
M
I
M
A
P
C
"
T
L
"
"
I
C
"
I
I
A
"
D
"
C
M
T
I
W
P
C
"
A
"
A
;
"
I
"
T
"
A
I
B
S
"
L
I
"
B
T
"
I
T
C
I
I
T
H
I
P
I
T
H
W
S
T
"
O
;
;
I
"
B
I
"
I
"
"
T
;
I
I
B
B
V
T
D
T
I
;
I
F
P
A
S
"
A
"
"
H
T
I
T
I
M
I
E
I
;
H
S
T
U
T
"
I
"
W
I
I
B
I
I
S
L
;
I
A
C
L
F
F
C
H
A
B
S
A
T
Y
S
S
D
R
B
O
A
A
L
L
"
W
I
L
H
S
H
"
I
"
P
A
"
W
I
O
"
Y
I
"
I
F
"
H
"
E
H
"
C
"
D
"
I
B
"
P
T
A
T
A
"
"
M
B
I
I
L
R
H
;
M
"
T
"
;
"
G
"
A
"
"
I
C
"
T
M
H
D
I
T
T
S
M
I
A
"
A
"
G
T
"
B
H
"
S
I
I
"
W
F
"
O
"
B
B
A
M
G
W
P
D
I
E
"
"
"
A
W
B
"
I
V
J
H
"
W
I
I
I
O
I
I
O
T
I
G
A
T
I
I
"
B
H
B
"
D
T
"
T
"
"
T
H
L
"
P
"
"
T
T
I
H
"
J
J
T
T
A
I
"
A
W
"
B
E
J
R
R
E
J
"
V
"
;
T
"
I
S
T
B
I
"
L
T
T
B
T
"
B
H
"
S
F
"
H
T
I
;
I
"
W
I
"
Y
;
"
"
I
;
I
C
J
"
I
I
"
T
T
"
"
D
M
"
I
"
A
"
S
E
S
S
W
"
C
"
B
O
I
G
T
M
S
P
H
I
I
I
"
S
C
B
T
A
I
A
W
H
I
"
I
"
A
A
C
"
I
M
H
C
T
"
I
"
S
;
O
S
I
"
W
I
"
D
"
Y
"
H
"
H
H
I
I
A
H
S
B
I
A
P
J
D
T
M
T
I
"
F
C
H
G
O
B
T
W
"
"
M
"
"
I
C
O
"
U
Y
L
"
D
"
"
I
"
I
;
I
B
"
I
"
W
I
"
B
"
W
"
S
C
B
I
D
F
B
D
I
"
D
M
"
T
D
C
L
T
"
A
;
"
A
I
;
"
O
B
I
"
H
A
J
W
T
D
"
A
"
;
"
A
M
"
I
I
I
I
I
"
C
"
Y
T
"
S
"
A
F
P
W
T
L
C
T
P
T
"
W
"
Q
O
I
S
[
]
I
I
W
V
W
"
I
"
T
F
"
G
T
"
M
P
"
"
D
"
;
T
O
M
I
"
"
W
I
B
;
K
I
"
A
B
"
C
I
"
T
A
S
L
"
T
"
W
I
"
S
"
R
D
I
;
I
I
U
"
A
O
T
"
"
A
B
T
C
P
A
I
T
F
T
O
M
T
T
T
F
;
S
M
S
I
T
T
I
O
B
S
"
W
I
J
I
"
B
T
I
"
"
Y
"
"
T
W
"
"
I
I
"
S
G
W
H
C
A
I
"
B
(
)
W
T
L
A
L
"
W
B
"
T
J
B
"
W
"
"
I
;
I
"
A
I
"
H
S
K
W
G
F
"
"
L
R
S
D
M
I
"
W
M
"
T
"
W
I
"
V
"
J
"
I
I
I
I
"
I
T
"
H
F
A
"
I
T
I
"
"
T
"
"
P
"
C
P
I
A
M
H
H
"
M
C
"
"
J
"
D
"
P
P
O
"
W
"
"
I
"
S
"
Y
C
"
T
T
M
I
I
H
H
T
I
T
T
G
"
H
E
J
;
T
O
C
K
A
H
C
S
R
"
I
J
"
P
I
W
H
"
C
I
"
T
T
"
I
I
I
W
"
T
I
"
O
"
G
"
I
T
T
B
"
A
"
H
W
"
"
T
S
O
O
C
"
O
I
"
L
;
"
"
T
K
S
G
G
W
S
T
A
L
"
P
;
"
S
L
"
A
"
B
W
"
T
K
"
"
"
"
D
I
"
W
"
"
I
"
S
S
W
C
B
H
S
"
T
M
A
I
G
G
I
"
"
Y
"
P
"
W
"
W
B
A
A
H
C
M
"
O
"
A
P
T
I
I
"
H
F
"
L
;
"
"
I
"
H
I
I
H
T
W
"
I
;
I
"
"
W
"
M
C
H
P
T
"
W
T
A
S
H
"
W
M
H
L
"
T
"
W
"
H
I
T
H
T
I
"
I
"
"
W
I
I
"
"
Y
;
W
"
B
I
"
"
W
"
A
C
I
"
H
W
"
J
"
"
I
I
"
"
W
"
W
"
B
B
R
"
;
I
"
A
T
L
(
)
W
I
S
I
"
J
T
S
W
"
A
K
F
G
C
"
S
A
"
T
A
I
D
C
A
T
L
H
"
W
"
"
I
G
"
A
I
"
W
"
"
A
"
H
H
H
"
I
I
B
I
"
Y
I
"
A
"
C
A
A
I
C
R
E
A
T
E
D
H
S
T
"
S
S
I
"
I
I
L
D
H
"
L
"
I
I
A
R
I
"
A
"
G
I
"
S
H
I
I
Y
I
"
I
B
I
C
"
A
"
T
"
I
I
"
L
E
O
F
P
P
T
O
E
I
I
C
"
"
W
"
I
"
"
S
B
;
P
B
I
I
B
I
I
C
"
D
G
C
I
I
A
U
S
H
I
I
"
M
;
I
"
"
I
"
W
"
"
"
W
A
;
I
I
"
P
"
M
A
F
I
L
"
W
"
A
O
W
T
W
;
I
"
B
I
"
"
A
I
"
T
A
;
S
I
B
"
L
"
C
C
I
T
"
"
W
"
T
"
I
"
E
G
O
B
T
V
S
"
O
S
"
R
A
T
S
"
B
I
I
"
W
J
"
"
J
"
I
I
"
H
"
W
C
Y
I
C
"
C
W
I
"
"
T
I
"
"
T
"
F
"
S
R
"
"
I
"
L
"
T
"
H
"
I
T
I
I
S
O
I
P
F
G
T
R
;
"
A
I
"
I
"
H
"
T
F
W
W
"
A
I
"
I
"
H
A
J
A
S
Y
I
L
"
S
I
F
;
I
W
"
S
I
W
"
B
"
I
W
A
"
H
"
H
H
S
L
T
O
B
J
W
B
B
;
"
B
I
"
"
I
"
L
B
V
T
M
T
M
Y
"
W
"
"
R
"
E
T
"
W
T
F
E
"
L
"
A
"
A
"
I
I
W
"
A
"
P
A
I
"
Y
I
"
B
I
L
I
I
"
W
"
I
I
"
L
"
I
T
W
"
K
C
I
G
B
C
"
F
"
G
Y
H
W
H
T
M
"
"
I
"
B
T
"
I
T
"
A
"
E
G
D
;
D
W
"
B
"
O
"
R
"
T
J
I
[
E
J
"
I
"
S
K
A
I
T
I
T
H
I
O
B
"
I
F
"
J
I
V
I
Y
"
T
J
"
I
I
Y
W
;
W
I
"
F
I
;
I
;
I
"
L
;
"
"
"
I
C
H
"
"
I
"
M
T
"
I
T
"
O
;
;
I
"
"
A
"
Y
W
W
"
"
B
"
T
"
H
"
"
"
P
O
H
H
C
"
M
"
W
I
"
I
"
"
Y
I
"
J
"
I
"
I
"
L
"
A
T
G
L
"
B
L
H
"
"
I
"
A
"
I
"
P
J
I
L
D
"
Y
;
I
I
O
H
E
J
"
"
I
"
M
"
A
D
I
I
A
"
M
I
"
;
"
I
I
I
"
C
"
H
B
B
"
P
"
G
T
B
"
B
T
I
"
B
C
R
I
I
"
T
S
E
I
H
F
J
D
B
"
W
"
J
I
"
F
"
A
"
I
I
I
"
"
W
"
P
G
W
"
P
C
T
"
A
H
B
A
H
P
T
R
L
F
C
B
T
I
"
A
"
I
I
C
"
I
T
I
"
W
H
"
T
J
"
I
C
"
I
I
"
S
;
"
I
T
"
D
H
"
A
I
"
I
"
L
"
"
G
F
I
"
C
"
Q
I
A
T
M
W
M
"
I
;
I
;
;
I
E
;
O
H
T
E
C
D
"
T
S
B
H
T
L
I
I
M
P
"
I
F
S
H
G
"
I
S
I
P
T
T
C
A
"
I
S
I
"
I
A
;
I
"
O
I
"
S
"
T
I
"
T
A
"
I
A
"
M
I
I
I
A
"
B
B
"
S
M
C
"
S
M
H
"
S
"
W
F
P
"
I
"
H
"
B
I
B
T
"
A
I
;
H
"
I
L
A
M
B
S
"
A
W
A
G
R
I
E
D
I
C
O
A
C
J
M
B
"
T
"
M
M
M
H
;
"
I
"
D
"
W
I
"
B
"
H
"
H
"
I
"
I
S
H
E
L
L
I
E
S
S
E
U
I
B
T
"
I
"
M
"
I
"
L
T
M
"
"
B
S
D
"
I
I
"
G
"
L
"
E
"
P
"
"
S
S
S
"
I
M
"
I
"
T
"
R
T
I
W
W
"
Y
M
F
T
S
F
S
A
I
R
A
Z
T
S
S
O
S
"
T
S
"
"
W
"
C
K
I
I
"
S
R
G
R
A
"
I
"
I
A
L
"
M
J
"
Y
"
M
M
M
O
"
T
W
J
M
C
R
S
"
I
T
S
H
B
"
W
"
M
I
"
A
"
B
G
T
P
P
S
T
I
"
P
C
"
P
A
I
W
E
S
T
M
I
S
T
E
R
G
A
Z
E
T
T
E
I
(
I
)
"
T
"
Y
H
S
E
"
Y
"
"
Y
"
S
C
I
A
C
B
T
E
J
R
R
E
J
O
I
L
L
;
I
G
G
"
I
I
"
I
I
"
"
Y
"
I
G
G
I
K
"
O
P
"
U
T
T
"
I
I
"
C
"
W
A
I
A
P
I
I
W
A
"
A
"
R
H
;
"
F
"
H
I
"
T
"
"
I
T
B
B
A
E
"
A
"
K
"
D
"
;
"
"
"
I
"
I
"
S
"
F
[Loss: 15.0932, LR: 0.00050]:   0%|           | 1/365 [00:22<2:13:41, 22.04s/it]"
G
J
C
"
Y
W
W
A
T
"
H
I
"
W
W
"
Y
E
I
B
J
I
T
S
L
P
"
A
"
T
"
T
I
B
B
B
"
L
D
"
R
R
R
"
O
H
B
J
T
H
I
P
I
H
W
"
T
A
F
I
L
Y
"
I
T
"
"
H
I
"
B
T
;
H
W
R
R
O
S
P
H
B
O
M
"
S
M
"
I
"
Y
"
A
R
T
;
I
"
"
A
"
I
F
I
"
I
"
O
I
H
T
M
M
M
S
T
"
I
"
T
"
O
"
R
O
T
P
B
;
"
I
"
"
"
[Loss: 13.3964, LR: 0.00050]:   1%|           | 2/365 [00:23<1:00:05,  9.93s/it]E
"
"
H
Y
U
H
P
S
I
W
T
B
W
I
Y
T
"
"
I
"
J
S
S
W
C
B
H
I
I
I
I
I
"
B
"
A
"
F
A
S
"
I
A
"
M
A
I
I
B
I
T
I
M
W
W
I
I
S
I
I
T
I
"
T
"
W
J
"
I
;
M
A
"
L
W
"
I
"
W
"
L
C
"
S
"
W
"
A
"
O
H
"
T
A
F
;
"
W
;
I
"
[Loss: 9.3134, LR: 0.00050]:   1%|              | 3/365 [00:24<36:07,  5.99s/it]B
A
C
M
"
W
I
I
S
I
I
I
"
I
"
B
I
I
I
R
I
"
H
C
H
W
"
J
H
J
"
P
I
R
U
C
"
A
"
"
"
P
"
I
"
S
W
T
T
A
I
B
;
"
T
"
B
I
T
A
L
D
C
W
"
T
"
T
A
F
A
P
;
L
"
W
I
E
C
"
W
I
S
F
F
S
H
T
I
;
I
I
T
;
W
C
B
W
B
I
F
B
H
"
Y
"
C
A
H
W
T
A
H
;
I
"
I
;
"
H
W
A
C
T
T
T
"
T
"
"
I
T
D
B
G
B
I
H
M
Q
W
T
H
"
F
I
E
"
C
B
;
;
G
O
T
"
"
L
"
H
T
P
B
L
H
P
I
A
T
H
[Loss: 8.1618, LR: 0.00050]:   1%|▏             | 4/365 [00:26<26:49,  4.46s/it]A
/.....
lexkoro commented 1 year ago

Hi,

1) It doesn't matter what file type it is.

2) You would have to add the missing characters/phonemes used in your dataset to this file: word_index_dict_new.txt, or to your own file that you reference in the code. https://github.com/lexkoro/StyleTTS/blob/243b54ff398cc82f97b2526715aec048d61d1b85/AuxiliaryASR/meldataset.py#L29

The log is printing every character it doesn't find in the word_index_dict_new.txt file. You can check this here: https://github.com/lexkoro/StyleTTS/blob/243b54ff398cc82f97b2526715aec048d61d1b85/AuxiliaryASR/text_utils.py#L9
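
A quick way to see which symbols would need to be added is to scan the metadata up front rather than read them off the training log. The following is a rough sketch, not code from the repo: `find_missing_symbols` and `load_known_symbols` are hypothetical names, and the loader assumes the dictionary file is a two-column CSV of `symbol,index` rows, which should be verified against the actual file.

```python
import csv
from collections import Counter
from typing import Iterable, Set


def load_known_symbols(dict_path: str) -> Set[str]:
    """Read the symbol column of the dictionary file.

    Assumption: each row is 'symbol,index' (two-column CSV).
    """
    with open(dict_path, newline="", encoding="utf-8") as f:
        return {row[0] for row in csv.reader(f) if row}


def find_missing_symbols(
    lines: Iterable[str], known_symbols: Set[str], text_field: int = 1
) -> Counter:
    """Count characters in the text field that are not in known_symbols."""
    missing = Counter()
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        if len(fields) <= text_field:
            continue  # skip malformed lines
        for ch in fields[text_field]:
            if ch not in known_symbols:
                missing[ch] += 1
    return missing


if __name__ == "__main__":
    # Toy symbol set for illustration only; use load_known_symbols(...) in practice.
    known = set("abcdefghijklmnopqrstuvwxyz .,")
    sample = ['x.wav|The "lover" sees.|6\n']
    for symbol, count in find_missing_symbols(sample, known).most_common():
        print(repr(symbol), count)
```

`text_field` defaults to 1 for the "path|transcription|speaker" layout; a different column index would be needed for metadata that stores the text elsewhere.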

Also, I removed the phonemization step, since I precompute the phonemes directly into the metadata file:

/SqNarrator/wavs/a0jm2r00.171.wav|SqNarrator_EN|5|1|jɛp, teɪsts d͡ʒʌst laɪ̯k jʌd ɛkspɛkt.|yep, tastes just like you'd expect.
/SqNarrator/wavs/a0ji2r00.0z1.wav|SqNarrator_EN|5|1|ju siː sʌm dɹɪpɪŋ, uzɪŋ stʌf.|you see some dripping, oozing stuff.

So you might want to add it back.
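
As an alternative to restoring the on-the-fly step, the phonemes could be precomputed into the metadata, similar to the approach described above. This is only a sketch under assumptions: `phonemize_metadata` is a hypothetical helper, it targets the three-field "path|transcription|speaker" layout rather than the six-field one shown above, and the phonemizer itself is passed in as a plain callable so that any grapheme-to-phoneme backend can be used.

```python
from typing import Callable, Iterable, List


def phonemize_metadata(
    lines: Iterable[str],
    phonemize: Callable[[str], str],
    text_field: int = 1,
) -> List[str]:
    """Return metadata lines with the text field replaced by its phonemes."""
    out = []
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        fields[text_field] = phonemize(fields[text_field])
        out.append("|".join(fields))
    return out


# Hypothetical wiring with the third-party `phonemizer` package (needs espeak
# installed); whether this matches the original pipeline is an assumption:
#   from phonemizer.backend import EspeakBackend
#   backend = EspeakBackend(language="en-us", preserve_punctuation=True, with_stress=True)
#   phonemize = lambda text: backend.phonemize([text])[0].strip()

if __name__ == "__main__":
    # str.lower stands in for a real phonemizer so the sketch runs anywhere.
    for row in phonemize_metadata(["a.wav|Hello There.|6\n"], str.lower):
        print(row)
```

Whichever route is taken, the symbols the phonemizer emits still have to be present in the dictionary file, otherwise they will be printed and skipped just like the characters in the log.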