seastar105 / pflow-encodec

Implementation of a TTS model based on the NVIDIA P-Flow TTS paper

Low-resource finetune #6

Open yiwei0730 opened 4 months ago

yiwei0730 commented 4 months ago

I'm sorry to bother you. I want to change this model from zero-shot to finetuning with less data. When I train on 4 samples (repeated about 10 times, giving 40 samples), it keeps raising this error, and I don't know how to solve it:

    text_tokens, durations, latents = map(list, zip(*batch))
    ValueError: not enough values to unpack (expected 3, got 0)
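For reference, the failure itself is generic Python behavior: `zip(*batch)` over an empty batch yields nothing, so there are zero values to unpack into the three names. A minimal reproduction:

    # An empty batch reproduces the exact error from the log: zip(*[])
    # yields no tuples, so map(list, zip(*batch)) produces zero values
    # to unpack into three names.
    batch = []
    text_tokens, durations, latents = map(list, zip(*batch))
    # ValueError: not enough values to unpack (expected 3, got 0)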

yiwei0730 commented 4 months ago

ERROR MESSAGE:

    [2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Created 3 buckets
    [2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Boundaries: [3.0, 5.0, 7.0, 10.0]
    [2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Bucket sizes: [0, 66, 22]
    /root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=31` in the `DataLoader` to improve performance.
    /root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (20) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
    4 4 0
    Epoch 0/1999 ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/20 0:00:17 • 0:00:03 8.77it/s v_num: 16.000 train/enc_loss_step: 0.098 train/duration_loss_step: 0.349 train/flow_matching_loss_step: 0.223 train/latent_loss_step: 0.321 train/loss_step: 0.671 other/grad_norm: 5.177
    [2024-07-23 17:46:17,798][pflow_encodec.utils.utils][ERROR] - [rank: 0] Traceback (most recent call last):
      File "/workspace/yiwei/pflow-encodec/pflow_encodec/utils/utils.py", line 68, in wrap
        metric_dict, object_dict = task_func(cfg=cfg)
      File "/workspace/yiwei/pflow-encodec/pflow_encodec/train.py", line 89, in train
        trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
        call._call_and_handle_interrupt(
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
        self._run(model, ckpt_path=ckpt_path)
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
        results = self._run_stage()
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1033, in _run_stage
        self.fit_loop.run()
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
        self.advance()
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
        self.epoch_loop.run(self._data_fetcher)
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 140, in run
        self.advance(data_fetcher)
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 212, in advance
        batch, _, __ = next(data_fetcher)
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fetchers.py", line 133, in __next__
        batch = super().__next__()
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fetchers.py", line 60, in __next__
        batch = next(self.iterator)
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/utilities/combined_loader.py", line 341, in __next__
        out = next(self._iterator)
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/utilities/combined_loader.py", line 78, in __next__
        out[i] = next(self.iterators[i])
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
        data = self._next_data()
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
        return self.collate_fn(data)
      File "/workspace/yiwei/pflow-encodec/pflow_encodec/data/datamodule.py", line 83, in _collate
        text_tokens, durations, latents = map(list, zip(*batch))
    ValueError: not enough values to unpack (expected 3, got 0)

seastar105 commented 4 months ago

@yiwei0730 It seems there's a bug in BucketSampler: as the error indicates, the sampler does not delete empty buckets properly. Could you modify the sampler's code as below and run it again? https://github.com/seastar105/pflow-encodec/blob/6dd533958a07eac39b9146b5b9a8c4837a2c17ba/pflow_encodec/data/sampler.py#L62-L75

    def _create_bucket(self):
        # Assign each sample index to the bucket whose duration range
        # contains it; _bisect returns -1 for out-of-range durations.
        buckets = [[] for _ in range(len(self.boundaries) - 1)]
        for i in range(len(self.durations)):
            length = self.durations[i]
            idx_bucket = self._bisect(length)
            if idx_bucket != -1:
                buckets[idx_bucket].append(i)

        # Collect the indices of buckets that received no samples.
        empty_indices = []
        for idx, bucket in enumerate(buckets):
            if len(bucket) == 0:
                empty_indices.append(idx)

        # Delete empty buckets together with their upper boundaries,
        # iterating in reverse so earlier indices remain valid.
        for i in sorted(empty_indices, reverse=True):
            del buckets[i]
            del self.boundaries[i + 1]

        return buckets
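
A quick way to sanity-check this change in isolation (a minimal standalone sketch: `DemoSampler`, its constructor, and the `bisect`-based `_bisect` below are stand-ins for the real sampler in `pflow_encodec/data/sampler.py`, not its actual implementation):

    import bisect

    class DemoSampler:
        def __init__(self, durations, boundaries):
            self.durations = durations
            self.boundaries = boundaries

        def _bisect(self, length):
            # Bucket index whose [boundaries[i], boundaries[i+1]) range
            # contains length; -1 if it falls outside all boundaries.
            idx = bisect.bisect_right(self.boundaries, length) - 1
            return idx if 0 <= idx < len(self.boundaries) - 1 else -1

        def _create_bucket(self):
            # Same logic as the patched method above.
            buckets = [[] for _ in range(len(self.boundaries) - 1)]
            for i, length in enumerate(self.durations):
                idx_bucket = self._bisect(length)
                if idx_bucket != -1:
                    buckets[idx_bucket].append(i)
            empty_indices = [idx for idx, bucket in enumerate(buckets) if not bucket]
            for i in sorted(empty_indices, reverse=True):
                del buckets[i]
                del self.boundaries[i + 1]
            return buckets

    # Mirrors the log above: boundaries [3.0, 5.0, 7.0, 10.0] with no clip
    # shorter than 5 s, so the first bucket starts out empty.
    sampler = DemoSampler(durations=[5.5, 6.0, 8.2], boundaries=[3.0, 5.0, 7.0, 10.0])
    print(sampler._create_bucket())  # [[0, 1], [2]]  (empty bucket removed)
    print(sampler.boundaries)        # [3.0, 7.0, 10.0]  (boundaries stay aligned)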

By the way, did the mixed Chinese/English training go well?

yiwei0730 commented 4 months ago

The zero-shot results are not very good, and the speaker similarity is poor. I also tried low-resource finetuning (4 sentences) myself, but the results were very poor, with weak similarity and naturalness. Moreover, the model's language handling gets confused: when I input Japanese or Korean, the output turns into a strange language.