Open yiwei0730 opened 3 months ago
ERROR MESSAGE:
[2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Created 3 buckets
[2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Boundaries: [3.0, 5.0, 7.0, 10.0]
[2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Bucket sizes: [0, 66, 22]
/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers, which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=31` in the `DataLoader` to improve performance.
/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (20) is smaller than the logging interval `Trainer(log_every_n_steps=50)`. Set a lower value for `log_every_n_steps` if you want to see logs for the training epoch.
4
4
0
Epoch 0/1999 ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/20 0:00:17 • 0:00:03 8.77it/s v_num: 16.000 train/enc_loss_step: 0.098 train/duration_loss_step: 0.349
train/flow_matching_loss_step: 0.223 train/latent_loss_step: 0.321 train/loss_step: 0.671
other/grad_norm: 5.177
[2024-07-23 17:46:17,798][pflow_encodec.utils.utils][ERROR] - [rank: 0]
Traceback (most recent call last):
File "/workspace/yiwei/pflow-encodec/pflow_encodec/utils/utils.py", line 68, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "/workspace/yiwei/pflow-encodec/pflow_encodec/train.py", line 89, in train
trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
results = self._run_stage()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1033, in _run_stage
self.fit_loop.run()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
self.advance()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
self.epoch_loop.run(self._data_fetcher)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 140, in run
self.advance(data_fetcher)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/training_epochloop.py", line 212, in advance
batch, , = next(data_fetcher)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fetchers.py", line 133, in next
batch = super().next()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fetchers.py", line 60, in next
batch = next(self.iterator)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/utilities/combined_loader.py", line 341, in next
out = next(self._iterator)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/utilities/combined_loader.py", line 78, in next
out[i] = next(self.iterators[i])
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in next__
data = self._next_data()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/workspace/yiwei/pflow-encodec/pflow_encodec/data/datamodule.py", line 83, in _collate
text_tokens, durations, latents = map(list, zip(*batch))
ValueError: not enough values to unpack (expected 3, got 0)
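For context on the error itself: when the collate function receives an empty batch, `zip(*batch)` produces no tuples, so unpacking three values fails. A minimal, repo-independent sketch reproducing the same ValueError:

```python
# Minimal sketch (not repo code): an empty batch list reproduces the same error,
# because zip(*[]) yields nothing to unpack into three names.
batch = []  # what the collate function sees when the sampler emits an empty bucket
try:
    text_tokens, durations, latents = map(list, zip(*batch))
except ValueError as e:
    print(e)  # not enough values to unpack (expected 3, got 0)
```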
@yiwei0730 It seems there's a bug in BucketSampler: the error indicates the sampler does not delete empty buckets properly. Could you modify the Sampler's code as below and run it again? https://github.com/seastar105/pflow-encodec/blob/6dd533958a07eac39b9146b5b9a8c4837a2c17ba/pflow_encodec/data/sampler.py#L62-L75
def _create_bucket(self):
    buckets = [[] for _ in range(len(self.boundaries) - 1)]
    for i in range(len(self.durations)):
        length = self.durations[i]
        idx_bucket = self._bisect(length)
        if idx_bucket != -1:
            buckets[idx_bucket].append(i)
    empty_indices = []
    for idx, bucket in enumerate(buckets):
        if len(bucket) == 0:
            empty_indices.append(idx)
    for i in sorted(empty_indices, reverse=True):
        del buckets[i]
        del self.boundaries[i + 1]
    return buckets
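For a quick sanity check of the patch, here is a hedged standalone sketch (not the repo's `BucketSampler`; `create_buckets` and the simplified bisect are illustrative stand-ins) run against the boundaries and bucket sizes from the log above:

```python
import bisect

# Standalone approximation of the patched _create_bucket logic (illustrative only).
def create_buckets(durations, boundaries):
    buckets = [[] for _ in range(len(boundaries) - 1)]
    for i, length in enumerate(durations):
        # place item i into the bucket whose boundary range contains its duration
        idx_bucket = bisect.bisect_right(boundaries, length) - 1
        if 0 <= idx_bucket < len(buckets):
            buckets[idx_bucket].append(i)
    # drop empty buckets and their upper boundaries, back to front
    for i in sorted((idx for idx, b in enumerate(buckets) if not b), reverse=True):
        del buckets[i]
        del boundaries[i + 1]
    return buckets, boundaries

# Durations shaped like the log ("Bucket sizes: [0, 66, 22]"): nothing under 5s.
durations = [5.5] * 66 + [8.0] * 22
buckets, boundaries = create_buckets(durations, [3.0, 5.0, 7.0, 10.0])
print([len(b) for b in buckets], boundaries)  # [66, 22] [3.0, 7.0, 10.0]
```

With the empty first bucket removed, no iteration can yield an empty batch, which should avoid the ValueError in `_collate`.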
By the way, did the Chinese-English mixed training go well?
The zero-shot results are not very good, and the speaker similarity is poor. I also tried low-resource finetuning myself (4 sentences), but the results were very poor: both similarity and naturalness suffered. Moreover, its language identification gets confused; when I input Japanese or Korean, the output becomes a strange language.
I'm sorry to bother you. I want to change this model from zero-shot to finetuning with less data. When I use 4 samples (each repeated about 10 times) = 40 samples for training, it keeps raising this error, and I don't know how to solve it:
"""
text_tokens, durations, latents = map(list, zip(*batch))
ValueError: not enough values to unpack (expected 3, got 0)
"""