lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Trained on different datasets but with the same checkpoint model #1236

Closed OswaldoBornemann closed 9 months ago

OswaldoBornemann commented 9 months ago

I am trying to train a zipformer starting from a previously trained checkpoint, but on a different dataset. I get the following warning, and training appears to hang at the "About to create train dataloader" step.

We detected you're trying to use a CutSampler with rank 2 and world_size 3 inside an IterableDatasetWrapper. Setting rank != 0 and world_size != 1 in Lhotse's CutSampler is inteded for map-style datasets, when the sampler exists in the main training loop. Make sure these settings are intentional or pass rank=0 and world_size=1 to the sampler's constructor.

OswaldoBornemann commented 9 months ago

I found the cause: it lies in load_checkpoint_if_available.

That function contains the following:

if params.start_batch > 0:
    if "cur_epoch" in saved_params:
        params["start_epoch"] = saved_params["cur_epoch"]

    if "cur_batch_idx" in saved_params:
        params["cur_batch_idx"] = saved_params["cur_batch_idx"]

But this is incompatible with the new dataset, because the saved batch index does not correspond to anything in it. After I commented these lines out, training runs normally.
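As an alternative to commenting the lines out, the restore step could be made conditional. The following is only a minimal sketch with plain dicts: `restore_progress` and the `same_dataset` flag are hypothetical names, not part of icefall, and the real `params` object in a recipe may behave differently.

```python
def restore_progress(params: dict, saved_params: dict, same_dataset: bool) -> dict:
    """Copy epoch/batch counters from a checkpoint only when they still apply.

    Hypothetical helper: `same_dataset` is an assumed flag that the caller
    sets to False when fine-tuning on data the checkpoint was not trained on.
    """
    # Counters from another dataset are meaningless, so leave params untouched.
    if not same_dataset or params.get("start_batch", 0) <= 0:
        return params

    # Same mapping as in the snippet quoted above.
    if "cur_epoch" in saved_params:
        params["start_epoch"] = saved_params["cur_epoch"]
    if "cur_batch_idx" in saved_params:
        params["cur_batch_idx"] = saved_params["cur_batch_idx"]
    return params


saved = {"cur_epoch": 7, "cur_batch_idx": 1200}
resumed = restore_progress({"start_batch": 4000}, saved, same_dataset=True)
fresh = restore_progress({"start_batch": 4000}, saved, same_dataset=False)
```

Here `resumed` picks up the saved epoch and batch index, while `fresh` keeps only what the caller passed in, so the model weights can still be loaded without inheriting the old position in the data.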

pzelasko commented 9 months ago

I think the issue in the first message and the solution in the second are unrelated (although it's good that you found it). I don't see IterableDatasetWrapper generally used for training in Icefall, so it looks like you customized the code. 99% of the time you'd want to follow the message and set rank and world size to (0, 1); otherwise you will be omitting a fraction of (world_size - 1) / world_size of the training data, which is about 67% for the world_size of 3 in your warning.
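To make the arithmetic concrete, here is a toy model of rank-based sharding. It is not Lhotse's sampler: `batches_seen` is a hypothetical stand-in that assumes a simple round-robin split, just to show how much one rank keeps.

```python
from typing import List


def batches_seen(num_batches: int, rank: int, world_size: int) -> List[int]:
    """Toy round-robin sharding: keep every world_size-th batch, starting at `rank`."""
    return [i for i in range(num_batches) if i % world_size == rank]


def skipped_fraction(world_size: int) -> float:
    """Fraction of the data that a single rank never sees."""
    return (world_size - 1) / world_size


num_batches = 12
sharded = batches_seen(num_batches, rank=2, world_size=3)
unsharded = batches_seen(num_batches, rank=0, world_size=1)

print(sharded)                       # [2, 5, 8, 11]
print(len(unsharded))                # 12
print(f"{skipped_fraction(3):.1%}")  # 66.7%
```

With rank 2 and world_size 3, as in the warning, only 4 of 12 batches survive; with rank 0 and world_size 1, nothing is dropped.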

OswaldoBornemann commented 9 months ago

Yeah, I see what you mean. However, I upgraded to the latest lhotse (but not to the latest k2 or icefall), and the warning still appears.

pzelasko commented 9 months ago

I suggest that you search the code (using grep/rg/IDE) for IterableDatasetWrapper usage and adjust the arguments in the sampler manually.
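If grep or rg is not at hand, the same search can be sketched in a few lines of Python. `find_usages` is a hypothetical helper written for this thread; it only scans `*.py` files under the directory you give it.

```python
from pathlib import Path


def find_usages(root, needle="IterableDatasetWrapper"):
    """Return (path, line number, stripped line) for every line containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_usages(".")` from the recipe directory lists each place where the wrapper is constructed; the sampler passed to it at those places is the one whose rank and world_size arguments need adjusting.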