If you are using distributed training, i.e., multiprocessing_distributed set to True, num_train_iter and epoch jointly determine the number of training iterations per epoch as num_train_iter // epoch. If multiprocessing_distributed is set to False, the number of training iterations per epoch is simply the length of the data loader, so num_train_iter does not need to be set manually.
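To make the two cases concrete, here is a minimal sketch (not the repository's actual code; the function and argument names are assumptions based on the config keys discussed above) of how the per-epoch iteration count is derived in each mode:

```python
def iters_per_epoch(multiprocessing_distributed, num_train_iter, epoch, train_loader):
    """Illustrative only: mirrors the rule described above."""
    if multiprocessing_distributed:
        # Distributed training: num_train_iter and epoch jointly fix the count.
        return num_train_iter // epoch
    # Non-distributed training: one pass over the data loader per epoch,
    # so num_train_iter does not need to be set manually.
    return len(train_loader)
```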
Thank you for the clarification. Upon reviewing the code, I found that when not using distributed training, i.e., with multiprocessing_distributed set to False, the get_data_loader() method in build.py calculates num_samples = num_train_iter // epoch * batch_size by default, which defaults to 1024*64. This determines how many samples the dataloader's sampler draws, i.e., the number of samples actually seen in each epoch during training. If num_samples exceeds the size of the specified dataset, samples will be drawn repeatedly. Am I understanding this correctly?
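For illustration, here is a minimal sketch of the effect described above, assuming the sampler behaves like PyTorch's RandomSampler with replacement and a fixed num_samples (the concrete config values are placeholders, not the library's defaults beyond the 1024*64 product mentioned above):

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

batch_size = 64
num_train_iter = 1024 * 1024   # placeholder config value
epoch = 1024                   # placeholder config value

# num_samples as discussed above: num_train_iter // epoch * batch_size = 1024 * 64
num_samples = num_train_iter // epoch * batch_size

# A small dataset of only 1,000 items.
dataset = TensorDataset(torch.arange(1000).float())

# If num_samples exceeds len(dataset), sampling with replacement revisits
# items within a single epoch -- the repeated sampling asked about above.
sampler = RandomSampler(dataset, replacement=True, num_samples=num_samples)
loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

print(len(loader))  # 1024 batches per epoch, regardless of len(dataset)
```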
About configs
I would like to customize which datasets are used and how long the model trains. Could you please tell me how to determine some of the hyperparameters in the config file, such as the relationship between the number of epochs, num_train_iter, num_eval_iter, and batch size? I'm sorry, I couldn't find any relevant explanation. Thank you!