qiuqiangkong / panns_transfer_to_gtzan


Confusion about training iterations #5

Closed Youngtard closed 3 years ago

Youngtard commented 3 years ago

As I understand it, the total number of iterations is normally number_of_epochs * len(data_loader), where len(data_loader) is the number of steps per epoch. Evaluation is usually done at the last step of each epoch, with the final evaluation at the end of the final epoch.

For the GTZAN dataset, with a batch size of 32 and 900 of the 1000 samples used for training (10 cross-validation splits), the number of steps per epoch would be 29, i.e. np.ceil(900 / 32), and the total number of iterations would be 29 * number_of_epochs.
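The arithmetic above can be checked with a few lines of Python. This is only a sketch of the epoch-based bookkeeping described in the question; the helper name `epoch_schedule` is made up for illustration and does not come from the repo.

```python
import math

def epoch_schedule(num_train, batch_size, num_epochs):
    """Return (steps per epoch, total iterations) for epoch-based training."""
    # The last batch of an epoch may be smaller, hence the ceiling.
    steps_per_epoch = math.ceil(num_train / batch_size)
    return steps_per_epoch, steps_per_epoch * num_epochs

steps, total = epoch_schedule(num_train=900, batch_size=32, num_epochs=10)
print(steps, total)  # 29 290
```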

In this repo, and in the PANNs paper, I understand the reason for using a balanced sampler (data imbalance, and to prevent some form of overfitting on mini-batches). This obviously leads to a different way of loading data. As far as I can tell from the repo, the train loader appears to be infinite and stops at whatever iteration is specified. Evaluation is done at every 200th iteration.
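To make the idea of an infinite, class-balanced loader concrete, here is a minimal sketch in plain Python. It is not the sampler used in this repo; the function name `balanced_batches`, the round-robin order over classes, and sampling with replacement inside each class are all assumptions chosen to keep the example short.

```python
import random
from collections import defaultdict
from itertools import islice

def balanced_batches(labels, batch_size, seed=0):
    """Yield batches of sample indices forever, visiting classes in turn."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    pos = 0
    while True:  # never raises StopIteration, so there is no epoch boundary
        batch = []
        for _ in range(batch_size):
            cls = classes[pos % len(classes)]        # round-robin over classes
            batch.append(rng.choice(by_class[cls]))  # sample with replacement
            pos += 1
        yield batch

# Toy data: 90 samples of class 0 and 10 samples of class 1.
labels = [0] * 90 + [1] * 10
first = next(balanced_batches(labels, batch_size=32))
print(sum(labels[i] for i in first))  # 16, i.e. half the batch is the rare class
```

Because the generator never ends, the caller has to decide when to stop, for example with `islice(balanced_batches(labels, 32), 10000)` or an explicit iteration counter.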

Question: What I don't understand is how the total number of iterations is determined, and at which iteration evaluation on the validation data is done. In this repo they are 10000 and 200, respectively.

My assumption was that, with 29 steps per epoch, the total number of iterations would be something like 290 for 10 epochs, and evaluation would then happen at every 29th iteration.

So in summary, what am I not getting/understanding based on the question above?

qiuqiangkong commented 3 years ago

In training, the choice of the number of training iterations is arbitrary. It can be 10000, 20000, or 50000. Usually people hold out a validation set to select the best iteration at which to stop. In this repo I just chose 10000 for demonstration, and it works well. People can use either epochs or iterations in training. I prefer iterations, because with the balanced training strategy there is no "epoch", so counting iterations is more convenient.
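A minimal sketch of iteration-based training with periodic validation, under stated assumptions: `train_by_iteration`, `train_step`, and `evaluate` are hypothetical names, the model update and validation metric are passed in as plain callables, and the toy score below is invented so the example runs without any ML library. It is not the training script of this repo.

```python
from itertools import count

def train_by_iteration(batches, train_step, evaluate,
                       stop_iteration=10000, eval_every=200):
    """Train on an endless batch stream and track the best validation score."""
    best_iteration, best_score = None, float("-inf")
    for iteration, batch in enumerate(batches):
        if iteration % eval_every == 0:
            score = evaluate()  # higher is better, e.g. validation accuracy
            if score > best_score:
                best_iteration, best_score = iteration, score
        if iteration == stop_iteration:
            break  # the stream is infinite, so the stop is explicit
        train_step(batch)
    return best_iteration, best_score

# Toy stand-ins: the "model" is a step counter whose score peaks at 400 steps.
state = {"steps": 0}

def train_step(batch):
    state["steps"] += 1

def evaluate():
    return -abs(state["steps"] - 400)

best = train_by_iteration(count(), train_step, evaluate,
                          stop_iteration=1000, eval_every=200)
print(best)  # (400, 0)
```

With the figures quoted in this thread (batch size 32, 900 training clips), 10000 iterations draw 320000 samples, nominally about 356 passes over the training set, although with balanced sampling no pass is guaranteed to visit every clip exactly once.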

Youngtard commented 3 years ago

Thanks for the reply. I understand now.