lmcinnes / umap

Uniform Manifold Approximation and Projection

Parametric UMAP: "n_training_epochs=20" results in training for 200 epochs #546

Open erl-j opened 3 years ago

erl-j commented 3 years ago

Hello!

First, I'd like to thank all contributors for this great resource. Now onto my issue:

I'm using Parametric UMAP with the following parameters:

reducer = umap.parametric_umap.ParametricUMAP(n_components=2, min_dist=1, parametric_reconstruction=True, n_training_epochs=20)

However, this results in the network training for 200 epochs.

Thank you!

walid0925 commented 3 years ago

@erl-j This also happens for me. I think this is due to the default loss_report_frequency being set to 10. You can see that the .fit call uses loss_report_frequency * n_training_epochs here.

For your purposes, you should be able to pass loss_report_frequency=1 to the init, as in the sketch below.
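For example (a minimal sketch of that workaround; the other parameter values are just the ones from the original report):

from umap.parametric_umap import ParametricUMAP

# With loss_report_frequency=1, Keras logs one line per true epoch,
# so n_training_epochs=20 shows up as 20 epochs in fit()'s output.
reducer = ParametricUMAP(
    n_components=2,
    min_dist=1,
    parametric_reconstruction=True,
    n_training_epochs=20,
    loss_report_frequency=1,
)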

At a high level, I think this is expected behavior (meaning the actual number of epochs is still correct), though it is misleading in the Keras logs. However, this also means that the EarlyStopping callback needs to be adjusted accordingly, if one is being used.
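A sketch of that adjustment, assuming callbacks are passed through the keras_fit_kwargs argument (as in the Parametric UMAP documentation) and that EarlyStopping's patience is counted in Keras epochs, i.e. sub-epochs:

import tensorflow as tf
from umap.parametric_umap import ParametricUMAP

loss_report_frequency = 10  # the current default

# patience counts Keras epochs, which here are sub-epochs, so scale
# it by loss_report_frequency to wait the intended number of true
# epochs before stopping.
keras_fit_kwargs = {
    "callbacks": [
        tf.keras.callbacks.EarlyStopping(
            monitor="loss",
            patience=2 * loss_report_frequency,  # i.e. 2 true epochs
            verbose=1,
        )
    ]
}

reducer = ParametricUMAP(
    n_training_epochs=20,
    loss_report_frequency=loss_report_frequency,
    keras_fit_kwargs=keras_fit_kwargs,
)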

It might make sense to change the default loss_report_frequency to 1 and n_training_epochs to 10. What do you think, @lmcinnes?

lmcinnes commented 3 years ago

That may make some sense. I would want to consult with @timsainb on that, however, as he knows a lot more about that code than I do.

timsainb commented 3 years ago

@erl-j, the '200' epochs are Keras semantics. @walid0925 is right: I set the default loss_report_frequency to 10 so we would see the loss reported 10x per epoch. This means that those 200 epochs are actually 20 true epochs, each split into 10 sub-epochs.
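In other words (an illustrative sketch of the arithmetic, not the library's actual code; the variable names are hypothetical):

n_training_epochs = 20      # true epochs requested by the user
loss_report_frequency = 10  # default: report the loss 10x per true epoch

# This product is what Keras prints as the epoch count:
keras_epochs = n_training_epochs * loss_report_frequency  # 200
# Each "Keras epoch" covers 1/loss_report_frequency of the data,
# so the total number of passes over the data is still 20.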

I can see how it would be more intuitive to set the default loss_report_frequency to 1. Right now the default n_training_epochs is set to 1, so with loss_report_frequency=1 the loss would only be reported once by default.

Increasing the default n_training_epochs would increase the number of epochs the network trains for by default, so it would make training longer out of the box.

Generally I'm all for changing the default parameters to what makes the most sense to the most people.

timsainb commented 3 years ago

https://github.com/lmcinnes/umap/issues/521 brings up the same issue and also suggests we change loss_report_frequency for similar reasons.

I think having loss_report_frequency default to 1 would reduce confusion; then, if you want to see the loss reported sub-epoch, you can just increase loss_report_frequency.

I read through the proposed solutions linked in https://github.com/lmcinnes/umap/issues/521 but didn't see anything that I think would resolve the semantic confusion of seeing, e.g., 200 epochs printed by the Keras fit function and thinking your network is training for 200 epochs instead of 20. Maybe @dylanking42 has a suggestion?