Open ManafMukred opened 1 year ago
Hi @ManafMukred, can you share some more details about the model architecture, hyperparameters, and dataset?
@owenvallis I used the same notebook here, and I also tried the LARS optimizer with lr = 0.2 * int(BATCH_SIZE / 256), as the paper says, but I get the same error.
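A side note on the learning-rate expression quoted above: `int(BATCH_SIZE / 256)` truncates toward zero, so any batch size below 256 yields a learning rate of exactly 0, and batch sizes that are not multiples of 256 are rounded down. The batch size used here is not stated in the thread, so this is only something worth ruling out, not a diagnosis of the NaN loss. A minimal sketch of the arithmetic, where `scaled_lr` and the listed batch sizes are hypothetical names and values chosen for illustration:

```python
def scaled_lr(batch_size, base_lr=0.2, truncate=True):
    """Linear learning-rate scaling relative to a reference batch size of 256.

    truncate=True mirrors the quoted expression 0.2 * int(BATCH_SIZE / 256);
    truncate=False keeps the fractional ratio instead.
    """
    ratio = batch_size / 256
    if truncate:
        ratio = int(ratio)  # drops the fractional part, e.g. 0.5 -> 0
    return base_lr * ratio


# Hypothetical batch sizes, not taken from the thread.
for bs in (128, 256, 512, 1024):
    print(bs, scaled_lr(bs), scaled_lr(bs, truncate=False))
```

If the truncated value turns out to be 0 for the batch size in use, the untruncated form `0.2 * BATCH_SIZE / 256` is the usual alternative.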
Thanks for the details, I'll take a look and see if I can repro the issue on my side.
I also tried other algorithms such as VICReg with the LAMB and LARS optimizers, but in both cases the loss is "nan":
`Epoch 1/200 175/175 - 49s - loss: nan - proj_std: nan - val_loss: nan - val_proj_std: nan - binary_accuracy: 0.1000 - 49s/epoch - 281ms/step
Epoch 2/200 175/175 - 36s - loss: nan - proj_std: nan - val_loss: nan - val_proj_std: nan - binary_accuracy: 0.1000 - 36s/epoch - 203ms/step
Epoch 3/200 175/175 - 36s - loss: nan - proj_std: nan - val_loss: nan - val_proj_std: nan - binary_accuracy: 0.1000 - 36s/epoch - 204ms/step `
Any suggestions?