Closed — ari-ruokamo closed this issue 3 months ago
Perhaps the learning rate could be reduced to 0.0003 or lower.
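For illustration, a minimal sketch of what that change looks like with a plain PyTorch Adam optimizer; the model and optimizer here are just placeholders, not the actual SCNet training setup or config:

```python
import torch

# Hypothetical stand-in for the model; in practice this would be the SCNet instance.
model = torch.nn.Linear(10, 10)

# Lower the learning rate to 3e-4 (0.0003) as suggested above.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```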
Thanks.
Yes, indeed, lowering the LR helps; I might need to go even lower, as convergence still looks a bit too fast. The gradient norm persists at inf, but I guess that is normal, as stated here: https://github.com/starrytong/SCNet/issues/11.
What might cause the training to go into this state so quickly, after only a few epochs?
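For reference, a minimal sketch of how one might log the total gradient norm each step to see exactly when it first becomes inf. This is a generic PyTorch loop with placeholder model, data, and clipping threshold, not the SCNet trainer:

```python
import torch

model = torch.nn.Linear(10, 10)                    # hypothetical stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
criterion = torch.nn.MSELoss()

for step in range(100):
    x = torch.randn(8, 10)                         # dummy batch
    target = torch.randn(8, 10)
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()
    # clip_grad_norm_ returns the total norm computed *before* clipping,
    # which is typically what training logs report as "grad norm".
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    if not torch.isfinite(total_norm):
        print(f"step {step}: gradient norm is non-finite ({total_norm})")
    optimizer.step()
```

If the logged norm is finite early on and only turns inf after a few epochs, that would at least narrow down where the blow-up starts.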