I have reimplemented the SVX loss function in TensorFlow and written unit tests to verify that it behaves correctly. However, when I train a ResNet50 architecture from scratch using SVX loss on the MS1Mv2 dataset (the purged one), the loss diverges rapidly. Have you experienced something similar? Do you have any recommendations or ideas?