pumpikano / tf-dann

Domain-Adversarial Neural Network in Tensorflow
MIT License

total loss #12

Closed jakc4103 closed 7 years ago

jakc4103 commented 7 years ago

Hi, first of all, thanks for implementing this. It's really awesome! I'm currently modifying the code to run it on the Office dataset, and I noticed a small difference in the definition of the total loss.

In the original paper, the total loss is defined as: predict_loss + lambda * domain_loss.

In the code, it seems that the lambda term is missing. I think that's one of the reasons the total_loss sometimes blows up to NaN.

Goldit commented 7 years ago

Hi @jakc4103 ,

I think lambda is the l parameter passed to flip_gradient: feat = flip_gradient(self.feature, self.l)

l is calculated as follows (see the training loop in MNIST-DANN.ipynb):

    # Adaptation param and learning rate schedule as described in the paper
    p = float(i) / num_steps
    l = 2. / (1. + np.exp(-10. * p)) - 1
    lr = 0.01 / (1. + 10 * p)**0.75

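For reference, a gradient-reversal op scaled by a factor like l can be written in TF 1.x roughly as below. This is only a sketch, not necessarily identical to the repo's flip_gradient.py:

    import tensorflow as tf  # TF 1.x

    def flip_gradient(x, l=1.0, name="flip_gradient_0"):
        # Register a custom gradient that reverses and scales the incoming
        # gradient by l; the registered name must be unique per call.
        grad_name = "FlipGradient_" + name

        @tf.RegisterGradient(grad_name)
        def _flip_gradients(op, grad):
            return [tf.negative(grad) * l]

        # Forward pass is an identity; the backward pass uses the override.
        g = tf.get_default_graph()
        with g.gradient_override_map({"Identity": grad_name}):
            return tf.identity(x)

With such a layer in front of the domain classifier, minimizing pred_loss + domain_loss already sends -l * d(domain_loss)/d(features) into the feature extractor, which is why no explicit lambda appears in the total loss.
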
yenlianglintw commented 7 years ago

I have a similar problem to @jakc4103 when using AlexNet as the feature network. The total_loss always goes to NaN with the default adaptation param and learning rate. How should I set the adaptation parameter and learning rate to avoid the loss exploding?

jakc4103 commented 7 years ago

@Goldit sorry for my misunderstanding. You are right, lambda is a parameter in flip_gradient.

@yenlianglintw I think you can try setting the learning rate lower, or at least lower the learning rate for the pre-trained AlexNet layers. That should mostly solve the NaN loss problem. But in my experience, the training sometimes won't converge for reasons I don't know; I think that is probably why the original paper did not report all 6 adaptation tasks on the Office dataset. BTW, another option is to not fine-tune the parameters of the pre-trained model at all and only train the additional layers you added; that allows you to use a larger learning rate during training.
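
In case it helps, one way to give the pre-trained layers a smaller learning rate, or to freeze them entirely, could look like the sketch below (TF 1.x; 'alexnet' and 'new_layers' are hypothetical variable scopes, and total_loss is assumed to be defined elsewhere):

    import tensorflow as tf  # TF 1.x

    # Hypothetical scopes: 'alexnet' holds the pre-trained layers,
    # 'new_layers' holds the added label/domain classifier branches.
    pretrained_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='alexnet')
    new_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='new_layers')

    # Option 1: a much smaller learning rate for the pre-trained layers.
    train_op = tf.group(
        tf.train.MomentumOptimizer(1e-4, 0.9).minimize(total_loss, var_list=pretrained_vars),
        tf.train.MomentumOptimizer(1e-2, 0.9).minimize(total_loss, var_list=new_vars))

    # Option 2: freeze the pre-trained layers and train only the new ones,
    # which tolerates a larger learning rate.
    # train_op = tf.train.MomentumOptimizer(1e-2, 0.9).minimize(total_loss, var_list=new_vars)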