shenweichen / DeepCTR-Torch

【PyTorch】Easy-to-use, Modular and Extendible package of deep-learning based CTR models.
https://deepctr-torch.readthedocs.io/en/latest/index.html
Apache License 2.0

A question about the AUC reduction after the version update #131

Closed ArnoldGaius closed 3 years ago

ArnoldGaius commented 3 years ago

Describe the question
I have tried versions 0.2.1 and 0.2.3 of DeepCTR-Torch, but the same dataset and parameters produce a large difference in AUC and loss between the two versions. In this experiment I tried FiBiNET and NFM; in both cases the AUC under version 0.2.3 was lower than under version 0.2.1.

Additional context
I have uploaded the two versions of the project to Baidu Netdisk. The test data is a 25 MB Criteo dataset sampled at random. After downloading, run the run.py of each of the two projects to see the different AUC results. This is the download link:

Link: https://pan.baidu.com/s/1191pHvL3wMaCM5TsAo4jgA Extraction code: hexw

Please download it and help troubleshoot this issue. Thank you again for your contribution.

Operating environment

zanshuxun commented 3 years ago

I'm trying to reproduce your problem, but I got the error "one of the variables needed for gradient computation has been modified by an inplace operation" when I use torch 1.6.0 to run deepctr-torch 0.2.1. Which torch version did you use to run deepctr-torch 0.2.1? (deepctr-torch 0.2.1 with torch>=1.5 causes this error, which was fixed in v0.2.2, so I suspect you ran deepctr-torch 0.2.1 with a different torch version.)

zanshuxun commented 3 years ago

In addition, could you provide a set of parameters? From your files, I notice that the parameters in 0.2.1 and 0.2.3 are different (batch_size, dnn_hidden_units, l2_reg_dnn and dnn_dropout all differ).

0.2.1: [screenshot of the 0.2.1 parameters]

0.2.3: [screenshot of the 0.2.3 parameters]

ArnoldGaius commented 3 years ago

I'm trying to reproduce your problem, but I got the error "one of the variables needed for gradient computation has been modified by an inplace operation" when I use torch 1.6.0 to run deepctr-torch 0.2.1. Which torch version did you use to run deepctr-torch 0.2.1? (deepctr-torch 0.2.1 with torch>=1.5 causes this error, which was fixed in v0.2.2, so I suspect you ran deepctr-torch 0.2.1 with a different torch version.)


Thank you for your reply. The version I used to run v0.2.1 is torch 1.4.0. I am sorry for the inconsistent parameters in the files I uploaded. In fact, batch_size, dnn_hidden_units, l2_reg_dnn and dnn_dropout should be set to 1024, (400, 400, 400), 0 and 0.3 respectively, i.e. adjusted to match the parameters of version 0.2.1.
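For concreteness, those settings map onto the API roughly as follows (a rough sketch only; the feature-column and data-loading code is omitted and passed in as arguments, the helper name is hypothetical, and epochs / validation_split are illustrative values):

from deepctr_torch.models import FiBiNET

def build_and_train(linear_feature_columns, dnn_feature_columns,
                    train_model_input, train_labels, device='cpu'):
    # FiBiNET with the hyper-parameters discussed above
    model = FiBiNET(linear_feature_columns, dnn_feature_columns,
                    dnn_hidden_units=(400, 400, 400),
                    l2_reg_dnn=0,
                    dnn_dropout=0.3,
                    task='binary', device=device)
    model.compile("adam", "binary_crossentropy",
                  metrics=["binary_crossentropy", "auc"])
    # batch_size matches the 0.2.1 setting; epochs / validation_split are illustrative
    return model.fit(train_model_input, train_labels,
                     batch_size=1024, epochs=10, verbose=2,
                     validation_split=0.1)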

zanshuxun commented 3 years ago

:joy: This is because you set a smaller lr (1e-4) in your v0.2.3 folder:

[screenshot of the hard-coded lr in basemodel.py]

Remove this and use the default lr (1e-3), and you will get normal performance.

zanshuxun commented 3 years ago

Besides, it is not recommended to set the learning rate in basemodel.py; you can set the lr more easily in your main file:


from torch.optim import Adam
...
model.compile(Adam(model.parameters(), 1e-4), "binary_crossentropy",
              metrics=["binary_crossentropy", "auc"])

This is listed in https://deepctr-torch.readthedocs.io/en/latest/FAQ.html#set-learning-rate-and-use-earlystopping
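For reference, that FAQ pattern combined with early stopping looks roughly like this (a sketch assuming the EarlyStopping callback from deepctr_torch.callbacks with Keras-style arguments; the monitored metric name, the helper name and the concrete values are illustrative assumptions):

from torch.optim import Adam
from deepctr_torch.callbacks import EarlyStopping

def compile_and_fit(model, x_train, y_train):
    # custom lr via an optimizer instance instead of the string "adam"
    model.compile(Adam(model.parameters(), lr=1e-4),
                  "binary_crossentropy",
                  metrics=["binary_crossentropy", "auc"])
    # stop when the validation AUC stops improving (metric name assumed)
    es = EarlyStopping(monitor='val_auc', patience=2, mode='max', verbose=1)
    return model.fit(x_train, y_train, batch_size=1024, epochs=20,
                     verbose=2, validation_split=0.1, callbacks=[es])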

ArnoldGaius commented 3 years ago

Besides, it is not recommended to set the learning rate in basemodel.py; you can set the lr more easily in your main file:

from torch.optim import Adam
...
model.compile(Adam(model.parameters(), 1e-4), "binary_crossentropy",
              metrics=["binary_crossentropy", "auc"])

This is listed in https://deepctr-torch.readthedocs.io/en/latest/FAQ.html#set-learning-rate-and-use-earlystopping

Yes, setting the learning rate through the Adam optimizer as you mentioned is the textbook approach, and I am sorry for my confusing coding style. However, I have already changed the learning rate to 1e-3 in the code (setting the learning rate to 1e-4 makes it easier to fall into a local minimum, while in version 0.2.3 a learning rate of 1e-3 makes the model overfit quickly), and the gap in AUC still exists. I have run each version five times. As you can see from the pictures below, the 0.2.3 model fits very quickly and is done within the first epoch, while the 0.2.1 model learns more slowly; the loss and AUC obtained by the two versions differ hugely.

[screenshots of the v0.2.3 and v0.2.1 training logs]

zanshuxun commented 3 years ago

Cause

This is caused by the L2 regularization. Set all the l2_reg parameters (l2_reg_linear, l2_reg_embedding, l2_reg_dnn) to 0 in v0.2.3 and you will get the same performance as v0.2.1.
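For example (a rough sketch, with the other constructor arguments left at their defaults, the feature columns passed in from the existing script, and the helper name hypothetical):

from deepctr_torch.models import FiBiNET

def build_without_l2(linear_feature_columns, dnn_feature_columns, device='cpu'):
    # all three L2 coefficients set to 0, matching the (buggy) v0.2.1 behaviour
    return FiBiNET(linear_feature_columns, dnn_feature_columns,
                   l2_reg_linear=0,
                   l2_reg_embedding=0,
                   l2_reg_dnn=0,
                   task='binary', device=device)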

In fact, regularization only works from v0.2.2 onwards, where we fixed the bugs around regularization. In previous versions, reg_loss was computed only once, from the initial parameters, which means reg_loss was just a constant. This bug can be found here.

In v0.2.2, we fixed this bug by storing the necessary parameters in self.regularization_weight and calculating reg_loss in each iteration:
https://github.com/shenweichen/DeepCTR-Torch/blob/bc881dcd417fec64f840b0cacce124bc86b3687c/deepctr_torch/models/basemodel.py#L371-L386
https://github.com/shenweichen/DeepCTR-Torch/blob/bc881dcd417fec64f840b0cacce124bc86b3687c/deepctr_torch/models/basemodel.py#L228-L230
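Conceptually, the fix works roughly like this (a simplified sketch of the idea, not the library's exact code):

import torch

class RegLossSketch:
    def __init__(self):
        # each entry: (list of parameters, l1 coefficient, l2 coefficient)
        self.regularization_weight = []

    def add_regularization_weight(self, params, l1=0.0, l2=0.0):
        # keep references to the parameters so the penalty can be recomputed later
        self.regularization_weight.append((list(params), l1, l2))

    def get_regularization_loss(self):
        # recomputed from the *current* parameter values at every training step,
        # instead of once from the initial values (the pre-v0.2.2 bug)
        reg_loss = torch.zeros((1,))
        for params, l1, l2 in self.regularization_weight:
            for p in params:
                if l1 > 0:
                    reg_loss = reg_loss + l1 * p.abs().sum()
                if l2 > 0:
                    reg_loss = reg_loss + l2 * (p * p).sum()
        return reg_loss

# in the training loop: total_loss = bce_loss + model.get_regularization_loss()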

Solution

However, it's strange that the model performance drops when we actually use L2 regularization. Perhaps your dataset (122k samples) is too small, so I used a larger dataset to conduct an experiment. I ran FiBiNET with the same parameters on a subset of the Avazu dataset (data from the first 3 days, 13.3 million samples in total: 9.46M for training, 0.77M for validation, 3.1M for test). Using L2 regularization brings a significant improvement:

no l2 reg: [result screenshot]
l2 reg = 1e-4: [result screenshot]

I suggest you use more samples of Criteo. Besides, I can provide the necessary code to you if you're interested in my experiment.

ArnoldGaius commented 3 years ago

Cause

This is caused by the L2 regularization. Set all the l2_reg parameters (l2_reg_linear, l2_reg_embedding, l2_reg_dnn) to 0 in v0.2.3 and you will get the same performance as v0.2.1.

In fact, regularization only works from v0.2.2 onwards, where we fixed the bugs around regularization. In previous versions, reg_loss was computed only once, from the initial parameters, which means reg_loss was just a constant. This bug can be found here.

In v0.2.2, we fixed this bug by storing the necessary parameters in self.regularization_weight and calculating reg_loss in each iteration:
https://github.com/shenweichen/DeepCTR-Torch/blob/bc881dcd417fec64f840b0cacce124bc86b3687c/deepctr_torch/models/basemodel.py#L371-L386

https://github.com/shenweichen/DeepCTR-Torch/blob/bc881dcd417fec64f840b0cacce124bc86b3687c/deepctr_torch/models/basemodel.py#L228-L230

Solution

However, it's strange that the model performance drops when we actually use L2 regularization. Perhaps your dataset (122k samples) is too small, so I used a larger dataset to conduct an experiment. I ran FiBiNET with the same parameters on a subset of the Avazu dataset (data from the first 3 days, 13.3 million samples in total: 9.46M for training, 0.77M for validation, 3.1M for test). Using L2 regularization brings a significant improvement:

no l2 reg: [result screenshot]
l2 reg = 1e-4: [result screenshot]

I suggest you use more samples of Criteo. Besides, I can provide the necessary code to you if you're interested in my experiment.

Thank you for your help. I would not have known about this bug fixed in version 0.2.2 without your notification. I have modified the l2 settings according to your advice, and the result is the same as you said. I appreciate your willingness to share the necessary code. My email is jiangcmd@foxmail.com. Looking forward to your reply. Sincerely, Bain

zanshuxun commented 3 years ago

Experiment code on the Avazu dataset can be found in the experiment branch: https://github.com/shenweichen/DeepCTR-Torch/tree/experiment
Follow the steps in its README.md. Please feel free to contact me if you have any questions.