shenweichen / DeepCTR

Easy-to-use, Modular and Extendible package of deep-learning based CTR models.
https://deepctr-doc.readthedocs.io/en/latest/index.html
Apache License 2.0

Did the DeepFM/xDeepFM paper authors train on the whole Criteo dataset? #173

Closed: fancyerii closed this issue 4 years ago

fancyerii commented 4 years ago

The Criteo training set (http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/) has 45,000,000 examples. I found that many DeepFM implementations on GitHub sample only a small fraction of the training examples. Could you tell me how much training data the authors of the paper used to achieve the reported ROC AUC and log loss?

fancyerii commented 4 years ago

I have implemented a keras.utils.Sequence along with my own LabelEncoder and MinMaxScaler. I trained on the whole Criteo dataset (45,000,000 examples) and got an AUC of 0.795. That falls a little short of the 0.805 AUC in the original paper; I may need to fine-tune some hyperparameters.

I have written a blog post (in Chinese) here: http://fancyerii.github.io/2019/12/19/deepfm/
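
For readers who want the general shape of this setup, here is a minimal sketch. It is not the code from the blog post. It assumes the scaler and encoder statistics are accumulated chunk by chunk, so the full dataset never has to sit in memory during fitting, and that batches are served through the `__len__`/`__getitem__` interface that keras.utils.Sequence requires. The class names `StreamingMinMaxScaler`, `StreamingLabelEncoder` and `BatchSequence` are hypothetical, the rows are toy data with one dense and one categorical feature, and missing values are ignored.

```python
import math


class StreamingMinMaxScaler:
    """Min-max scaler whose min/max are accumulated one chunk at a time."""

    def __init__(self):
        self.lo = math.inf
        self.hi = -math.inf

    def partial_fit(self, values):
        for v in values:
            self.lo = min(self.lo, v)
            self.hi = max(self.hi, v)

    def transform(self, values):
        span = self.hi - self.lo
        if span == 0:
            return [0.0 for _ in values]
        return [(v - self.lo) / span for v in values]


class StreamingLabelEncoder:
    """Maps each category to an integer id; id 0 is reserved for unseen values."""

    def __init__(self):
        self.index = {}

    def partial_fit(self, values):
        for v in values:
            if v not in self.index:
                self.index[v] = len(self.index) + 1

    def transform(self, values):
        return [self.index.get(v, 0) for v in values]


class BatchSequence:
    """Serves preprocessed batches by index.

    With Keras installed, this class would subclass keras.utils.Sequence;
    it is kept as a plain class here so the sketch has no dependencies.
    """

    def __init__(self, rows, scaler, encoder, batch_size):
        # rows: list of (label, dense_value, category) tuples. A real
        # implementation would read each batch lazily from disk instead.
        self.rows = rows
        self.scaler = scaler
        self.encoder = encoder
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.rows) / self.batch_size)

    def __getitem__(self, idx):
        chunk = self.rows[idx * self.batch_size:(idx + 1) * self.batch_size]
        labels = [r[0] for r in chunk]
        dense = self.scaler.transform([r[1] for r in chunk])
        sparse = self.encoder.transform([r[2] for r in chunk])
        return (dense, sparse), labels


# Toy data standing in for Criteo rows.
rows = [(1, 2.0, "a"), (0, 4.0, "b"), (0, 6.0, "a"), (1, 10.0, "c"), (0, 8.0, "b")]

scaler = StreamingMinMaxScaler()
encoder = StreamingLabelEncoder()

# First pass: fit the statistics in two chunks, as if streaming from disk.
for chunk in (rows[:3], rows[3:]):
    scaler.partial_fit([r[1] for r in chunk])
    encoder.partial_fit([r[2] for r in chunk])

# Second pass: serve scaled and encoded batches.
seq = BatchSequence(rows, scaler, encoder, batch_size=2)
for i in range(len(seq)):
    print(seq[i])
```

The two-pass structure is the point: one streaming pass collects the min/max and the category vocabulary, and the second pass transforms each batch on the fly as the model requests it.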

ucasiggcas commented 4 years ago

Do you have an implementation of xDeepFM? Could you share a link? Many thanks.

CeoiZidung commented 2 years ago

Hello bro! Would you mind sharing your hyperparameter settings for the Criteo data? I have no idea how to set them to improve AUC. Thank you.