shenweichen / DeepCTR-Torch

【PyTorch】Easy-to-use, Modular and Extendible package of deep-learning based CTR models.
https://deepctr-torch.readthedocs.io/en/latest/index.html
Apache License 2.0

Making DIEN dataset #237

Open Jeriousman opened 2 years ago

Jeriousman commented 2 years ago

Describe the question

I am preparing data for the DIEN model, and I think its format has to differ from the DeepFM format because of the user behavior sequence, if I am correct. Since each sample carries a behavior sequence (the history of items the user clicked), I guess the dataset should have one row per user? For example:

user_id  item_sequence  target_ad
0        [20, 30, 22]   3
1        [11, 45, 2]    10
2        [77, 35, 64]   4
3        [20, 30, 22]   7
4        [20, 30, 22]   16
5        [20, 30, 22]   1

In the DeepFM case, however, there is no user behavior sequence, so I guess many rows can share the same user ID? For example (users 1 and 5 each have multiple rows):

user_id  clicked_item_id  target_ad
1        5                3
1        6                10
1        5                4
5        8                7
5        11               16
9        2                1

So in general, would a DIEN dataset have as many rows as there are users in this case, whereas a DeepFM dataset can have any number of rows, as long as the data exists?
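The difference in row counts can be made concrete with a minimal sketch that collapses the per-interaction rows from the second table into one click sequence per user. It uses only the Python standard library; the helper name `group_by_user` is hypothetical, the rows are assumed to already be in chronological order, and this is not DeepCTR-Torch's own preprocessing.

```python
from collections import defaultdict

# Per-interaction rows (user_id, clicked_item_id), copied from the second table.
# Assumption: rows are already in chronological order for each user.
rows = [(1, 5), (1, 6), (1, 5), (5, 8), (5, 11), (9, 2)]


def group_by_user(rows):
    """Collapse per-interaction rows into one click sequence per user (hypothetical helper)."""
    sequences = defaultdict(list)
    for user_id, item_id in rows:
        sequences[user_id].append(item_id)
    return dict(sequences)


per_user = group_by_user(rows)

print(len(rows))      # number of rows in the per-interaction layout
print(len(per_user))  # number of rows in the one-row-per-user layout
print(per_user)
```

In this sketch the per-interaction layout keeps every click as its own row, while the grouped layout ends up with exactly one entry per distinct user.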

Also, since the DIEN paper requires a target ad, can I take the last item out of the original item_sequence and use it as the target ad? With a sequence history, the last item in the sequence is what would be predicted if this were a classification problem.
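The hold-out idea in this question can be sketched as follows. This is only an illustration under stated assumptions: the helper `split_last_item` and the field name `hist_item_sequence` are hypothetical, the sequences are taken from the first table, and whether this matches what the DIEN implementation in DeepCTR-Torch expects is exactly what is being asked.

```python
def split_last_item(item_sequence):
    """Hold out the last item as the target; the rest is the history (hypothetical helper)."""
    if len(item_sequence) < 2:
        raise ValueError("need at least two items to form a history and a target")
    return item_sequence[:-1], item_sequence[-1]


# One click sequence per user, taken from the first table.
sequences = {0: [20, 30, 22], 1: [11, 45, 2], 2: [77, 35, 64]}

samples = []
for user_id, item_sequence in sequences.items():
    history, target = split_last_item(item_sequence)
    samples.append(
        {"user_id": user_id, "hist_item_sequence": history, "target_ad": target}
    )

for sample in samples:
    print(sample)
```

Every held-out item here is one the user actually clicked, so the sketch yields only positive examples.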