reczoo / BARS

BARS: Towards Open Benchmarking for Recommender Systems https://openbenchmark.github.io/BARS
Apache License 2.0

Click history sequence feature in Taobao dataset? #18

Closed Kailianghu closed 1 year ago

Kailianghu commented 1 year ago

Hello, the Taobao dataset contains a raw behavior dataset. However, according to L36 - L56 of split_taobao_x1.py (https://github.com/openbenchmark/BARS/blob/master/ctr_prediction/datasets/Taobao/Taobao_x1/split_taobao_x1.py#L36), the click history sequence is created from the raw_sample dataset, not from the raw behavior dataset.

xpai commented 1 year ago

Sorry for the late reply. We use the raw_sample data to create the historical sequences because we want item IDs in the sequence, and the raw behavior data do not contain item IDs. After some experiments, however, we found that behavior sequences created this way for the Taobao data do not help DIN, i.e., there is no gain from using target attention. We have recently refined the data preprocessing and named the new version "taobaoad_x1". Please see the code for reference: https://github.com/openbenchmark/BARS/blob/master/datasets/Taobao/TaobaoAd_x1/convert_taobaoad_x1.py
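For readers following along, the raw_sample-based idea described above can be sketched roughly as follows. This is only an illustration, not the code in split_taobao_x1.py: the column names (`user`, `time_stamp`, `adgroup_id`, `clk`) are assumed from the public Taobao ad dataset, and the function name `build_click_history` and the `max_len` cutoff are hypothetical.

```python
from collections import defaultdict


def build_click_history(samples, max_len=5):
    """Attach to each impression the item IDs the same user clicked earlier.

    `samples` is an iterable of dicts with the (assumed) keys
    user, time_stamp, adgroup_id, clk.
    """
    history = defaultdict(list)  # user -> clicked item IDs so far
    out = []
    # Process each user's impressions in chronological order.
    for row in sorted(samples, key=lambda r: (r["user"], r["time_stamp"])):
        past = history[row["user"]]
        # Only clicks strictly before this impression go into the sequence,
        # so the current label never leaks into its own features.
        out.append({**row, "click_sequence": list(past[-max_len:])})
        if row["clk"] == 1:
            past.append(row["adgroup_id"])
    return out


if __name__ == "__main__":
    demo = [
        {"user": 1, "time_stamp": 10, "adgroup_id": "a", "clk": 1},
        {"user": 1, "time_stamp": 20, "adgroup_id": "b", "clk": 0},
        {"user": 1, "time_stamp": 30, "adgroup_id": "c", "clk": 1},
        {"user": 1, "time_stamp": 40, "adgroup_id": "d", "clk": 0},
    ]
    for r in build_click_history(demo):
        print(r["time_stamp"], r["click_sequence"])
```

The sketch only covers the original raw_sample-based approach; the refined taobaoad_x1 preprocessing is in the script linked above.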

In this version, we obtain good performance gains for DIN.

DCN: [Metrics] gAUC: 0.573908 - AUC: 0.648805 - logloss: 0.193040
DIN: [Metrics] gAUC: 0.576459 - AUC: 0.652399 - logloss: 0.192445
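For context on the gAUC figures, here is a minimal sketch of one common definition: AUC computed per group (typically per user) and averaged with weights equal to the number of samples in each group, skipping groups that contain only one class. Whether BARS computes gAUC exactly this way is an assumption; the function names `auc` and `gauc` are illustrative and are not the library's API.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns None if only one class is present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(group_ids, labels, scores):
    """Sample-count-weighted average of per-group AUC (assumed definition)."""
    groups = defaultdict(lambda: ([], []))
    for g, y, s in zip(group_ids, labels, scores):
        groups[g][0].append(y)
        groups[g][1].append(s)
    total, weight = 0.0, 0
    for ys, ss in groups.values():
        a = auc(ys, ss)
        if a is None:  # group has only clicks or only non-clicks
            continue
        total += a * len(ys)
        weight += len(ys)
    return total / weight if weight else None


if __name__ == "__main__":
    users = ["u1", "u1", "u2", "u2", "u2", "u3", "u3"]
    labels = [1, 0, 1, 0, 0, 0, 0]
    scores = [0.9, 0.1, 0.2, 0.5, 0.1, 0.3, 0.4]
    print(gauc(users, labels, scores))
```

Because gAUC only compares samples within the same group, it measures how well a model ranks items for each user, which is where a sequence model with target attention would be expected to help.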