jyansir / t2g-former

(AAAI 2023 oral) Original implementation and experiment results of T2G-FORMER
MIT License

Questions about dataset Churn #4

Open Blank-z0 opened 1 year ago

Blank-z0 commented 1 year ago

Hi there, after reading your paper and reproducing your code on the Churn Modelling dataset, I have some questions. Since Kaggle lists all the features and their meanings, I think the split between numerical and categorical features in the paper is not entirely reasonable. I think these features should be treated as categorical rather than numerical:

By the way, I downloaded the datasets from the link you provided. In the Churn dataset, info.json mistakenly sets "n_num_features" to 10 (the correct value should be 9).

jyansir commented 1 year ago

Thank you for your careful suggestions on the preprocessing of the Churn dataset! We also noticed some data issues during our experiments, including repeated columns (which may lead to a wrong n_num_features) and the unreasonably processed features you mentioned (such as "gender", with values {0,1}, being treated as a numerical feature). The same data files can be obtained from the data sources of Yandex's FT-Transformer and Numerical Embeddings. The Churn files used there also treat "gender" as a numerical feature (along with the other data issues found in their provided files), so for a fair comparison we followed their settings in our experiments.
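To check whether a downloaded file set has the repeated-column problem described above, one can compare the declared `n_num_features` with both the actual and the distinct column counts of the numerical matrix. The following is a minimal sketch, assuming the Yandex-style layout where numerical training features live in `N_train.npy` next to `info.json`. The file layout and the `audit_numerical_features` helper are assumptions for illustration, not part of this repository:

```python
import json
from pathlib import Path

import numpy as np


def audit_numerical_features(data_dir):
    """Hypothetical helper: compare info.json's n_num_features with N_train.npy.

    Assumes <data_dir>/info.json and <data_dir>/N_train.npy exist, with
    N_train.npy shaped [n_samples, n_numerical_columns].
    """
    root = Path(data_dir)
    info = json.loads((root / "info.json").read_text())
    X_num = np.load(root / "N_train.npy")

    declared = int(info["n_num_features"])
    actual = X_num.shape[1]
    # Pairs (i, j), i < j, whose columns are exactly equal, i.e. repeated columns.
    duplicates = [
        (i, j)
        for i in range(actual)
        for j in range(i + 1, actual)
        if np.array_equal(X_num[:, i], X_num[:, j])
    ]
    # A column is redundant if it repeats some earlier column.
    distinct = actual - len({j for _, j in duplicates})
    return {
        "declared": declared,
        "actual": actual,
        "distinct": distinct,
        "duplicates": duplicates,
    }
```

For example, `audit_numerical_features("data/churn")` (hypothetical path) returns the three counts; if `distinct` is smaller than `declared`, the metadata counts a repeated column as a separate feature.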

Personally, I agree that "gender" is a categorical feature.

Blank-z0 commented 1 year ago

Thank you for your reply, I understand. I'll try some experiments that preprocess the features that may be categorical as categorical features.
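One way to set up such experiments is to move numerical columns that look categorical, such as a {0,1} `gender` column, from the numerical block into the categorical block. Below is a minimal sketch, assuming numerical features in a NumPy array `X_num` and categorical features in an array `X_cat`. The helper names and the `max_unique` threshold are hypothetical. A low distinct-value count is only a heuristic, so candidates should still be checked against the Kaggle feature descriptions:

```python
import numpy as np


def find_categorical_candidates(X_num, max_unique=2):
    """Hypothetical helper: indices of numerical columns with <= max_unique distinct values."""
    return [
        j for j in range(X_num.shape[1])
        if np.unique(X_num[:, j]).size <= max_unique
    ]


def move_to_categorical(X_num, X_cat, columns):
    """Hypothetical helper: move the given numerical columns into the categorical block.

    Moved values are cast to strings so downstream encoders treat them as labels,
    not magnitudes. X_cat may be None if the dataset has no categorical block.
    Returns (new_X_num, new_X_cat).
    """
    cols = set(columns)
    keep = [j for j in range(X_num.shape[1]) if j not in cols]
    moved = X_num[:, list(columns)].astype(str)
    if X_cat is not None:
        moved = np.concatenate([np.asarray(X_cat, dtype=str), moved], axis=1)
    return X_num[:, keep], moved
```

The candidate columns are chosen once, on the training split, and the same `columns` list is then reused for the validation and test splits so that all three splits share one layout. After moving columns, `n_num_features` and `n_cat_features` in `info.json` would also need to be updated to match.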