Open Blank-z0 opened 1 year ago
Thank you for your careful suggestions on the preprocessing of the Churn dataset! We also found some data mistakes during our experiments, including repeated columns (which may lead to a wrong n_num_features) and the unreasonably processed features you mentioned (such as "gender", encoded as {0,1}, being treated as a numerical feature). The same data files can be obtained from the data sources of Yandex's FT-Transformer and Numerical Embeddings. The Churn data files used there treat "gender" as a numerical feature (as with the other data mistakes that appear in their provided files), so for a fair comparison we followed their settings in our experiments.
Personally, I do agree the "gender" feature is a categorical one.
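One quick way to spot features like this is to count the distinct values in each column. The sketch below is only an illustration on made-up rows, not the preprocessing used in the paper; the helper name low_cardinality_columns and the max_unique threshold are assumptions chosen for the example.

```python
def low_cardinality_columns(rows, max_unique=2):
    """Return indices of columns with at most `max_unique` distinct values."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [i for i, col in enumerate(columns) if len(set(col)) <= max_unique]


# Made-up rows (hypothetical values): credit score, age, gender encoded as {0, 1}.
rows = [
    [619.0, 42.0, 0.0],
    [608.0, 41.0, 1.0],
    [502.0, 39.0, 0.0],
]

# Columns flagged here are only candidates for categorical treatment.
candidates = low_cardinality_columns(rows)
print(candidates)
```

A low distinct-value count is only a hint; whether a flagged column is really categorical still has to be decided from the feature descriptions.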
Thank you for your reply, I got it. I'll try some experiments that preprocess the features that may be categorical.
Hi there, after reading your papers and reproducing your code on the Churn Modelling dataset, I have some questions. Since Kaggle lists all the features and their meanings, I think the split between numerical and categorical features in the paper is not entirely reasonable. I think these features should be categorical rather than numerical:
By the way, I downloaded the datasets from the link you provided. In the Churn dataset, info.json mistakenly gives "n_num_features" as 10 (the correct value should be 9).
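A mismatch like this can be checked by comparing the declared count with the actual width of the numerical matrix and by looking for repeated columns. The following is a minimal sketch on made-up in-memory data, not the real Churn files; the function names and the toy values are assumptions, and only the "n_num_features" key comes from the info.json discussed here.

```python
import json


def check_num_features(info, num_rows):
    """Return (declared, actual) numerical feature counts."""
    declared = info["n_num_features"]
    actual = len(num_rows[0]) if num_rows else 0
    return declared, actual


def duplicate_columns(num_rows):
    """Return pairs (i, j) with i < j whose columns hold identical values."""
    columns = list(zip(*num_rows))  # transpose rows into columns
    return [
        (i, j)
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
        if columns[i] == columns[j]
    ]


# Toy stand-ins (hypothetical): a parsed info.json and a numerical matrix
# whose last column repeats the one before it.
info = json.loads('{"n_num_features": 4}')
num_rows = [
    [619.0, 42.0, 2.0, 2.0],
    [608.0, 41.0, 1.0, 1.0],
    [502.0, 39.0, 8.0, 8.0],
]

declared, actual = check_num_features(info, num_rows)
dups = duplicate_columns(num_rows)
# Count each repeated column only once to get the number of distinct features.
n_unique = actual - len({j for _, j in dups})
print(declared, actual, dups, n_unique)
```

In this toy case the declared count matches the matrix width, but one column is a repeat, so the number of distinct numerical features is one lower than declared.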