Hi, I recently read your paper. It is interesting work and outperforms existing methods.
But I have a few questions (they may be silly :) ).
Which GBDT implementation was used in the experiments? My guess is CatBoost, which focuses on categorical features.
Why is LightGBM not included among the GBDT references? LightGBM also has special handling for categorical features.
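To make concrete what I mean by LightGBM's categorical handling, here is a minimal sketch (the column names, data, and parameters are made up for illustration, not taken from your experiments):

```python
# Minimal sketch of LightGBM's native categorical-feature handling.
# All data and parameters here are illustrative, not from the paper.
import lightgbm as lgb
import pandas as pd

df = pd.DataFrame({
    "city": pd.Categorical(["a", "b", "a", "c", "b", "a"]),  # categorical column
    "age":  [23, 45, 31, 52, 38, 27],                        # numeric column
    "label": [0, 1, 0, 1, 1, 0],
})

train = lgb.Dataset(
    df[["city", "age"]],
    label=df["label"],
    categorical_feature=["city"],  # LightGBM searches category partitions natively
)
params = {"objective": "binary", "min_data_in_leaf": 1, "verbose": -1}
model = lgb.train(params, train, num_boost_round=10)
```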
The embedding of categorical features requires a very large number of parameters (d × m). Given that parameter count, the improvement over MLP is not particularly large.
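As a back-of-the-envelope check on the d × m figure (the numbers below are hypothetical, just to show the scale I am worried about):

```python
# Rough parameter count for the categorical embedding table alone.
# d and m are hypothetical values, not taken from the paper.
d = 32        # embedding dimension
m = 10_000    # total number of distinct categorical values across columns
print(f"{d * m:,} embedding parameters")  # 320,000 for this toy setting
```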
Could you report TabTransformer's running time? The overhead of such a method should be much greater than that of GBDT-based methods.
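For reference, the kind of comparison I have in mind is simple wall-clock timing, along these lines (`model.fit` is a stand-in for whatever training entry point each method actually exposes):

```python
# Sketch of the wall-clock comparison I have in mind; `model.fit` is a
# placeholder for each method's real training call.
import time

def timed_fit(model, X, y):
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start
```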