yandex-research / rtdl-num-embeddings

(NeurIPS 2022) On Embeddings for Numerical Features in Tabular Deep Learning
https://arxiv.org/abs/2203.05556
MIT License

About the structure of the Transformer-based model mentioned in the article #24

Open LinktoGG opened 1 day ago

LinktoGG commented 1 day ago

Greetings! Thank you for your work on "On Embeddings for Numerical Features in Tabular Deep Learning".

I'm having difficulty understanding the code that implements this functionality in train4.py. My understanding is that the class NonFlatModel in train4.py implements a Transformer-based model, but I don't fully understand its specific structure. Could you please elaborate on the specific structure of the Transformer-based model mentioned in the article?

For example, could you explain how Transformer-PLR works on a regression dataset (such as the California Housing dataset), and what layers it comprises?

1. Is this part (the class NonFlatModel in train4.py) used to implement the Transformer-based model? I'm not quite clear about functions like cls_embedding(x) and others (see the attached screenshot).
2. Here is my personal understanding; please correct me if I'm wrong. I'm guessing that the Transformer model for regression tasks roughly consists of an embedding layer followed by a Transformer encoder (is it an encoder-only structure?). After the Transformer layers, which module do you use to produce the regression output? Is it an MLP (one layer or multiple layers)?

I'm looking forward to your reply. Thank you.

LinktoGG commented 1 day ago

If possible, could you please explain the structure of the model and the process of completing a regression task, using Transformer-PLR on the California Housing dataset as an example? This would be of great help to me as a beginner. Thank you very much.

Yura52 commented 1 day ago

Hi! First, I recommend taking a look at the FT-Transformer paper: Revisiting Deep Learning Models for Tabular Data. It contains a helpful explanation and illustration of how the Transformer is applied to tabular data; in particular, it covers the CLS embedding and the prediction head.
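
For intuition, here is a minimal, self-contained sketch of that flow for a regression target such as California Housing. This is plain PyTorch, not the repository's actual code, and all class names and hyperparameters below are purely illustrative: embed each numerical feature into a token, add a trainable CLS token, run an encoder-only Transformer, and feed the final CLS representation to a small head that outputs one number.

import torch
import torch.nn as nn

class ToyTabularTransformerRegressor(nn.Module):
    # Illustrative sketch of the FT-Transformer-style flow; not the official implementation.
    def __init__(self, n_features, d_token=64, n_blocks=2, n_heads=4):
        super().__init__()
        # One linear embedding (weight + bias) per numerical feature.
        self.weight = nn.Parameter(torch.randn(n_features, d_token))
        self.bias = nn.Parameter(torch.zeros(n_features, d_token))
        # Trainable CLS token added to the sequence of feature tokens.
        self.cls = nn.Parameter(torch.randn(d_token))
        # Encoder-only Transformer backbone.
        layer = nn.TransformerEncoderLayer(d_model=d_token, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_blocks)
        # Prediction head: CLS representation -> a single regression output.
        self.head = nn.Sequential(nn.LayerNorm(d_token), nn.ReLU(), nn.Linear(d_token, 1))

    def forward(self, x):  # x: (batch, n_features)
        tokens = x[..., None] * self.weight + self.bias  # (batch, n_features, d_token)
        cls = self.cls.expand(len(x), 1, -1)             # (batch, 1, d_token)
        tokens = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(tokens[:, 0]).squeeze(-1)       # read out the CLS token

model = ToyTabularTransformerRegressor(n_features=8)  # California Housing has 8 features
y_hat = model(torch.randn(32, 8))                      # shape: (32,)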

In the paper about embeddings, "Transformer" is FT-Transformer with the linear embeddings replaced by the proposed non-linear embeddings, such as piecewise-linear or periodic embeddings. So instead of train4.py, I recommend taking a look at the FTTransformer class of the rtdl_revisiting_models package: link. It stores the linear embeddings for continuous features as cont_embeddings: link. To replace them with advanced non-linear embeddings:

import rtdl_revisiting_models
import rtdl_num_embeddings

# Build the standard FT-Transformer, then replace its linear embeddings
# for continuous features with the proposed non-linear (e.g. periodic) embeddings.
model = rtdl_revisiting_models.FTTransformer(...)
model.cont_embeddings = rtdl_num_embeddings.PeriodicEmbeddings(...)

The model implemented in train4.py is basically the same.
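
To make the substitution concrete for a regression dataset like California Housing (8 numerical features, one scalar target), a sketch along the following lines should work. The argument names, get_default_kwargs usage, and forward signature are assumptions based on the packages' READMEs, so please verify them against the installed versions.

import torch
import rtdl_revisiting_models
import rtdl_num_embeddings

n_cont_features = 8  # California Housing has 8 numerical features
default_kwargs = rtdl_revisiting_models.FTTransformer.get_default_kwargs()

# Plain FT-Transformer with linear embeddings and a scalar regression output.
model = rtdl_revisiting_models.FTTransformer(
    n_cont_features=n_cont_features,
    cat_cardinalities=[],  # California Housing has no categorical features
    d_out=1,
    **default_kwargs,
)

# Swap the linear embeddings for periodic embeddings (assumed here to correspond
# to the PLR embeddings discussed in this thread; see the rtdl_num_embeddings docs).
# The embedding size must match the backbone's token size (d_block).
model.cont_embeddings = rtdl_num_embeddings.PeriodicEmbeddings(
    n_cont_features, default_kwargs["d_block"], lite=False
)

x_cont = torch.randn(32, n_cont_features)
y_pred = model(x_cont, None)  # x_cat is None; output shape: (32, 1)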

LinktoGG commented 1 day ago

Thank you for your response and explanation, but I still have some questions. In the paper On Embeddings for Numerical Features in Tabular Deep Learning, you mention that "Transformer-L" is equivalent to FT-Transformer [13].

1. Do you mean that in the embeddings paper (On Embeddings for Numerical Features in Tabular Deep Learning), all the Transformer backbones are based on FT-Transformer rather than the original Transformer?
2. Are models like Transformer-PLR and the other Transformer variants actually modifications of FT-Transformer? In other words, do PLR and the other non-linear embeddings replace the linear embeddings in FT-Transformer?

Yura52 commented 16 hours ago

(1) The backbone of FT-Transformer is (almost) exactly the original Transformer; there are only two small details to be aware of:

(2) Technically, yes.