Closed hongjianyuan closed 4 years ago
Hi, I believe the most straightforward solution would be to keep the original architecture and only change the output module. Currently, it is a linear transformation followed by a sigmoid activation; I would start by simply replacing the activation with a softmax, and see from there.
I currently want to input 250 features, segment them, and output a category for each of the 250 features. So I just need to change the output module to a softmax?
Yes, set d_input=250 and d_output to the number of classes, and replace the sigmoid with a softmax; you should then have a functional segmentation algorithm.
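The swap described above can be sketched in PyTorch. This is a minimal illustration, not the repo's exact code: the names d_model and n_classes, and the batch/sequence sizes, are assumptions, and the encoder producing the hidden states is stubbed with random data.

```python
import torch
import torch.nn as nn

# Assumed dimensions: the Transformer yields hidden states of shape
# (batch, seq_len, d_model); we classify each of the 250 time steps
# into one of n_classes categories.
d_model, n_classes = 64, 4

# Original regression-style head: linear transformation + sigmoid.
regression_head = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())

# Classification head: project to n_classes logits per time step.
# With nn.CrossEntropyLoss the softmax is applied implicitly during
# training, so the head itself only needs the linear layer.
classification_head = nn.Linear(d_model, n_classes)

hidden = torch.randn(8, 250, d_model)   # stand-in for encoder output
logits = classification_head(hidden)    # (8, 250, n_classes)
probs = torch.softmax(logits, dim=-1)   # per-time-step class probabilities
print(probs.shape)                      # torch.Size([8, 250, 4])
```

At inference time, `probs.argmax(dim=-1)` gives the predicted category for each of the 250 positions.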
Thank you very much
If the output is a category for each of the 250 features, the output shape would be something like 250*4 (for 4 classes).
Hi @maxjcohen , thanks for your great repo!
Is it possible to change the Transformer to perform sequence classification (many-to-one)?
Hi, nothing is stopping you from setting d_output=1 in order for the Transformer to behave as a many-to-one model. In practice, every hidden state will be computed with a dimension d_model, and later aggregated in the last layer to output a single value. Note that this process is different from how traditional architectures, such as RNN-based networks, handle many-to-one predictions.
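The aggregate-then-project step described above can be sketched as follows. This is a hedged illustration, not the repo's implementation: mean pooling over time is one common aggregation choice (an assumption here), and d_model and the tensor sizes are placeholders.

```python
import torch
import torch.nn as nn

# Assumed: the encoder yields hidden states of shape (batch, seq_len, d_model).
d_model = 64
head = nn.Linear(d_model, 1)        # d_output = 1

hidden = torch.randn(8, 250, d_model)  # stand-in for encoder output
pooled = hidden.mean(dim=1)            # aggregate across time -> (8, d_model)
y = head(pooled)                       # one value per sequence -> (8, 1)
print(y.shape)                         # torch.Size([8, 1])
```

Other aggregations (max pooling, taking the last time step, or a learned attention pooling) slot in the same way; only the `pooled` line changes.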
Thank you for your reply @maxjcohen ! How exactly is it different? From the way an RNN model would take hidden states as further input?
RNNs carry a memory-like hidden state across time steps, while the Transformer has no notion of memory and computes all time steps in parallel instead.
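The contrast can be made concrete with a toy example. The sizes below are arbitrary assumptions; the point is only that the RNN must loop step by step, threading its hidden state forward, while self-attention processes the whole sequence in one batched operation.

```python
import torch
import torch.nn as nn

d = 16
x = torch.randn(4, 10, d)   # (batch, seq_len, d)

# RNN: sequential, the hidden state h carries memory between steps.
rnn = nn.RNN(d, d, batch_first=True)
h = torch.zeros(1, 4, d)
for t in range(x.size(1)):
    _, h = rnn(x[:, t:t + 1], h)   # each step depends on the previous h

# Self-attention: all time steps are computed in parallel, no carried state.
attn = nn.MultiheadAttention(d, num_heads=2, batch_first=True)
out, _ = attn(x, x, x)
print(h.shape, out.shape)   # torch.Size([1, 4, 16]) torch.Size([4, 10, 16])
```

This is also why Transformers need positional encodings: without the sequential loop, nothing else tells the model where each time step sits in the sequence.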
How should I change the Transformer to apply it to classification in a seq2seq (many-to-many) setting? What should I change in the last layer of the model?