Closed: xumwen closed this issue 1 year ago
Considering the motivation of using the one-hot-like PLE approach to improve model capability, there are two questions I don't quite understand:

1. "Continuous one-hot" would mean replacing the 1 of the hit bin with a continuous value while keeping all other bins at 0. The 1s in the left bins seem to be merely added to the hit bin's contribution by the MLP, which acts like Ax + B (where x is the embedding of the hit bin, A represents the location within the interval, and B represents the sum of the left bins' embeddings).
2. If the left bins' 1s exist so that low values are trained more often, can we encode the feature with a bi-PLE, i.e. concatenate a left-to-right PLE and a right-to-left PLE, so that high values are also trained more often at the same time?

Hope to hear your understanding. Thanks.

Thanks for your reply. As an example for question 1: what if we replace the encoding `[1, 1, (x - b_{t-1}) / (b_t - b_{t-1}), 0]` with `[0, 0, (x - b_{t-1}) / (b_t - b_{t-1}), 0]`?
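For concreteness, here is a minimal sketch of the left-to-right PLE discussed above, together with the proposed bi-PLE variant. It follows the encoding as written in the question; `ple_encode` and `bi_ple_encode` are hypothetical helper names, not the repo's API.

```python
import numpy as np

def ple_encode(x: float, bins: np.ndarray) -> np.ndarray:
    """Piecewise linear encoding with boundaries bins = [b_0, ..., b_T]:
    bins left of x get 1, the hit bin gets x's fractional position in it,
    and bins right of x get 0."""
    T = len(bins) - 1
    enc = np.zeros(T)
    for t in range(T):
        if x >= bins[t + 1]:
            enc[t] = 1.0  # bin lies entirely to the left of x
        elif x >= bins[t]:
            enc[t] = (x - bins[t]) / (bins[t + 1] - bins[t])  # hit bin
        # bins to the right of x stay 0
    return enc

def bi_ple_encode(x: float, bins: np.ndarray) -> np.ndarray:
    """The bi-PLE proposed in question 2: concatenate the left-to-right PLE
    with a mirrored right-to-left PLE so high values also get many 1s."""
    return np.concatenate([ple_encode(x, bins), ple_encode(-x, -bins[::-1])])

bins = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(ple_encode(2.5, bins))     # [1.  1.  0.5 0. ], i.e. [1, 1, (x - b_2)/(b_3 - b_2), 0]
print(bi_ple_encode(2.5, bins))  # [1.  1.  0.5 0.  1.  0.5 0.  0. ]
```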
The informal answer is that this encoding does not preserve the notion of order.

Formally, the problem is that without the 1s in the left bins you can have two embeddings that represent very different values but are very close in terms of L2 distance (take `eps = 1e-9`):

`[eps, 0, ..., 0]` vs `[0, ..., 0, eps]`

Similarly, you can have very close values with very different embeddings:

`[0, ..., 1 - eps, 0, 0, ..., 0]` vs `[0, ..., 0, eps, 0, ..., 0]`
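To make the counterexamples concrete, here is a quick numeric check (a sketch assuming 4 bins and `eps = 1e-9` as above):

```python
import numpy as np

eps = 1e-9

# Without the leading 1s: x just above b_0 vs x just above b_3.
# Very different values, nearly identical embeddings.
a = np.array([eps, 0.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0, eps])
print(np.linalg.norm(a - b))  # ~1.4e-9

# Without the leading 1s: x just below b_2 vs x just above b_2.
# Nearly identical values, far-apart embeddings.
c = np.array([0.0, 1.0 - eps, 0.0, 0.0])
d = np.array([0.0, 0.0, eps, 0.0])
print(np.linalg.norm(c - d))  # ~1.0

# With the standard PLE (leading 1s kept), the same neighboring values
# map to near-identical vectors.
c_std = np.array([1.0, 1.0 - eps, 0.0, 0.0])
d_std = np.array([1.0, 1.0, eps, 0.0])
print(np.linalg.norm(c_std - d_std))  # ~1.4e-9
```

The leading 1s are what keep L2 distances between embeddings consistent with distances between the underlying values.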
I see. Thanks!