In short, it converts the per-field feature IDs into globally accumulated ones, so the model can look up the embeddings for all fields in a single pass.
Thanks for your reply! If I understand correctly, it combines multiple fields of features into one big field by accumulating field offsets onto the values. For example, two two-column fields [(1,2),(1,1)] become [1,2,3,3]. But how does it make the embedding lookup faster? Is there a paper on this? Thanks again for your help!
Yes. In the codebase I referred to, the embedding lookups run in a for loop over the fields, so I rewrote it to look up all the embeddings in one unified call. It's just a small implementation-level optimization.
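To make this concrete, here is a minimal sketch of the idea (not the repository's actual code; the field sizes and names are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical field vocabulary sizes; a real model reads these from the dataset.
field_dims = [3, 4, 5]           # three fields with 3, 4, and 5 distinct values
embed_dim = 8

# Slow version: one embedding table per field, looked up in a Python for loop.
per_field_tables = nn.ModuleList(nn.Embedding(d, embed_dim) for d in field_dims)

# Fast version: one big table plus per-field offsets, so every field's IDs
# land in a disjoint range and a single lookup covers all fields at once.
offsets = torch.tensor([0, 3, 7])              # cumulative sums of field_dims
big_table = nn.Embedding(sum(field_dims), embed_dim)

x = torch.tensor([[1, 2, 0],                   # batch of per-field ordinal IDs,
                  [0, 3, 4]])                  # shape: (batch, num_fields)

slow = torch.stack([t(x[:, i]) for i, t in enumerate(per_field_tables)], dim=1)
fast = big_table(x + offsets)                  # one lookup, shape (2, 3, 8)
```

The speedup comes from replacing num_fields separate kernel launches with a single batched lookup.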
Hi, thanks again for your reply. What is the codebase you're referring to? Also, my question is now about ordinal encoding + embedding. As far as I understand, embedding uses one-hot encoding to look up the embedded vectors for features. How does ordinal encoding work with embedding? Thanks in advance for your patience. Really appreciated.
For the original codebase, you can refer to https://github.com/nzc/dnn_ctr/blob/master/model/DeepFM.py#L202 . In PyTorch, nn.Embedding's forward takes ordinal (integer) indices directly; see the examples at https://pytorch.org/docs/stable/nn.html#torch.nn.Embedding . That lookup is mathematically equivalent to multiplying a one-hot encoding by the embedding weight matrix.
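As a quick illustration (a toy example, not from either codebase), the integer-index lookup and the one-hot matrix product give the same result:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

table = nn.Embedding(5, 3)        # vocabulary of 5, embedding dim 3
ids = torch.tensor([0, 2, 4])     # ordinal-encoded feature IDs

lookup = table(ids)                               # direct index lookup
one_hot = F.one_hot(ids, num_classes=5).float()   # (3, 5) one-hot matrix
matmul = one_hot @ table.weight                   # equivalent matrix product

assert torch.allclose(lookup, matmul)
```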
Ahhh, everything makes sense now. Thank you very much!
Hi, can you elaborate on the "fast version cateNN" approach? How does it work?