lzamparo / embedding

Learning semantic embeddings for TF binding preferences directly from sequence

Implement doc2vec style factor augmentation #9

Open lzamparo opened 7 years ago

lzamparo commented 7 years ago

Implement a code vector for each document (i.e. each factor, in this case) that is learned jointly with, and concatenated to, the codes for the context words when learning the embeddings.
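A minimal numpy sketch of this PV-DM-style concatenation. All sizes, names, and the single-window training step below are illustrative assumptions, not this repo's actual code: one learned code per factor ("document") is concatenated with the context word codes, and both are updated by the same softmax prediction loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a k-mer vocabulary, and TFs/factors as "documents".
vocab_size, n_factors = 50, 4
word_dim, doc_dim, window = 8, 8, 3

W = rng.normal(0, 0.1, (vocab_size, word_dim))  # word (k-mer) codes
D = rng.normal(0, 0.1, (n_factors, doc_dim))    # factor codes
# Softmax layer mapping the concatenated context to the center word.
U = rng.normal(0, 0.1, (window * word_dim + doc_dim, vocab_size))

def forward(context_ids, factor_id):
    """Concatenate the factor code with the context word codes (PV-DM style)."""
    h = np.concatenate([D[factor_id], W[context_ids].ravel()])
    logits = h @ U
    p = np.exp(logits - logits.max())
    return h, p / p.sum()

def train_step(context_ids, factor_id, target_id, lr=0.1):
    """One SGD step on a single window; context_ids assumed distinct."""
    h, p = forward(context_ids, factor_id)
    grad_logits = p.copy()
    grad_logits[target_id] -= 1.0          # cross-entropy gradient
    grad_h = U @ grad_logits
    U[:] -= lr * np.outer(h, grad_logits)
    # Backprop splits across the concatenation: factor code vs. word codes.
    D[factor_id] -= lr * grad_h[:doc_dim]
    W[context_ids] -= lr * grad_h[doc_dim:].reshape(window, word_dim)
    return -np.log(p[target_id])
```

The only change from plain word2vec CBOW is the extra `D[factor_id]` segment in `h`; everything downstream treats it as just more input dimensions.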

Surprisingly, the original paper gives no equation describing how to do this, but there are other implementations. This one seems readable, and this page actually has derivations, which will help augment the model I'm currently working with.

Without some probabilistic interpretation that would allow decoding a window without an associated document, this extension seems unlikely to be useful on its own. But it should be informative as to how much separation I can get just by including factor information in the generation of the code words.
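For reference, doc2vec's usual workaround for an unseen document is not probabilistic decoding but an inference step: freeze the word codes and softmax weights, and fit only a fresh document vector by gradient descent on the new window. A self-contained sketch (all sizes and the frozen `W`/`U` are stand-in assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, word_dim, doc_dim, window = 30, 6, 6, 3

# Stand-ins for already-trained parameters, frozen at inference time.
W = rng.normal(0, 0.1, (vocab, word_dim))                 # word codes
U = rng.normal(0, 0.1, (window * word_dim + doc_dim, vocab))

def infer_doc_code(context_ids, target_id, steps=100, lr=0.5):
    """Fit a new document code to one window; only d is trainable."""
    d = np.zeros(doc_dim)
    for _ in range(steps):
        h = np.concatenate([d, W[context_ids].ravel()])
        logits = h @ U
        p = np.exp(logits - logits.max())
        p /= p.sum()
        grad_logits = p
        grad_logits[target_id] -= 1.0
        d -= lr * (U @ grad_logits)[:doc_dim]  # gradient w.r.t. d only
    return d
```

This gives a code for any held-out window, but it is an optimization trick rather than a generative/probabilistic decoding, which is the gap noted above.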

I might still be able to use the code words learned in this way as an empirical-Bayes-style prior in a more principled model.

lzamparo commented 7 years ago

Another thought about this: