kmkolasinski / deep-learning-notes

Experiments with Deep Learning

Question about 2017-09-Poincare-Embeddings #12

Closed joelkuiper closed 4 years ago

joelkuiper commented 5 years ago

Hey,

I may be way out of line here, but I stumbled across your talk "Poincaré Embeddings for Learning Hierarchical Representations" and I was wondering if you could shed some light on a related problem. I've described it in more detail here https://datascience.stackexchange.com/questions/56889/hyperbolic-coordinates-poincar%c3%a9-embeddings-as-the-output-of-a-neural-network

But basically I have an encoder (for some sentences) and an output on a Poincaré ball that was pretrained using the Gensim implementation. I have supervised training data for that mapping. The goal is to use the encoder to predict points on the ball; basically it's an entity linking task. So an encoded fragment like "the river bank" would map to the "river bank" point in a hyperbolically embedded ontology (like WordNet). However, I can't seem to get it to work; I would really love to hear your ideas on this :-)
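
For concreteness, this is roughly the setup, as a minimal sketch assuming gensim's PoincareModel was used for the pretrained ball (the file name and the entity identifiers below are just placeholders):

import numpy as np
from gensim.models.poincare import PoincareModel

# Pretrained Poincare ball over the ontology (e.g. WordNet synsets).
poincare = PoincareModel.load("wordnet_poincare.model")

# Supervised pairs: (sentence fragment, entity it should link to).
pairs = [("the river bank", "bank.n.01"),
         ("opened a bank account", "depository_financial_institution.n.01")]

sentences = [s for s, _ in pairs]
# Target points on the ball, taken from the pretrained embedding.
targets = np.stack([poincare.kv[entity] for _, entity in pairs])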

kmkolasinski commented 5 years ago

Hi, sorry for the late reply. As far as I understand, you want to create a model which produces embeddings in hyperbolic space, define a cost function which works in that space, and then optimize the network parameters to minimize that cost.

Basically, I would first try to define some toy problem, e.g. something similar to this in my notebooks. Then I would try the method proposed in the last paper you linked in your SO question.

If I understand correctly, I would create something like this:

def hypernet(X):
    Vx = normalize(PsiDirModel(X))   # unit-norm direction vector
    Px = sigmoid(PsiNormModel(X))    # radius squashed into (0, 1)
    Hx = Px * Vx                     # point strictly inside the unit ball
    return Hx

hx = hypernet(x)
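
For concreteness, PsiDirModel and PsiNormModel above could simply be two small dense heads on top of your sentence encoder. A minimal sketch, assuming tf.keras (the class name, layer sizes, and the encoder argument are placeholders of mine, not code from the notebook):

import tensorflow as tf

class HyperNet(tf.keras.Model):
    """Maps encoder features to a point strictly inside the unit ball."""
    def __init__(self, encoder, dim):
        super().__init__()
        self.encoder = encoder                     # e.g. a pretrained sentence encoder
        self.psi_dir = tf.keras.layers.Dense(dim)  # direction head (PsiDirModel)
        self.psi_norm = tf.keras.layers.Dense(1)   # radius head (PsiNormModel)

    def call(self, x):
        h = self.encoder(x)
        vx = tf.math.l2_normalize(self.psi_dir(h), axis=-1)  # unit direction
        px = tf.sigmoid(self.psi_norm(h))                    # radius in (0, 1)
        return px * vx                                       # point inside the ball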

hx will contain embeddings re-parametrized to lie inside the unit ball in R^n. You can then use, for example, the Poincaré distance formula between two embeddings, and minimize the distance between the predicted embedding and its target (nearest word) embedding. If you get NaNs you should definitely investigate their root cause. If gradients are too large you can try to clip them, lower the learning rate, or use a plain SGD optimizer.

Note that you don't need to apply a retraction operation to your weights, since these weights are not restricted to lie in hyperbolic space. In my example I was working with embedding vectors which I wanted to be in hyperbolic space, hence during optimization I had to project these embeddings back onto the unit ball in R^n. In the example above, the retraction is effectively done by the normalize and sigmoid functions. You can also take a look at Hinton's capsule networks paper; they introduce a somewhat similar vector normalization (squashing), and maybe their parametrization will be more stable.
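
For the distance-based loss, here is a minimal sketch, assuming TensorFlow 2 as the framework; the epsilon, the clipping, and the clipnorm value are numerical-stability tricks I am adding here, not code from the notebook:

import tensorflow as tf

# Poincare distance on the unit ball:
# d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
def poincare_distance(u, v, eps=1e-7):
    sq_u = tf.reduce_sum(tf.square(u), axis=-1)
    sq_v = tf.reduce_sum(tf.square(v), axis=-1)
    sq_diff = tf.reduce_sum(tf.square(u - v), axis=-1)
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v) + eps)
    # Clip so acosh's argument stays >= 1, avoiding NaNs from rounding errors.
    return tf.math.acosh(tf.clip_by_value(arg, 1.0 + eps, 1e9))

# Supervised loss: pull each predicted point towards its target point on the ball.
def poincare_loss(predicted, target):
    return tf.reduce_mean(poincare_distance(predicted, target))

# Plain SGD with gradient clipping, as suggested above.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3, clipnorm=1.0)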