senderle / talediff

A prototype word embedding model based on type-level differential operators.
MIT License

Revisit scaling #2

Open senderle opened 6 years ago

senderle commented 6 years ago

To get a pointwise mutual information matrix: take the Hessian at (1, 1, 1, ...), scale it by a diagonal 1/word-frequency matrix on the left and the right, multiply the whole thing by the number of sentences, and take the logarithm.
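As a rough numpy sketch of that recipe (the Hessian and frequency vector below are toy stand-ins, not the repo's actual objects):

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_sentences = 5, 100

# Toy stand-ins: a symmetric co-occurrence-style Hessian and the
# word-frequency vector derived from it.
counts = rng.integers(1, 20, size=(n_words, n_words)).astype(float)
hessian = counts + counts.T
freqs = hessian.sum(axis=1)

# diag(1/freq) @ hessian @ diag(1/freq), times the sentence count,
# then the log -- the recipe above.
inv_freq = 1.0 / freqs
pmi = np.log(n_sentences * (inv_freq[:, None] * hessian * inv_freq[None, :]))
```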

Given how well this already works, there seems to be no reason to take the logarithm, and the previous step is just multiplication by a constant. So the only part that might still matter is the two scaling multiplications. Add an option to do that. It should be as simple as left-multiplying both the Hessian and the projection vectors by the scaling matrix. (The left mul over the projection vectors becomes a right mul over the Hessian when the projection mul is executed.)
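A quick sanity check of that identity, with hypothetical stand-ins for the Hessian, the scaling matrix, and the projection vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, dim = 5, 8

# Hypothetical stand-ins for the Hessian, the diag(1/freq) scaling,
# and the projection vectors.
hessian = rng.random((n_words, n_words))
inv_freq = 1.0 / rng.integers(1, 50, size=n_words)
proj = rng.standard_normal((n_words, dim))

# Left-multiply both the Hessian and the projection vectors by the
# scaling matrix. Because the projection is executed as hessian @ proj,
# the left mul over proj lands on the Hessian's right side:
piecewise = (inv_freq[:, None] * hessian) @ (inv_freq[:, None] * proj)
direct = (inv_freq[:, None] * hessian * inv_freq[None, :]) @ proj
assert np.allclose(piecewise, direct)
```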

senderle commented 6 years ago

I can't remember where I left off in my experiments with Jacobian scaling, but it would be interesting to do the same thing there, i.e. apply both a left and a right mul. I don't have a totally clear idea what that would mean, tbh, but maybe it's worth trying? It would also be kind of hard to do correctly, because it involves inserting the Jacobian multiplication between the vector and the projection, but the Jacobian is calculated at the same time as the Hessian and isn't available until that calculation is complete. Since this code calculates the whole thing piecemeal, that would require two passes through the data.

It's also just now occurring to me -- how dumb am I? -- that the Jacobian is just a word-count vector over the whole corpus!!! Seriously, dang. So scaling by diag(1/freq) is, up to a constant, the same as scaling by the inverse of the Jacobian, which means... these two approaches are kind of equivalent...
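For what it's worth, a quick numerical check of that observation. It assumes (my guess, nothing stated above) a generating function of the form f(x) = sum over sentences of the product of x_w over tokens; under that assumption, the gradient at (1, ..., 1) is exactly the corpus word-count vector:

```python
import numpy as np

# Tiny corpus: sentences as lists of word ids (word 0 repeats in the
# last sentence, so counts are per-token, with multiplicity).
sentences = [[0, 1, 2], [1, 2], [0, 0, 3]]
n_words = 4

def f(x):
    # Assumed generating function: sum over sentences of prod of x_w.
    return sum(np.prod(x[s]) for s in sentences)

# Central-difference gradient of f at the all-ones point.
eps = 1e-6
ones = np.ones(n_words)
grad = np.array([
    (f(ones + eps * np.eye(n_words)[i]) - f(ones - eps * np.eye(n_words)[i]))
    / (2 * eps)
    for i in range(n_words)
])

# The gradient matches the raw word counts.
word_counts = np.bincount([w for s in sentences for w in s], minlength=n_words)
assert np.allclose(grad, word_counts)
```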