[Closed] huashan closed this issue 5 years ago
Apologies for the slow reply - for some reason I haven't been receiving notifications.
At the moment, yes, you'd have to convert it to a dense format. If you can live with reducing the vocabulary (e.g. by discarding the rows and columns corresponding to the least frequently occurring tokens), then that's great. If not, you're in trouble: a key step is computing the difference between the log co-occurrence matrix (which could be sparse) and the product of W and C, the learned word and context matrices (see https://github.com/roamanalytics/mittens/blob/master/mittens/np_mittens.py#L109). Both W and C have to be dense, so their product (which has the same dimensions as the co-occurrence matrix) will be dense too. That means that if the co-occurrence matrix doesn't fit in memory, you'll hit that memory limit regardless of how you represent it.
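To make the memory point concrete, here is a small illustration (the sizes are toy values, not anything from Mittens itself): the product of the dense W and C matrices has the same V × V shape as the co-occurrence matrix, and a dense V × V float array grows quadratically in the vocabulary size.

```python
import numpy as np

# Toy sizes: vocabulary V and embedding dimension d.
# Real vocabularies are far larger; these values are illustrative only.
V, d = 1000, 50

rng = np.random.default_rng(0)
W = rng.normal(size=(V, d))  # word vectors (dense)
C = rng.normal(size=(V, d))  # context vectors (dense)

# W @ C.T has the same V x V shape as the co-occurrence matrix,
# and it is dense even when the co-occurrence counts are sparse.
pred = W @ C.T
print(pred.shape)  # (1000, 1000)

# Memory for one dense float64 V x V matrix, in bytes:
bytes_needed = V * V * 8
print(bytes_needed)  # 8_000_000 for V = 1000; ~800 GB for V = 316_000
```

So even a perfectly sparse co-occurrence matrix doesn't help once this dense difference has to be materialized.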
Since we ultimately only need gradients with the same shape as W and C (V × d), it might be possible to rewrite the code to compute those gradients in batches, but it would be slower.
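Here is a hypothetical sketch of that batching idea, not the actual Mittens implementation: it computes the gradient of a simplified weighted least-squares objective (the bias terms of the real GloVe/Mittens loss are omitted) one block of rows at a time, so only a (batch_size × V) slice of the dense difference ever exists in memory. All names here are made up for illustration.

```python
import numpy as np

def batched_grad_W(W, C, logX, weights, batch_size=128):
    """Gradient of sum(weights * (W @ C.T - logX)**2) with respect to W,
    computed in row batches. Only a (batch_size, V) dense slice of the
    difference is held at any one time, instead of the full (V, V) matrix.

    Hypothetical helper for illustration; not part of Mittens."""
    V, d = W.shape
    grad = np.empty_like(W)
    for start in range(0, V, batch_size):
        end = min(start + batch_size, V)
        # The only V-wide dense buffer: shape (batch, V).
        diff = W[start:end] @ C.T - logX[start:end]
        grad[start:end] = 2.0 * (weights[start:end] * diff) @ C
    return grad

# Quick check on toy data: the batched result matches the all-at-once version.
V, d = 12, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(V, d))
C = rng.normal(size=(V, d))
logX = rng.normal(size=(V, V))
weights = rng.uniform(size=(V, V))

g_batched = batched_grad_W(W, C, logX, weights, batch_size=5)
g_full = 2.0 * (weights * (W @ C.T - logX)) @ C
```

The trade-off is exactly as described above: peak memory drops from O(V²) to O(batch_size × V), at the cost of a Python-level loop over batches.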
I stored the co-occurrence matrix in MatrixMarket format and read it into Python with
mmread()
. Do I have to convert it to a dense matrix (which is impossible for me due to memory constraints), or does Mittens handle this format directly?
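Following the vocabulary-reduction suggestion above, here is one way this could look with SciPy (a sketch, assuming SciPy is available; the round-trip through an in-memory buffer stands in for reading your real .mtx file, and the cutoff of 2 tokens is purely illustrative): prune to the most frequent tokens first, and only then densify the reduced matrix.

```python
import io
import numpy as np
from scipy.io import mmread, mmwrite
from scipy.sparse import coo_matrix

# Build a tiny sparse co-occurrence matrix and round-trip it through
# MatrixMarket format; this stands in for mmread("yourfile.mtx").
counts = coo_matrix(np.array([[0.0, 2.0, 0.0],
                              [2.0, 0.0, 5.0],
                              [0.0, 5.0, 1.0]]))
buf = io.BytesIO()
mmwrite(buf, counts)
buf.seek(0)
coo = mmread(buf)  # sparse COO matrix, exactly what mmread() gives you

# Prune to the most frequent tokens before densifying: use row sums as
# a frequency proxy, keep the top-k tokens (k=2 here), then convert
# only the reduced matrix to the dense ndarray that Mittens expects.
csr = coo.tocsr()
totals = np.asarray(csr.sum(axis=1)).ravel()
keep = np.sort(np.argsort(totals)[::-1][:2])  # indices of top-2 tokens
dense = csr[keep][:, keep].toarray()
print(dense.shape)  # (2, 2)
```

If even the pruned matrix won't fit in memory, then, as discussed above, the current dense implementation can't be used as-is.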