[Closed] huashan closed this issue 5 years ago
Apologies for the slow reply - for some reason I haven't been receiving notifications.
At the moment, yes, you'd have to convert it to a dense format. If you can live with reducing the vocabulary (e.g. by discarding the rows and columns corresponding to the least frequently occurring tokens), then that's great. If not, you're in trouble: a key step is computing the difference between the log co-occurrence matrix (which could be sparse) and the product of W and C, the learned word and context matrices (see https://github.com/roamanalytics/mittens/blob/master/mittens/np_mittens.py#L109). Both W and C have to be dense, so their product (which has the same dimensions as the co-occurrence matrix) will be dense too. That means that if the co-occurrence matrix doesn't fit in memory, you'll hit that memory limit regardless of how you represent it.
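To make the memory point concrete, here is a small illustration (the sizes are toy values, not anything from Mittens itself): the product of the dense W and C matrices has the same V × V shape as the co-occurrence matrix, and a dense V × V float array grows quadratically in the vocabulary size.

```python
import numpy as np

# Toy sizes: vocabulary V and embedding dimension d.
# Real vocabularies are far larger; these values are illustrative only.
V, d = 1000, 50

rng = np.random.default_rng(0)
W = rng.normal(size=(V, d))  # word vectors (dense)
C = rng.normal(size=(V, d))  # context vectors (dense)

# W @ C.T has the same V x V shape as the co-occurrence matrix,
# and it is dense even when the co-occurrence counts are sparse.
pred = W @ C.T
print(pred.shape)  # (1000, 1000)

# Memory for one dense float64 V x V matrix, in bytes:
bytes_needed = V * V * 8
print(bytes_needed)  # 8_000_000 for V = 1000; ~800 GB for V = 316_000
```

So even a perfectly sparse co-occurrence matrix doesn't help once this dense difference has to be materialized.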
Since we ultimately only need gradients with the same shape as W and C (V × d), it might be possible to rewrite the code to compute those gradients in batches, but it would be slower.
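Here is a hypothetical sketch of that batching idea, not the actual Mittens implementation: it computes the gradient of a simplified weighted least-squares objective (the bias terms of the real GloVe/Mittens loss are omitted) one block of rows at a time, so only a (batch_size × V) slice of the dense difference ever exists in memory. All names here are made up for illustration.

```python
import numpy as np

def batched_grad_W(W, C, logX, weights, batch_size=128):
    """Gradient of sum(weights * (W @ C.T - logX)**2) with respect to W,
    computed in row batches. Only a (batch_size, V) dense slice of the
    difference is held at any one time, instead of the full (V, V) matrix.

    Hypothetical helper for illustration; not part of Mittens."""
    V, d = W.shape
    grad = np.empty_like(W)
    for start in range(0, V, batch_size):
        end = min(start + batch_size, V)
        # The only V-wide dense buffer: shape (batch, V).
        diff = W[start:end] @ C.T - logX[start:end]
        grad[start:end] = 2.0 * (weights[start:end] * diff) @ C
    return grad

# Quick check on toy data: the batched result matches the all-at-once version.
V, d = 12, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(V, d))
C = rng.normal(size=(V, d))
logX = rng.normal(size=(V, V))
weights = rng.uniform(size=(V, V))

g_batched = batched_grad_W(W, C, logX, weights, batch_size=5)
g_full = 2.0 * (weights * (W @ C.T - logX)) @ C
```

The trade-off is exactly as described above: peak memory drops from O(V²) to O(batch_size × V), at the cost of a Python-level loop over batches.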
I stored the co-occurrence matrix in MatrixMarket format and read it into Python with
mmread()
. Do I have to convert it to a dense matrix (which is impossible for me due to memory constraints), or does Mittens handle this format directly?
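Following the vocabulary-reduction suggestion above, here is one way this could look with SciPy (a sketch, assuming SciPy is available; the round-trip through an in-memory buffer stands in for reading your real .mtx file, and the cutoff of 2 tokens is purely illustrative): prune to the most frequent tokens first, and only then densify the reduced matrix.

```python
import io
import numpy as np
from scipy.io import mmread, mmwrite
from scipy.sparse import coo_matrix

# Build a tiny sparse co-occurrence matrix and round-trip it through
# MatrixMarket format; this stands in for mmread("yourfile.mtx").
counts = coo_matrix(np.array([[0.0, 2.0, 0.0],
                              [2.0, 0.0, 5.0],
                              [0.0, 5.0, 1.0]]))
buf = io.BytesIO()
mmwrite(buf, counts)
buf.seek(0)
coo = mmread(buf)  # sparse COO matrix, exactly what mmread() gives you

# Prune to the most frequent tokens before densifying: use row sums as
# a frequency proxy, keep the top-k tokens (k=2 here), then convert
# only the reduced matrix to the dense ndarray that Mittens expects.
csr = coo.tocsr()
totals = np.asarray(csr.sum(axis=1)).ravel()
keep = np.sort(np.argsort(totals)[::-1][:2])  # indices of top-2 tokens
dense = csr[keep][:, keep].toarray()
print(dense.shape)  # (2, 2)
```

If even the pruned matrix won't fit in memory, then, as discussed above, the current dense implementation can't be used as-is.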