Sparse HALS stopping criterion

jpfairbanks / Matfacgrf

Matrix Factorization for Graph Analysis

Other

0 stars 1 forks source link

Sparse HALS stopping criterion #7

Open jpfairbanks opened 10 years ago

jpfairbanks commented 10 years ago

We need to discuss the fact that computing the objective function vecnorm(A-WH) where vecnorm is the frobenius norm takes mn + mnk time. Which is a really long time to test for convergence. If we are just looking to see if W and H are changing then we can use two auxiliary arrays to store oldW and oldH and then compute vecnorm(W-oldW) + vecnorm(H-oldH) We know that if W,H are not changing, then the algorithm is converged. But I don't think that we can say that the algorithm is monotonic in this stoping criterion. I suppose that we don't need the stopping criterion criterion to be monotonic for any reason.

ramkikannan commented 10 years ago

Look at section 6.1 in page 305-306 of the paper http://dx.doi.org/10.1007/s10898-013-0035-4. It considers gradient of the matrices W and H which takes only mxk and nxk. The computation can be significant but memory footprint is small. Also, we can start computing the objective function after 20 iterations and for every 5 iterations.

jpfairbanks commented 10 years ago

Did I read this correctly to say that we should perform the following?

pgW = vecnorm((2*W*H'*H - 2A*H)W[W.>0])
pgH = vecnorm((2H*W'*W - 2*A'*W)H[H.>0])
deltas[i] = psW + pgH
if delta[i] <= eps * delta[0]
   converged = true
end

But we will compute W'*A' and W'*Wwhen we update H in the next iteration and H*H' and A'*H when updating W. So we have most of the ingredients to compute the stopping criterion at the top of the next loop right? Which implies that we should move the convergence check up to the top of the loop and not evaluate these terms twice.

ramkikannan commented 10 years ago

gradW = W_(H_H') - A_H'; gradH = (W'_W)_H - W'_A; pGradW = gradW(gradW < 0 | W > 0); //Check the first equation in pg. 306. Choose if gradW < 0 or W > 0. pGradH = gradH(gradH < 0 | H > 0); deltai=sqrt(norm(pGradW,'fro')^2+norm(pGradH,'fro')^2)

Hope I am clear.

jpfairbanks commented 10 years ago

Ok I think these are equivalent by squaring the definition of epsilon. I will get it implemented in hals.jl after I write the pagerank and temporal correlation we discussed today.

ramkikannan commented 10 years ago

pgW = vecnorm((2_W_H'_H - 2A_H)""W[W.>0]"")

What does this do? Is it equivalent to multiplying by non-zero elements of W?
Where do you filter the matrix such that (a) pgW has only its negative entries (b) entries where W is positive.

If you think the julio code is equivalent of above matlab code, close this item.

Ramki.

jpfairbanks commented 10 years ago

So the entries of gradW = W*(H*H') - A*H' could be positive on places where the entries of W are positive and we need to filter them out as well?

ramkikannan commented 10 years ago

Yes.