jpfairbanks opened this issue 10 years ago
Look at section 6.1 on pages 305-306 of the paper http://dx.doi.org/10.1007/s10898-013-0035-4. It uses the gradients with respect to W and H, which take only m×k and n×k storage. The computation can be significant, but the memory footprint is small. Also, we can start computing the objective function after 20 iterations and then evaluate it only every 5 iterations.
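For reference, here is the criterion in the form this thread uses it. It is a restatement, not a quotation of the paper. It assumes A ≈ WH with W of size m×k and H of size k×n, and it takes f = ½‖A - WH‖_F², so the gradients carry no factor of 2. Δ_0 is the value of Δ at the initial iterate.

```latex
\nabla_W f = W(HH^\top) - AH^\top, \qquad \nabla_H f = (W^\top W)H - W^\top A,
\qquad
\left(\nabla^P_W f\right)_{ij} =
\begin{cases}
(\nabla_W f)_{ij} & \text{if } (\nabla_W f)_{ij} < 0 \text{ or } W_{ij} > 0,\\
0 & \text{otherwise,}
\end{cases}
\qquad
\Delta = \sqrt{\|\nabla^P_W f\|_F^2 + \|\nabla^P_H f\|_F^2},
\quad \text{stop when } \Delta \le \epsilon\,\Delta_0 .
```

The projected gradient for H is defined the same way, with H in place of W.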
Am I reading this correctly that we should do the following?
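# Gradient norms, keeping only the entries where W (resp. H) is positive
pgW = vecnorm((2*W*H'*H - 2*A*H)[W .> 0])
pgH = vecnorm((2*H*W'*W - 2*A'*W)[H .> 0])
deltas[i] = pgW + pgH
# Relative test against the value at the first iteration (Julia arrays start at 1)
if deltas[i] <= eps * deltas[1]
    converged = true
end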
But we will compute W'*A and W'*W when we update H in the next iteration, and H*H' and A*H' when updating W. So we have most of the ingredients to compute the stopping criterion at the top of the next loop, right? That implies we should move the convergence check to the top of the loop and not evaluate these terms twice.
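Below is a minimal NumPy sketch of the loop structure proposed above. It is not the hals.jl code. It assumes the A ≈ W @ H convention of the MATLAB snippet in this thread (W is m×k, H is k×n) and a plain cyclic HALS update. The function names `hals` and `projected_grad_norm`, the random initialization, and the defaults are illustrative assumptions. The defaults are eps=1e-4 and a first check at iteration 20, then every 5 iterations, following the suggestion at the top of the thread.

The reuse works as follows. H @ H.T and A @ H.T are formed once per iteration for the W update, and they also give the gradient for W. W.T @ W and W.T @ A are cached from the previous H update. W has not changed since that update, so they give the gradient for the current H.

```python
import numpy as np

def projected_grad_norm(W, H, HHt, AHt, WtW, WtA):
    """Delta from the MATLAB snippet, computed from cached products (A ~ W @ H)."""
    gradW = W @ HHt - AHt                  # m x k, equals W (H H^T) - A H^T
    gradH = WtW @ H - WtA                  # k x n, equals (W^T W) H - W^T A
    pW = gradW[(gradW < 0) | (W > 0)]      # projected-gradient entries for W
    pH = gradH[(gradH < 0) | (H > 0)]      # projected-gradient entries for H
    return float(np.sqrt(np.sum(pW ** 2) + np.sum(pH ** 2)))

def hals(A, k, max_iter=500, eps=1e-4, start=20, every=5, seed=0):
    """Hypothetical HALS loop that checks convergence at the top of each iteration."""
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    WtW, WtA = W.T @ W, W.T @ A            # products for the current W
    delta0 = None
    for it in range(max_iter):
        # Needed by the W update anyway; they are also the ingredients of grad_W.
        HHt, AHt = H @ H.T, A @ H.T
        if it == 0 or (it >= start and (it - start) % every == 0):
            # No extra matrix products are formed just for the check.
            delta = projected_grad_norm(W, H, HHt, AHt, WtW, WtA)
            if delta0 is None:
                delta0 = delta             # reference value at the initial iterate
            elif delta <= eps * delta0:
                return W, H, it
        for j in range(k):                 # cyclic HALS update of W's columns
            d = max(HHt[j, j], 1e-12)      # guard against an all-zero row of H
            W[:, j] = np.maximum(0.0, W[:, j] + (AHt[:, j] - W @ HHt[:, j]) / d)
        WtW, WtA = W.T @ W, W.T @ A        # needed by the H update; reused at the next check
        for j in range(k):                 # cyclic HALS update of H's rows
            d = max(WtW[j, j], 1e-12)      # guard against an all-zero column of W
            H[j, :] = np.maximum(0.0, H[j, :] + (WtA[j, :] - WtW[j, :] @ H) / d)
    return W, H, max_iter
```

Checking at iteration 0 only records Δ_0. After that, the test runs on the 20/5 schedule, so a check costs a few elementwise passes over W and H.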
gradW = W*(H*H') - A*H';
gradH = (W'*W)*H - W'*A;
pGradW = gradW(gradW < 0 | W > 0); % See the first equation on p. 306: keep entries where gradW < 0 or W > 0.
pGradH = gradH(gradH < 0 | H > 0);
deltai = sqrt(norm(pGradW,'fro')^2 + norm(pGradH,'fro')^2);
Hope I am clear.
Ok, I think these are equivalent up to squaring the definition of epsilon. I will implement it in hals.jl after I write the pagerank and temporal correlation we discussed today.
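Here is the squaring argument spelled out. Let a = ‖∇ᴾ_W f‖_F and b = ‖∇ᴾ_H f‖_F, with a_0 and b_0 taken at the initial iterate. Then the MATLAB test Δ ≤ εΔ_0 is exactly a test on squared quantities. The sum of unsquared norms in the Julia sketch is a related but not identical measure. The standard norm inequality below keeps its relative value within a factor of √2 of Δ/Δ_0 in either direction.

```latex
\Delta \le \epsilon\,\Delta_0
\iff a^2 + b^2 \le \epsilon^2\,(a_0^2 + b_0^2),
\qquad
\sqrt{a^2 + b^2} \le a + b \le \sqrt{2}\,\sqrt{a^2 + b^2}
\;\Longrightarrow\;
\frac{1}{\sqrt{2}}\,\frac{\Delta}{\Delta_0}
\le \frac{a + b}{a_0 + b_0}
\le \sqrt{2}\,\frac{\Delta}{\Delta_0}.
```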
Regarding pgW = vecnorm((2*W*H'*H - 2*A*H)[W .> 0]), in particular the [W .> 0] part:
If you think the Julia code is equivalent to the MATLAB code above, close this item.
Ramki.
So the entries of gradW = W*(H*H') - A*H' could be positive in places where the entries of W are positive, and we need to filter them out as well?
Yes.
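This tiny NumPy illustration uses made-up values. It shows which gradient entries the mask (gradW < 0) | (W > 0) from the MATLAB snippet keeps, compared with keeping only the entries where W > 0. An entry sitting at W = 0 with a negative gradient still counts toward the criterion, because increasing it would decrease the objective. An entry at W = 0 with a positive gradient is dropped, because the nonnegativity bound blocks that descent direction. The helper name `projected_gradient` is hypothetical.

```python
import numpy as np

def projected_gradient(grad, X):
    """Keep grad where grad < 0 or X > 0; zero out entries blocked by the bound X >= 0."""
    return np.where((grad < 0) | (X > 0), grad, 0.0)

# Illustrative values: two entries of W sit at the bound 0, one is interior.
W = np.array([[0.0, 0.0, 1.5]])
gradW = np.array([[-2.0, 3.0, 0.5]])

full = projected_gradient(gradW, W)    # keeps -2.0 (W can still move off 0) and 0.5
w_only = np.where(W > 0, gradW, 0.0)   # keeps only 0.5 and misses the -2.0 entry
print(np.linalg.norm(full), np.linalg.norm(w_only))
```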
We need to discuss the fact that computing the objective function vecnorm(A - W*H), where vecnorm is the Frobenius norm, takes mn + mnk time. That is a really long time to spend testing for convergence. If we are just looking to see whether W and H are changing, then we can use two auxiliary arrays to store oldW and oldH and then compute vecnorm(W - oldW) + vecnorm(H - oldH). We know that if W and H are not changing, then the algorithm has converged. But I don't think we can say that the algorithm is monotonic in this stopping criterion. I suppose we don't need the stopping criterion to be monotonic for any reason.
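Below is a minimal NumPy sketch of the change-based check described above. It assumes W and H are dense arrays updated in place each iteration. The class name `ChangeMonitor` and its `step` method are hypothetical. Each call does O((m+n)k) arithmetic, compared with roughly mnk for forming W*H in the objective. The two stored copies, oldW and oldH, are refreshed in place rather than reallocated.

```python
import numpy as np

class ChangeMonitor:
    """Stores the previous W and H (the thread's oldW/oldH) and reports
    ||W - oldW||_F + ||H - oldH||_F each time step() is called."""

    def __init__(self, W, H):
        self.oldW = W.copy()
        self.oldH = H.copy()

    def step(self, W, H):
        # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
        change = np.linalg.norm(W - self.oldW) + np.linalg.norm(H - self.oldH)
        # Refresh the stored copies in place, reusing the same buffers.
        np.copyto(self.oldW, W)
        np.copyto(self.oldH, H)
        return float(change)

# Usage: only W moves between the first two snapshots.
W = np.ones((4, 2))
H = np.ones((2, 3))
monitor = ChangeMonitor(W, H)
W *= 0.5                        # pretend an update halved W
first = monitor.step(W, H)      # ||0.5 * ones(4, 2)||_F, i.e. sqrt(2)
second = monitor.step(W, H)     # nothing changed since the last call
```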