dorukkilitcioglu opened this issue 6 years ago
I don't think our current approach scales to any reasonably-sized dataset, and moving to minibatch gradient descent might fix that. So this is now a priority.
An implementation of this is in the `minibatch` branch. Until we know if this is even necessary, we're leaving it out.
For the record, the issue referenced in the second commit above ended up being caused by something else. Still leaving this here, because it's an idea worth exploring.
For now, we're doing full gradient descent over the whole dataset (each thread handles one user). That takes a lot of time per iteration and increases the time to convergence. I think we can easily change this to minibatches with some clever indexing: pass in start and end indices, and only compute the gradients for the users in that range. This will hopefully make the item factors (Q) converge faster, which in turn would make the user factors (P) converge faster. A rough sketch is below.
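To make that concrete, here's a minimal NumPy sketch of the kind of thing I mean, not our actual code. The names `R` (a dense ratings matrix with 0 meaning "not rated"), `lr`, and `reg` are all placeholders:

```python
import numpy as np

def minibatch_step(P, Q, R, start, end, lr=0.01, reg=0.1):
    """One SGD pass over the users in [start, end).

    P: (n_users, k) user factors, Q: (n_items, k) item factors,
    R: (n_users, n_items) ratings, with 0 meaning "not rated".
    """
    for u in range(start, end):
        for i in R[u].nonzero()[0]:        # only items this user rated
            p_u = P[u].copy()              # pre-update value, used in both gradients
            err = R[u, i] - p_u @ Q[i]     # prediction error for this (user, item)
            P[u] += lr * (err * Q[i] - reg * p_u)   # L2-regularized updates
            Q[i] += lr * (err * p_u - reg * Q[i])

# Usage: sweep the users in contiguous minibatches, so Q gets
# updated many times per epoch instead of once.
rng = np.random.default_rng(0)
n_users, n_items, k, batch = 200, 100, 8, 16
P = rng.normal(scale=0.1, size=(n_users, k))
Q = rng.normal(scale=0.1, size=(n_items, k))
R = np.where(rng.random((n_users, n_items)) < 0.05,
             rng.integers(1, 6, (n_users, n_items)), 0).astype(float)

for epoch in range(5):
    for start in range(0, n_users, batch):
        minibatch_step(P, Q, R, start, min(start + batch, n_users))
```

Since each minibatch only touches a contiguous slice of P's rows, the existing one-thread-per-user parallelism should carry over within a batch.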
Honestly, after coding it this way, I can kinda see the appeal of ALS (alternating least squares).