pandeyshubham25 / pagerank

Choice of alpha and number of iterations #9

Open pandeyshubham25 opened 2 years ago

pandeyshubham25 commented 2 years ago

I think we should fix a value of alpha and a number of iterations for our experiments, at least for the serial and trivial parallel implementations. Some texts suggest an alpha of 0.5 or 0.85. I am less sure about the number of iterations, since it depends largely on the graph we are dealing with. I suggest selecting a safe upper bound, though choosing one again requires some knowledge of the problem.

How do you think we should solve this problem?
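
One possible way to get a graph-independent upper bound, sketched here under assumptions not stated in this thread: alpha is taken to be the damping factor (the probability of following a link), the iteration is the usual power iteration started from a probability vector, and error is measured in the L1 norm. Under those assumptions each iteration shrinks the L1 error by at least a factor of alpha, and the initial error is at most 2, so `2 * alpha**k <= tol` is enough. The function name `iteration_upper_bound` and the `tol` parameter are hypothetical.

```python
import math


def iteration_upper_bound(alpha, tol):
    """Return the smallest k such that 2 * alpha**k <= tol.

    Assumption: alpha is the damping factor and the error is the L1
    distance to the true PageRank vector, which contracts by at least
    a factor of alpha per power-iteration step.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not 0.0 < tol < 2.0:
        raise ValueError("tol must be strictly between 0 and 2")
    # Solve 2 * alpha**k <= tol for k; log(alpha) is negative, so the
    # inequality flips when dividing.
    return math.ceil(math.log(tol / 2.0) / math.log(alpha))


if __name__ == "__main__":
    for alpha in (0.5, 0.85):
        print(alpha, iteration_upper_bound(alpha, 1e-6))
```

For a tolerance of 1e-6 this gives 21 iterations at alpha = 0.5 and 90 at alpha = 0.85. The bound is a worst case, so real graphs may converge in fewer iterations.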

NMerz commented 2 years ago

We could run some tests for convergence across our smaller datasets while running serial, since we have to do that anyway for our current plan. (Or parallel, but that's 5% harder to write.) If we wanted interesting data about whether convergence differs between methods, we might be better off with an underestimated/unfinished iteration count (or one safe iteration count and all others compared against it). I'm happy with either alpha. I think I chucked .85 in the test code as a placeholder.

pandeyshubham25 commented 2 years ago

Let's go with 0.85 for alpha then. As for the number of iterations, I am more worried about correctness at the end of the iterations, since this is the output we would use as the source of truth for our main model (yet to be built). Let me do some more research on how people arrive at this number, or simply pull the value from some public repos.

NMerz commented 2 years ago

Pulling from other approaches is fine. However, if we just need it for the source of truth, we can run the source of truth until convergence and then reuse the same iteration count for everything else.
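
A minimal sketch of that idea, not the repo's actual code: run a serial PageRank until the L1 change between iterations drops below a tolerance, record how many iterations that took, and reuse the count for a fixed-iteration run. The function names, the dict-of-lists graph format, the `tol` value, and the uniform handling of dangling nodes are all assumptions made for illustration.

```python
def pagerank_step(ranks, out_links, alpha):
    """One power-iteration step. out_links maps every node to its targets."""
    n = len(ranks)
    # Assumption: rank held by dangling nodes (no out-links) is spread
    # uniformly over all nodes.
    dangling = sum(ranks[u] for u in ranks if not out_links[u])
    base = (1.0 - alpha) / n + alpha * dangling / n
    new_ranks = {u: base for u in ranks}
    for u, targets in out_links.items():
        if targets:
            share = alpha * ranks[u] / len(targets)
            for v in targets:
                new_ranks[v] += share
    return new_ranks


def run_until_converged(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate until the L1 change is below tol; return (ranks, iterations)."""
    n = len(out_links)
    ranks = {u: 1.0 / n for u in out_links}
    for iteration in range(1, max_iter + 1):
        new_ranks = pagerank_step(ranks, out_links, alpha)
        delta = sum(abs(new_ranks[u] - ranks[u]) for u in ranks)
        ranks = new_ranks
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


def run_fixed(out_links, alpha, iterations):
    """Run exactly `iterations` steps, as the other implementations would."""
    n = len(out_links)
    ranks = {u: 1.0 / n for u in out_links}
    for _ in range(iterations):
        ranks = pagerank_step(ranks, out_links, alpha)
    return ranks


if __name__ == "__main__":
    # Tiny hypothetical graph; every node appears as a key.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    truth, count = run_until_converged(graph, alpha=0.85)
    reused = run_fixed(graph, 0.85, count)
    print("iterations:", count)
    print("max difference:", max(abs(truth[u] - reused[u]) for u in graph))
```

The count found this way is specific to the graph and tolerance used, so it would need to be re-measured per dataset.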