kkdey / CountClust

An R package for Grade of Membership models and visualization of count data

Topics() results vary by tolerance level for the cell cycle genes. How do we choose tolerance levels? #2

Closed jhsiao999 closed 8 years ago

kkdey commented 9 years ago

The "tol" parameter is the convergence tolerance in Matt's topic model function. The optimization stops once the relative increase in log likelihood from one step to the next falls below tol. So the smaller we choose tol, the closer we get to convergence, but the longer the run takes. In my trial runs, I have usually not seen much difference in patterns for tol below 0.05. I think the default in his package is 0.1 and in CountClust it is 0.001. For very large datasets, however, users are advised to use a larger tol so that they get the output faster.
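To make the stopping rule described above concrete, here is a minimal Python sketch of an optimizer that halts once the relative change in log likelihood falls below tol. It is not the code from Matt's package or from CountClust: `run_until_converged`, `toy_step`, and `toy_loglik` are hypothetical names, and the toy objective only stands in for a real topic model fit.

```python
def run_until_converged(step, loglik, state, tol=0.001, max_iter=10000):
    """Apply `step` repeatedly until the relative log-likelihood change is below `tol`.

    Hypothetical helper for illustration; returns the final state and the
    number of iterations used.
    """
    prev = loglik(state)
    for iteration in range(1, max_iter + 1):
        state = step(state)
        cur = loglik(state)
        # Relative change between consecutive steps (guard against division by zero).
        rel_change = abs(cur - prev) / (abs(prev) + 1e-12)
        if rel_change < tol:
            return state, iteration
        prev = cur
    return state, max_iter


# Toy stand-in for a model fit: the "log likelihood" peaks at x = 3.
def toy_loglik(x):
    return -(x - 3.0) ** 2 - 1.0


def toy_step(x):
    # Move halfway toward the optimum on every update.
    return x + 0.5 * (3.0 - x)


x_loose, iters_loose = run_until_converged(toy_step, toy_loglik, 0.0, tol=0.1)
x_tight, iters_tight = run_until_converged(toy_step, toy_loglik, 0.0, tol=0.001)
print(iters_loose, x_loose)
print(iters_tight, x_tight)
```

In this toy setting the tighter tolerance takes more iterations and ends closer to the optimum, which is the trade-off between run time and convergence described in the comment.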

jhsiao999 commented 9 years ago

My concern is that the results may change with different tolerance levels. Can we confirm this?

kkdey commented 9 years ago

The best practice is to run with multiple tolerance levels. The lower the tolerance, the better the convergence; however, my belief is that the results will not change significantly once the tolerance is set to 0.01 or lower. At that level, only small fluctuations appear in the Structure plot, which are not visually distinctive and do not change the broad patterns.
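One way to check this numerically, rather than by eye on the Structure plot, is to compare the membership proportions from two runs at different tolerance levels. The sketch below is an assumption about how such a check could look, not part of CountClust: `max_membership_diff` is a hypothetical helper, and the two matrices are made-up illustrative values, not real results. Because cluster labels can come out in a different order between runs, it tries every column permutation and reports the smallest worst-case difference.

```python
from itertools import permutations


def max_membership_diff(omega_a, omega_b):
    """Largest absolute difference between two membership matrices
    (samples x clusters), after choosing the column matching that
    minimizes it. Hypothetical helper; brute force, so small K only.
    """
    k = len(omega_a[0])
    best = float("inf")
    for perm in permutations(range(k)):
        # Worst-case entrywise difference under this cluster matching.
        worst = max(
            abs(row_a[j] - row_b[perm[j]])
            for row_a, row_b in zip(omega_a, omega_b)
            for j in range(k)
        )
        best = min(best, worst)
    return best


# Made-up membership proportions for two samples and two clusters,
# as if fitted at two tolerance levels; the clusters are swapped in the second run.
omega_tol_01 = [[0.90, 0.10], [0.20, 0.80]]
omega_tol_001 = [[0.11, 0.89], [0.79, 0.21]]

diff = max_membership_diff(omega_tol_01, omega_tol_001)
print(diff)
```

A small value across a grid of tolerance levels would support the claim that only minor fluctuations remain; a large value would indicate that the broad patterns do depend on tol.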