Closed by jhsiao999 8 years ago
My concern is that the results may change with different tolerance levels. Can we confirm this?
The best practice is to run the fit at several tolerance levels. The smaller the tolerance, the better the convergence; however, my belief is that the results will not change significantly if the tolerance is set to 0.01 or lower. At that level only small fluctuations occur in the Structure plot, and these are not visually distinctive and do not change the broad patterns.
The "tol" parameter is the convergence tolerance in Matt's topic model function. The optimization stops when the relative log-likelihood increase from one step to the next falls below tol. So the smaller we choose tol, the better the convergence, but the longer the run. For tol below 0.05, I have usually not seen much difference in patterns in my trial runs. I think the default in his package is 0.1 and in CountClust it is 0.001. For very large datasets, however, users are advised to use a larger tol so that they get the output faster.
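To make the stopping rule concrete, here is a minimal sketch in Python of a relative log-likelihood tolerance check. It is not the code from Matt's package or from CountClust: the function name `fit_with_tolerance`, the toy concave "log-likelihood", and the gradient step are all assumptions made purely for illustration. It only shows how a smaller tol leads to more iterations and a result closer to the optimum.

```python
def fit_with_tolerance(tol, max_iter=10_000):
    """Toy iterative optimizer that stops on a relative-improvement rule.

    Hypothetical stand-in for a topic model fit: the 'log-likelihood'
    is a simple concave function with its maximum at x = 3.
    """
    def loglik(x):
        return -(x - 3.0) ** 2 - 10.0

    x = 0.0
    old = loglik(x)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # One gradient-ascent step (stands in for one update of the fit).
        x += 0.1 * (-2.0 * (x - 3.0))
        new = loglik(x)
        # Relative increase in log-likelihood from the previous step.
        rel_increase = (new - old) / abs(old)
        old = new
        if rel_increase < tol:
            break  # improvement is below tol: declare convergence
    return x, old, n_iter


if __name__ == "__main__":
    for tol in (0.1, 0.01, 0.001):
        x, ll, n = fit_with_tolerance(tol)
        print(f"tol={tol}: iterations={n}, x={x:.4f}, loglik={ll:.4f}")
```

Running the same fit at a few tol values, as in the loop at the bottom, is the kind of comparison suggested above: if the estimates barely move between the smaller tol values, the looser setting is probably good enough.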