joewandy / hlda

Gibbs sampler for the Hierarchical Latent Dirichlet Allocation topic model
GNU General Public License v3.0

Repeatability Problem in Python 3 Virtual Environment #9

Open cksajil opened 4 years ago

cksajil commented 4 years ago

Hi, I exported 'bbc_test.ipynb' to a '.py' file and tried to reproduce the results. The script runs both in Anaconda Python 3.7.3 and in a Python 3 virtual environment (with the necessary libraries installed). However, setting the seed value does not ensure repeatability in the virtual environment. In the Anaconda environment, on the other hand, setting the random seed makes the results fully reproducible.

The Anaconda Python on my laptop uses NumPy 1.17.3, whereas the virtual environment picks up the latest version, 1.17.4, which was released recently. I understand that the latest version includes some bug fixes for random number generation that were reported against version 1.17.3.

I logged the values in both the Anaconda and the virtual environment and observed that they start to diverge at line 209 of 'sampler.py'. I am not sure whether the different outputs are caused by a bug in the new NumPy version, or whether 'sampler.py' needs to be modified to ensure repeatability in virtual environments.
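
For anyone who wants to compare the two environments, here is a minimal sketch of a check that can be run in each one. It is not taken from 'sampler.py'. The helper names `rng_fingerprint` and `environment_report` are hypothetical, and the sketch assumes the sampler draws from Python's `random` module and/or a seeded `numpy.random.RandomState`, which I have not confirmed. It hashes the first draws after seeding, so the fingerprints can be compared across environments and across repeated runs.

```python
import hashlib
import random
import sys

try:
    import numpy as np
except ImportError:  # NumPy is optional for this sketch
    np = None


def rng_fingerprint(seed, n=1000):
    """Return a short hash of the first n draws after seeding (hypothetical helper)."""
    # Seed the standard-library generator and collect n draws.
    random.seed(seed)
    values = [random.random() for _ in range(n)]
    if np is not None:
        # Assumption: the sampler uses NumPy's legacy seeded generator.
        state = np.random.RandomState(seed)
        values.extend(state.random_sample(n).tolist())
    payload = ",".join(repr(v) for v in values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def environment_report(seed=0):
    """Collect interpreter version, NumPy version, and the RNG fingerprint."""
    return {
        "python": sys.version.split()[0],
        "numpy": np.__version__ if np is not None else None,
        "fingerprint": rng_fingerprint(seed),
    }


if __name__ == "__main__":
    print(environment_report(seed=0))
```

If the fingerprint is stable across repeated runs inside the virtual environment, the seeded generators themselves are behaving deterministically, and the divergence more likely comes from something else, for example iteration order or an unseeded call somewhere in the pipeline. If the fingerprints differ between the two environments, that would point towards the library versions.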

Thanks in advance,