Closed andyyuan78 closed 9 years ago
Hi, Andy,
The model was originally implemented against NLTK 2.x. NLTK 3.0 changed the FreqDist interface and removed the inc() method. As a quick fix, you can either downgrade NLTK to a 2.x version, or go through the code and change every call of the form "freqdist.inc(sample, count)" to "freqdist[sample] += count".
If you do the latter, please create a pull request, and I will merge it in.
Best, Ke
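For reference, a minimal sketch of the replacement described above. It uses collections.Counter as a stand-in so it runs without NLTK installed; this relies on NLTK 3's FreqDist being a Counter subclass, so the same `+=` update should apply to it. The helper name `inc` and the sample words are hypothetical and are not part of the InfVocLDA code.

```python
from collections import Counter


def inc(freqdist, sample, count=1):
    """Hypothetical stand-in for the removed NLTK 2.x freqdist.inc(sample, count)."""
    # Counter (and therefore NLTK 3's FreqDist) treats a missing key as 0,
    # so the in-place addition works even for samples not seen before.
    freqdist[sample] += count


# Counter stands in for nltk.FreqDist here (assumption, see the note above).
freqdist = Counter()
inc(freqdist, "topic", 0.25)
inc(freqdist, "model", 0.5)
inc(freqdist, "topic", 0.5)
```

Editing each call site directly, as suggested above, avoids the extra helper; a helper like this is only useful if the same code has to run against both NLTK versions.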
It works. I will send a PR later.
To skip the tedious fork-and-pull process, here is the diff; only two lines changed:
ubgpu@ubgpu:~/github/InfVocLDA$ git diff
diff --git a/src/fixvoc/inferencer.py b/src/fixvoc/inferencer.py
index 2f39cae..4c91052 100755
--- a/src/fixvoc/inferencer.py
+++ b/src/fixvoc/inferencer.py
@@ -86,11 +86,11 @@ class Inferencer:
 freqdist.clear();
 for word in self._vocab.keys():
     freqdist[word]+=self._exp_E_log_beta[k, self._vocab[word]];
 i=0;
 for key in freqdist.keys():
     i += 1;
     output.write(key + "\t" + str(freqdist[key]) + "\n");
     if top_display>0 and i>=top_display:
         break;
diff --git a/src/infvoc/hybrid.py b/src/infvoc/hybrid.py
index 8a20d44..cebbf73 100755
--- a/src/infvoc/hybrid.py
+++ b/src/infvoc/hybrid.py
@@ -477,7 +477,7 @@ class Hybrid:
 freqdist.clear();
 for index in self._index_to_nupos[k]:
     freqdist[index]+=exp_weights[k][0, self._index_to_nupos[k][index]]
 i = 0;
 for key in freqdist.keys():
ubgpu@ubgpu:~/github/InfVocLDA$
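One thing that may be worth checking with this change: in NLTK 2.x, FreqDist.keys() returned samples in decreasing order of frequency, whereas in NLTK 3.x FreqDist inherits from collections.Counter and keys() carries no such ordering. If the export loop is meant to write the top-weighted entries, something along the lines of the sketch below may be needed. The helper name `top_entries` and the sample weights are hypothetical, and Counter is again used as a stand-in for FreqDist.

```python
from collections import Counter


def top_entries(freqdist, top_display=0):
    """Return (key, value) pairs by decreasing value; all of them if top_display <= 0."""
    if top_display > 0:
        return freqdist.most_common(top_display)
    return freqdist.most_common()


# Hypothetical per-topic weights; Counter stands in for nltk.FreqDist.
weights = Counter({"the": 0.1, "topic": 0.7, "model": 0.2})

# Same tab-separated layout as the output.write(...) call in the diff.
lines = ["%s\t%s" % (key, value) for key, value in top_entries(weights, 2)]
```

Here most_common() does the sorting explicitly, so the result does not depend on which NLTK version (or dictionary ordering) is in use.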
I think I have changed every inc() call to be compatible with NLTK 3.x. Please let me know if you still run into any problems.
Thanks a lot
Problem resolved.
ubgpu@ubgpu:~/github/InfVocLDA/src$ python -m fixvoc.launch --input_directory=../input/ --output_directory=../output/ --corpus_name=20-news --number_of_topics=10 --number_of_documents=18600 --batch_size=100
successfully load all training documents...
successfully load all the words from ../input/20-news/voc.dat...
========== ========== ========== ========== ==========
output_directory=../output/20-news/15Jun17-223315-fixvoc-D18600-K10-I10-B100-O186-t64-k0.6-at0.1-ae1.22546e-05-False-False/
input_directory=../input/20-news
corpus_name=20-news
dictionary_file=../input/20-news/voc.dat
number_of_documents=18600
number_of_topics=10
snapshot_interval=10
batch_size=100
online_iterations=186
tau=64.0
kappa=0.6
alpha_theta=0.1
alpha_eta=1.22546016029e-05
hybrid_mode=False
hash_oov_words=False
========== ========== ========== ========== ==========
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ubgpu/github/InfVocLDA/src/fixvoc/launch.py", line 222, in <module>
    main()
  File "/home/ubgpu/github/InfVocLDA/src/fixvoc/launch.py", line 189, in main
    olda.export_beta(os.path.join(output_directory, 'exp_beta-0'), 50);
  File "fixvoc/inferencer.py", line 89, in export_beta
    freqdist.inc(word, self._exp_E_log_beta[k, self._vocab[word]]);
AttributeError: 'FreqDist' object has no attribute 'inc'
ubgpu@ubgpu:~/github/InfVocLDA/src$