vi3k6i5 / GuidedLDA

semi supervised guided topic model with custom guidedLDA
Mozilla Public License 2.0

Won't initialise prior distributions for seed words during .fit() method. #32

Closed DGMC90 closed 5 years ago

DGMC90 commented 5 years ago

I had been using this without any problems for a few months until this morning. I'm now running into a problem when using the .fit method and passing:

When the model is initialised, it estimates the correct number of documents and words as indicated by the vectorised documents (145754 docs x 274185 vocabulary tokens).

I receive the following warning:

    ~\Continuum\anaconda3\lib\site-packages\guidedlda\utils.py:55: FutureWarning: Conversion of the second argument of issubdtype from int to np.signedinteger is deprecated. In future, it will be treated as np.int32 == np.dtype(int).type.

Then I get the following error:

        model.fit(vectorisedJobs, seed_topics, seed_confidence)
      File "~\Continuum\anaconda3\lib\site-packages\guidedlda\guidedlda.py", line 131, in fit
        self._fit(X, seed_topics=seed_topics, seed_confidence=seed_confidence)
      File "~\Continuum\anaconda3\lib\site-packages\guidedlda\guidedlda.py", line 241, in _fit
        self._initialize(X, seed_topics, seed_confidence)
      File "~\Continuum\anaconda3\lib\site-packages\guidedlda\guidedlda.py", line 301, in _initialize
        if w not in seed_topics:
    TypeError: argument of type 'float' is not iterable

I'm not sure whether the deprecation warning raised at line 55 of utils.py, during the utils.matrix_to_lists() call, is the root of the problem I'm experiencing. I'm fairly sure the problem isn't on my side, because this code was running perfectly not that long ago.

Thanks for any help in advance!

-D

DGMC90 commented 5 years ago

I'm not quite sure why, but after removing the parallel processing I was doing with the multiprocessing package, the problem seems to have gone away. The problem had persisted even when only 1 worker was specified, so I'm not sure why removing it should have solved the problem.
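
One possible reading of the traceback in the report, offered as an assumption rather than a confirmed diagnosis: the failing call passes seed_topics and seed_confidence positionally. If fit accepts another parameter between X and seed_topics (assumed here to be y, as in scikit-learn-style estimators), the float seed_confidence would end up bound to seed_topics, and the membership test `w not in seed_topics` would then raise a TypeError like the one shown. The FutureWarning is only a deprecation notice and does not raise anything on its own. The sketch below uses a hypothetical stand-in function with an assumed parameter order, not the library's code, to show how the misbinding could happen and how keyword arguments avoid it.

```python
# Hypothetical stand-in for a fit method. The parameter order
# (X, y, seed_topics, seed_confidence) is an ASSUMPTION made for illustration;
# it is not taken from the report.
def fit(X, y=None, seed_topics=None, seed_confidence=0):
    seed_topics = {} if seed_topics is None else seed_topics
    for doc in X:
        for w in doc:
            # Same kind of membership test as the line quoted in the traceback.
            if w not in seed_topics:
                continue
    return seed_topics, seed_confidence


X = [[0, 1, 2], [2, 3]]        # toy documents as lists of word ids
seed_topics = {1: 0, 3: 1}     # word id -> topic id
seed_confidence = 0.15

# Positional call: the dict is bound to y and the float to seed_topics,
# so the membership test runs against a float and fails.
try:
    fit(X, seed_topics, seed_confidence)
    positional_error = None
except TypeError as exc:
    positional_error = exc

# Keyword call: each argument reaches the parameter it was meant for.
result = fit(X, seed_topics=seed_topics, seed_confidence=seed_confidence)
```

If this reading applies, passing the seed arguments by keyword would sidestep the problem regardless of whether multiprocessing is involved. Whether it explains the behaviour described above cannot be confirmed from the thread alone.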