gregversteeg / corex_topic

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
Apache License 2.0
627 stars, 120 forks

How to save and restore a corex topic model? #6

Closed: cgreenberg closed this issue 7 years ago

cgreenberg commented 7 years ago

For a model that takes a long time to run, it would be useful to save it so it can be read in again later for additional hierarchical modeling. Is there a suggested way to save the model? I suppose this would need to save the weights and the assignments, but how would you restore them? Thanks.

ryanjgallagher commented 7 years ago

Hi Charles, you should be able to save a CorEx object using pickle or cPickle.

import cPickle
with open('filename.pkl', 'wb') as f:  # binary mode is required for pickle
    cPickle.dump(corex_object, f)

The object can then be restored with cPickle as well:

with open('filename.pkl', 'rb') as f:  # the file is closed automatically
    corex_object = cPickle.load(f)
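A minimal Python 3 sketch of the same save/restore round trip, assuming the fitted model is an ordinary picklable object. Because corex_topic may not be installed where this runs, `FakeCorex` below is a hypothetical stand-in, not the library's class; with a real fitted model you would pass that object instead. The helper names `save_model` and `load_model` are also hypothetical.

```python
import os
import pickle
import tempfile


class FakeCorex:
    """Hypothetical stand-in for a fitted CorEx model (not the real class)."""

    def __init__(self, n_hidden, labels):
        self.n_hidden = n_hidden
        self.labels = labels  # e.g. per-document topic assignments


def save_model(model, path):
    # Binary mode is required; HIGHEST_PROTOCOL gives the most compact format.
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    # Only unpickle files you trust: loading can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


model = FakeCorex(n_hidden=3, labels=[[True, False, True], [False, True, False]])
path = os.path.join(tempfile.mkdtemp(), "corex_model.pkl")
save_model(model, path)
restored = load_model(path)
print(restored.n_hidden, restored.labels == model.labels)  # 3 True
```

Pickle stores a reference to the object's class rather than its code, so restoring a real CorEx model requires the same library (ideally the same version) to be importable when you load it.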

If you provided a `words` vocabulary and it contains Unicode, you might run into an issue when restoring the CorEx object. This is a bug we are currently working on fixing. A workaround is to not provide `words` and instead keep your own mapping between words and the columns of the input matrix.
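The workaround can be sketched as follows, assuming a small whitespace-tokenized corpus: build the vocabulary yourself, record which column of the input matrix each word occupies, and store that mapping in a separate UTF-8 JSON file next to the pickled model. The names `build_vocab`, `save_vocab`, `load_vocab`, and `words_for_columns` are hypothetical helpers, not part of corex_topic.

```python
import json
import os
import tempfile


def build_vocab(docs):
    # Assign each distinct token a column index in order of first appearance.
    vocab = {}
    for doc in docs:
        for token in doc.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def save_vocab(vocab, path):
    # Store the word -> column mapping as UTF-8 JSON so Unicode survives intact.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)


def load_vocab(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def words_for_columns(vocab, columns):
    # Invert the mapping to translate column indices (e.g. a topic's top
    # features) back into words.
    column_to_word = {col: word for word, col in vocab.items()}
    return [column_to_word[c] for c in columns]


docs = ["café naïve résumé", "naïve crème café"]
vocab = build_vocab(docs)
path = os.path.join(tempfile.mkdtemp(), "vocab.json")
save_vocab(vocab, path)
restored = load_vocab(path)
print(words_for_columns(restored, [0, 3]))  # ['café', 'crème']
```

The model itself is then fitted and pickled without `words`, and any column indices it reports are translated back to words with the saved mapping.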

cgreenberg commented 7 years ago

Thanks! That worked!

sarveshj commented 6 years ago

If you are using Python 3.x, use `import _pickle as cPickle` instead of `import cPickle`.
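A portable import that works on both Python 2 and Python 3 is sketched below. In Python 3 the standard `pickle` module already uses the C implementation (`_pickle`) automatically when it is available, so importing `pickle` directly avoids depending on the private `_pickle` module. Protocol 2 is used here because both Python 2 and Python 3 can read it.

```python
try:
    import cPickle as pickle  # Python 2: C-accelerated pickle
except ImportError:
    import pickle  # Python 3: uses the C accelerator (_pickle) automatically

data = {"n_hidden": 3, "labels": [0, 1, 2]}
# Round-trip through an in-memory byte string to confirm the module works.
restored = pickle.loads(pickle.dumps(data, protocol=2))
print(restored == data)  # True
```

If a pickle written under Python 2 fails to load under Python 3, the Python documentation recommends `pickle.load(f, encoding='latin1')` for Python 2 pickles containing NumPy arrays; whether that resolves a particular CorEx file is an assumption worth testing.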