gregversteeg / corex_topic

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
Apache License 2.0

Update for functionality with word and doc labels. Update to example notebook #10

Closed: ryanjgallagher closed this 6 years ago

ryanjgallagher commented 6 years ago
  1. Made the example notebook Python 3 compatible and made the examples a little more interesting. Also set seeds so the results should be directly reproducible.
  2. Added functionality so you can add labels to the doc-term matrix columns (terms) more easily after having trained a CorEx model.
  3. Added a new attribute to set labels for the rows (docs), including the ability to set the doc labels after training the CorEx topic model.
  4. Added a "get_top_docs" function which returns documents sorted according to probability or TC. I put a warning under TC because we're still trying to figure out the right way to think about it.

If someone could check to make sure I'm sorting documents correctly, that'd be great. I think everything else should be in order.
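For anyone reviewing the sort order, the intended behavior can be sketched independently of the library. The snippet below is only an illustration under assumptions: `top_docs`, its parameters, and the toy matrix are hypothetical and are not the code in this pull request. It assumes a docs-by-topics array of log probabilities and returns the most probable documents first, using doc labels when they are supplied.

```python
import numpy as np

def top_docs(log_probs, topic, n_docs=10, doc_labels=None):
    """Hypothetical helper: rank docs for one topic, most probable first.

    log_probs  : array of shape (n_docs_total, n_topics), assumed to hold
                 the log probability of each topic given each document.
    doc_labels : optional sequence of row labels; row indices are used
                 when no labels are given.
    """
    scores = np.asarray(log_probs)[:, topic]
    # argsort is ascending, so reverse it to put the highest score first
    order = np.argsort(scores)[::-1][:n_docs]
    if doc_labels is None:
        doc_labels = np.arange(len(scores))
    return [(doc_labels[i], float(scores[i])) for i in order]

# Toy data: 4 documents, 2 topics (made-up probabilities)
p = np.array([[0.90, 0.10],
              [0.20, 0.80],
              [0.60, 0.40],
              [0.05, 0.95]])
labels = ["doc_a", "doc_b", "doc_c", "doc_d"]

ranked = top_docs(np.log(p), topic=0, n_docs=2, doc_labels=labels)
for label, score in ranked:
    print(label, round(score, 3))
```

A quick check of this kind is whether the first returned document has the largest score for the topic, and whether the labels still line up with the rows after sorting.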

gregversteeg commented 6 years ago

Thanks! The sorting looks good to me.