x-tabdeveloping / topicwizard

Powerful topic model visualization in Python
https://x-tabdeveloping.github.io/topicwizard/
MIT License

Implement Words on Topic Axes Figure #33

Closed: x-tabdeveloping closed this issue 6 months ago

x-tabdeveloping commented 6 months ago

Rationale:

It is useful to be able to understand topics as axes, especially with models like Semantic Signal Separation that conceptualize topics this way. While topicwizard already gives an easy overview of the most important words for a topic, it would be very useful to also see the words that are most negative for a given topic, or that are neutral with respect to it.
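To make the idea concrete: if a model stores topics as rows of a components matrix (like `model.components_` in the snippet below, with shape n_topics x n_terms), then the most positive, most negative and most neutral words for a topic are simply the largest, smallest and closest-to-zero entries of one row. The following is a minimal NumPy sketch on a made-up toy matrix and vocabulary; `axis_extremes` is a hypothetical helper, not part of topicwizard's API.

```python
import numpy as np


def axis_extremes(components, vocab, topic, n=3):
    """Return (positive, negative, neutral) word lists for one topic axis.

    Hypothetical helper: `components` is an (n_topics, n_terms) array,
    `vocab` is an array of n_terms strings.
    """
    scores = components[topic]
    order = np.argsort(scores)                     # ascending by score
    positive = vocab[order[::-1][:n]].tolist()     # largest scores first
    negative = vocab[order[:n]].tolist()           # most negative first
    neutral = vocab[np.argsort(np.abs(scores))[:n]].tolist()  # closest to zero
    return positive, negative, neutral


# Toy data: 2 topics x 6 terms (values are illustrative only)
vocab = np.array(["bayes", "sensor", "noise", "prior", "signal", "the"])
components = np.array([
    [2.1, -1.5, 0.3, 1.8, -0.9, 0.01],
    [-0.2, 1.7, 2.4, -0.4, 1.1, 0.05],
])

pos, neg, neu = axis_extremes(components, vocab, topic=0, n=2)
print(pos, neg, neu)  # ['bayes', 'prior'] ['sensor', 'signal'] ['the', 'noise']
```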

Solution:

Add a plot to the figures API in which users can display a word map along two chosen topic axes, as opposed to the word_map() function, whose axes are computed with UMAP.
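The key difference from the UMAP-based map is that no projection is learned: each word's x and y coordinates are read directly from two chosen rows of the components matrix, so the axes keep their meaning. Below is a minimal sketch of that data preparation, assuming a dense `components` array and per-term `frequencies`; the helper name `topic_axes_table` and the choice to keep only the most frequent terms are illustrative assumptions. The resulting columns could then be handed to a scatter plot such as `plotly.express.scatter`.

```python
import numpy as np


def topic_axes_table(components, vocab, frequencies, x_topic, y_topic, top_n=100):
    """Coordinates for plotting words on two topic axes (hypothetical helper).

    Keeps the `top_n` most frequent terms so labels stay readable and returns
    a dict of equal-length arrays: term, x, y.
    """
    selected = np.argsort(-np.asarray(frequencies))[:top_n]
    return {
        "term": vocab[selected],
        "x": components[x_topic, selected],  # position along the first topic axis
        "y": components[y_topic, selected],  # position along the second topic axis
    }


# Toy data (illustrative only)
vocab = np.array(["bayes", "sensor", "noise", "prior", "signal", "the"])
components = np.array([
    [2.1, -1.5, 0.3, 1.8, -0.9, 0.01],
    [-0.2, 1.7, 2.4, -0.4, 1.1, 0.05],
])
frequencies = np.array([40, 25, 60, 10, 30, 500])

table = topic_axes_table(components, vocab, frequencies, x_topic=0, y_topic=1, top_n=3)
for term, x, y in zip(table["term"], table["x"], table["y"]):
    print(f"{term:>8} x={x:+.2f} y={y:+.2f}")
```

In this toy example, "the" is the most frequent term but sits near the origin on both axes. It is exactly the kind of neutral word that a UMAP projection would not make visible.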


Implementation:

Some code for this has already been written in my Medium article about S3 (Semantic Signal Separation):

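import numpy as np

# Assumes `model` is a fitted S3 model (e.g. Turftopic's SemanticSignalSeparation)
# and `ds` is the dataset of abstracts from the article; neither is defined here.
vocab = model.get_vocab()

# We will produce a BoW matrix to extract term frequencies
document_term_matrix = model.vectorizer.transform(ds["abstract"])
frequencies = document_term_matrix.sum(axis=0)
frequencies = np.squeeze(np.asarray(frequencies))

# `selected_terms` was not defined in the original snippet.
# Assumption: keep the most frequent terms so the labels stay readable.
n_terms = 200
selected_terms = np.argsort(-frequencies)[:n_terms]

import pandas as pd

# model.components_ is an n_topics x n_terms matrix
# containing the strength of every component for each word.
# Here we select the components for the terms chosen above.
# Topic indices 7, 1 and 6 refer to the model in the article.
terms_with_axes = pd.DataFrame({
    "inference": model.components_[7][selected_terms],
    "measurement_devices": model.components_[1][selected_terms],
    "noise": model.components_[6][selected_terms],
    "term": vocab[selected_terms],
})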

import plotly.express as px

px.scatter(
    terms_with_axes,
    text="term",
    x="inference",
    y="noise",
    color="measurement_devices",
    template="plotly_white",
    color_continuous_scale="Bluered",
).update_layout(
    width=1200,
    height=800
).update_traces(
    textposition="top center",
    marker=dict(size=12, line=dict(width=2, color="white"))
)
x-tabdeveloping commented 6 months ago

Added to figures.word_map() in version 1.0.1.