Rationale:
It is useful to be able to interpret topics as axes, especially with models like Semantic Signal Separation (S3), which conceptualize topics as such.
While topicwizard makes it easy to get an overview of the most important words for a topic, it would also be very useful to see which words are the most negative on a given topic, and which are neutral.
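To make the three groups of words concrete, here is a minimal sketch of how they could be read off a single topic axis. The helper `words_along_axis` and the toy weights are hypothetical and not part of topicwizard; `weights` stands for one row of a topics x terms matrix such as `model.components_[i]`.

```python
def words_along_axis(weights, vocab, top_k=5):
    """Return the most positive, most negative and most neutral words on one axis.

    Hypothetical helper: `weights` is one row of a topics x terms matrix,
    `vocab` is the matching sequence of terms.
    """
    if len(weights) != len(vocab):
        raise ValueError("weights and vocab must have the same length")
    pairs = [(float(w), str(t)) for w, t in zip(weights, vocab)]
    # Ascending by signed weight: negative end first, positive end last
    by_weight = sorted(pairs, key=lambda p: p[0])
    # Ascending by absolute weight: words closest to zero first
    by_magnitude = sorted(pairs, key=lambda p: abs(p[0]))
    return {
        "positive": [t for _, t in by_weight[::-1][:top_k]],
        "negative": [t for _, t in by_weight[:top_k]],
        "neutral": [t for _, t in by_magnitude[:top_k]],
    }


# Toy data, made up for illustration only
weights = [0.9, -0.8, 0.05, 0.4, -0.02, -0.3]
vocab = ["bayesian", "sensor", "paper", "prior", "result", "detector"]
print(words_along_axis(weights, vocab, top_k=2))
```

A list of top words only ever shows the "positive" group; the other two are what an axis-based plot would add.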
Solution:
Add a plot to the figures API that displays a word map along two topic axes chosen by the user, as opposed to the word_map() function, where the axes are computed with UMAP.
Implementation:
Some code has already been written for this in my Medium article about S3:
import numpy as np
import pandas as pd
import plotly.express as px

# Assumes `model` is a fitted S3 model and `ds` is the dataset used in the article.
vocab = model.get_vocab()

# We will produce a BoW matrix to extract term frequencies
document_term_matrix = model.vectorizer.transform(ds["abstract"])
frequencies = document_term_matrix.sum(axis=0)
frequencies = np.squeeze(np.asarray(frequencies))

# NOTE: `selected_terms` is defined in an earlier step of the article that is not
# reproduced here. As a stand-in (an assumption), keep the most frequent 1% of terms.
selected_terms = frequencies > np.quantile(frequencies, 0.99)

# model.components_ is an n_topics x n_terms matrix.
# It contains the strength of all components for each word.
# Here we are selecting components for the words we selected earlier.
terms_with_axes = pd.DataFrame({
    "inference": model.components_[7][selected_terms],
    "measurement_devices": model.components_[1][selected_terms],
    "noise": model.components_[6][selected_terms],
    "term": vocab[selected_terms],
})

# Two topics serve as the x and y axes, a third one is encoded as color.
fig = px.scatter(
    terms_with_axes,
    text="term",
    x="inference",
    y="noise",
    color="measurement_devices",
    template="plotly_white",
    color_continuous_scale="Bluered",
).update_layout(
    width=1200,
    height=800,
).update_traces(
    textposition="top center",
    marker=dict(size=12, line=dict(width=2, color="white")),
)
fig.show()
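The snippet above hardcodes the topic indices and column names. As a rough sketch of how the data side of the proposed figure could be generalized, the helper below collects per-term coordinates for any two topics. The name `topic_axes_table` and all of its parameters are hypothetical; nothing like it exists in topicwizard yet.

```python
def topic_axes_table(components, vocab, x_topic, y_topic, color_topic=None, term_mask=None):
    """Collect per-term coordinates on two chosen topic axes.

    Hypothetical helper: `components` is a topics x terms matrix such as
    model.components_, `vocab` holds the matching terms, and `term_mask` is an
    optional sequence of booleans marking which terms to keep.
    """
    n_terms = len(vocab)
    if term_mask is None:
        term_mask = [True] * n_terms
    keep = [i for i in range(n_terms) if term_mask[i]]
    table = {
        "term": [str(vocab[i]) for i in keep],
        "x": [float(components[x_topic][i]) for i in keep],
        "y": [float(components[y_topic][i]) for i in keep],
    }
    # Optionally encode a third topic as color, as in the snippet above
    if color_topic is not None:
        table["color"] = [float(components[color_topic][i]) for i in keep]
    return table


# Toy data, made up for illustration only
components = [[0.9, -0.8, 0.05], [0.1, 0.7, -0.6]]
vocab = ["bayesian", "sensor", "noise"]
table = topic_axes_table(components, vocab, x_topic=0, y_topic=1, term_mask=[True, False, True])
print(table)
```

The returned dictionary can be wrapped in a pandas DataFrame and passed to px.scatter with x="x", y="y" and text="term", so the plotting part of the snippet above would carry over mostly unchanged.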