melaniewalsh / Intro-Cultural-Analytics

Introduction to Cultural Analytics & Python, course website and online textbook powered by Jupyter Book
https://melaniewalsh.github.io/Intro-Cultural-Analytics
GNU General Public License v3.0

Possible TF-IDF results visualization #12

Closed: emonson closed this issue 3 years ago

emonson commented 3 years ago

Hey Melanie,

Just in case you're interested, I created a visualization of the TF-IDF results for the presidential speeches from the end of your TF-IDF-Scikit-Learn lesson, and I think it's a fun way to look at them. It uses the Altair visualization library, which can be installed with conda install altair or pip install altair. The code is below if you're curious and want to try it out.

Best, -Eric

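import altair as alt
import numpy as np

# top_tfidf is the DataFrame from the end of the lesson, with
# 'document', 'term', and 'tfidf' columns

# add a tiny amount of random noise to break ties in the term ranking
# (seeded so the chart comes out the same on every run)
rng = np.random.default_rng(0)
top_tfidf_plusRand = top_tfidf.copy()
top_tfidf_plusRand['tfidf'] = top_tfidf_plusRand['tfidf'] + rng.random(len(top_tfidf)) * 0.0001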

# base for all layers: ranks each term within its document by descending tfidf
base = alt.Chart(top_tfidf_plusRand).encode(
    x='rank:O',
    y='document:N'
).transform_window(
    rank='rank()',
    sort=[alt.SortField('tfidf', order='descending')],
    groupby=['document'],
)

# heatmap specification: cell color encodes the tfidf score
heatmap = base.mark_rect().encode(
    color='tfidf:Q'
)

# terms in this list will get a red dot in the visualization
term_list = ['nation', 'national', 'republic', 'union']

# red circle over terms in the list above; fully transparent for all other terms
circle = base.mark_circle(size=100).encode(
    color=alt.condition(
        alt.FieldOneOfPredicate(field='term', oneOf=term_list),
        alt.value('red'),
        alt.value('#FFFFFF00')
    )
)

# text labels, white on darker heatmap cells
# (0.23 is a hard-coded cutoff and may need adjusting for other data)
text = base.mark_text(baseline='middle').encode(
    text='term:N',
    color=alt.condition(alt.datum.tfidf >= 0.23, alt.value('white'), alt.value('black'))
)

# layer the three charts on top of each other and display the result
(heatmap + circle + text).properties(width=600)

[Image: visualization of the TF-IDF results]
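The chart computes its layout inside the Altair spec: transform_window gives each term a rank within its document by descending tfidf, and the random jitter is there because Vega-Lite's rank() assigns tied values the same rank. Below is a rough pandas equivalent of those data steps, which can help when checking which term lands in which column. It is a sketch, not part of the original code. It swaps the jitter for pandas' rank(method="first"), which breaks ties by row order instead of randomly. The small DataFrame is a hypothetical stand-in for top_tfidf, and its scores are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for top_tfidf; note the tie between union and nation.
top_tfidf = pd.DataFrame({
    "document": ["speech_a", "speech_a", "speech_a", "speech_b", "speech_b"],
    "term": ["union", "nation", "stand", "republic", "national"],
    "tfidf": [0.50, 0.50, 0.20, 0.80, 0.40],
})

# Rank terms within each document by descending tfidf.
# method="first" breaks ties by row order, so every term gets a distinct rank.
top_tfidf["rank"] = (
    top_tfidf.groupby("document")["tfidf"]
    .rank(method="first", ascending=False)
    .astype(int)
)

# Same membership test as the red-circle layer.
term_list = ["nation", "national", "republic", "union"]
top_tfidf["highlight"] = top_tfidf["term"].isin(term_list)

# Same label-color rule as the text layer.
top_tfidf["label_color"] = top_tfidf["tfidf"].ge(0.23).map({True: "white", False: "black"})

# Grid matching the heatmap layout: documents as rows, ranks as columns.
grid = top_tfidf.pivot(index="document", columns="rank", values="term")
print(grid)
```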

melaniewalsh commented 3 years ago

Oh, wow! I love this. Thanks so much, @emonson. Do you mind if I include this code in my textbook and cite you?

melaniewalsh commented 3 years ago

@emonson Your suggestion has been incorporated into the book: https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/03-TF-IDF-Scikit-Learn.html#visualize-tf-idf Thank you again!