Closed: williamstein closed this issue 3 months ago
I can vouch for the data structures prior to:
pyLDAvis.enable_notebook()
panel = pyLDAvis.gensim.prepare(lda, corpus_lda, dictionary, mds='tsne')
The contents of every variable prior to that point agree between CoCalc and Jupyter. I imagine it has something to do with the way the interactive objects are returned to be embedded in a notebook. Unfortunately, I know very little about Jupyter's software architecture. Thanks.
One amusing side effect of trying this is that all of the icons on the page disappear and the whole of CoCalc breaks. So pyLDAvis, whatever it is doing, is somehow mangling the DOM, probably because it assumes it is running in Jupyter classic...
This "just works" now:
It even automatically uses an iframe, so there is no CSS leakage...
I did run !pip install gensim pyLDAvis first.
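For anyone wondering why an iframe prevents the kind of page breakage described earlier: the markup and styles inside an iframe live in their own document, so they cannot restyle or rewrite the host page. Here is a minimal stdlib-only sketch of the idea. It is not CoCalc's actual implementation; `wrap_in_iframe` is a hypothetical helper, and the `fragment` string is a made-up stand-in for whatever HTML a visualization library emits.

```python
import html


def wrap_in_iframe(fragment: str, width: str = "100%", height: int = 600) -> str:
    """Return an <iframe> tag whose srcdoc attribute carries `fragment`.

    Hypothetical helper for illustration only. The fragment is rendered as a
    separate document, so its CSS does not apply to the embedding page.
    """
    # Escape the fragment so it can sit safely inside an attribute value.
    escaped = html.escape(fragment, quote=True)
    return (
        f'<iframe srcdoc="{escaped}" width="{width}" '
        f'height="{height}" style="border:0"></iframe>'
    )


# Made-up fragment with an aggressive global style rule, similar in spirit
# to output that could hide icons if injected straight into the page.
fragment = '<style>svg { display: none; }</style><div id="vis">topics</div>'
iframe_html = wrap_in_iframe(fragment)
print(iframe_html)
```

In a notebook, a string like this could be passed to IPython.display.HTML. With pyLDAvis, the fragment would presumably come from something like pyLDAvis.prepared_data_to_html(panel), though I have not checked that this is what happens under the hood here.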
corpus = []
for i in range(len(tweets['Tweet'])):
    a = tweets['Tweet'][i]
    corpus.append(a)
texts = [[word for word in str(document).lower().split()] for document in corpus]

from gensim import corpora
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

from gensim import models
tfidf = models.TfidfModel(corpus)  # step 1 -- initialize a model
corpus_tfidf = tfidf[corpus]  # step 2 -- use the model to transform vectors

total_topics = 2
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=total_topics)
corpus_lda = lda[corpus_tfidf]  # create a double wrapper over the original corpus: bow->tfidf->fold-in-lsi

import pyLDAvis
import pyLDAvis.gensim
pyLDAvis.enable_notebook()
panel = pyLDAvis.gensim.prepare(lda, corpus_lda, dictionary, mds='tsne')
panel
pyLDAvis.show(panel)
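For readers who want to check the intermediate data structures without the original tweets data, here is a rough pure-Python sketch of what the tokenization and bag-of-words steps above compute. It does not use gensim: `build_vocab` and `doc_to_bow` are made-up helper names, the two sample strings are invented, and the integer ids gensim's Dictionary assigns may differ from the first-appearance order assumed here.

```python
from collections import Counter


def build_vocab(texts):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for tokens in texts:
        for token in tokens:
            # setdefault only assigns a new id the first time a token is seen.
            vocab.setdefault(token, len(vocab))
    return vocab


def doc_to_bow(tokens, vocab):
    """Return sorted (token_id, count) pairs for one tokenized document."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


# Invented stand-in for tweets['Tweet'].
tweets = ["Topic models are fun", "fun fun topic"]

# Same tokenization as in the snippet above: lowercase, split on whitespace.
texts = [str(doc).lower().split() for doc in tweets]
vocab = build_vocab(texts)
bow_corpus = [doc_to_bow(tokens, vocab) for tokens in texts]
print(bow_corpus)
```

Each document ends up as a list of (token id, count) pairs, which is the same shape as what dictionary.doc2bow returns.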