sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com

support pyLDAvis in cocalc-jupyter #4983

Closed williamstein closed 3 months ago

williamstein commented 3 years ago
  1. This input in the Python 3 system wide kernel in Jupyter:
    
    import pandas as pd
    tweets = pd.DataFrame(columns=['ID','Tweet'])
    tweets.loc[0]=[0, "republican trump"]
    tweets.loc[1]=[1, "republican pence"]
    tweets.loc[2]=[2, "pence trump"]
    tweets.loc[3]=[3, "republican trump pence"]
    tweets.loc[4]=[4, "democrat biden"]
    tweets.loc[5]=[5, "democrat harris"]
    tweets.loc[6]=[6, "harris biden"]
    tweets.loc[7]=[7, "democrat biden harris"]

    corpus=[]
    for i in range(len(tweets['Tweet'])):
        a=tweets['Tweet'][i]
        corpus.append(a)
    texts = [[word for word in str(document).lower().split()] for document in corpus]

    from gensim import corpora
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    from gensim import models
    tfidf = models.TfidfModel(corpus) # step 1 -- initialize a model
    corpus_tfidf = tfidf[corpus] # step 2 -- use the model to transform vectors

    total_topics = 2
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=total_topics)
    corpus_lda = lda[corpus_tfidf] # create a double wrapper over the original corpus: bow->tfidf->fold-in-lsi

    import pyLDAvis
    import pyLDAvis.gensim
    pyLDAvis.enable_notebook()
    panel = pyLDAvis.gensim.prepare(lda, corpus_lda, dictionary, mds='tsne')
    panel

  2. No output.

There should be output.  Try this for more info and help:

    pyLDAvis.show(panel)



This is probably one of those things that might be very difficult with cocalc-jupyter.  It's impossible to know without diving deep into how it actually works.

WORKAROUND: Use Jupyter classic or JupyterLab.

REQUESTED BY:  B K

grouptheory commented 3 years ago

I can vouch for the data structures prior to:

    pyLDAvis.enable_notebook()
    panel = pyLDAvis.gensim.prepare(lda, corpus_lda, dictionary, mds='tsne')

The variable contents of everything prior are in agreement between cocalc and jupyter.  I imagine it has something to do with the way the interactive objects are returned to be embedded in a notebook. I know very little about the software architecture of jupyter, unfortunately. Thanks.
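One way to make the "in agreement" check above repeatable is to hash each intermediate value in both environments and compare the digests. The following is only a sketch: `fingerprint` is a hypothetical helper, and the sample `texts` and `bow` values are stand-ins, not the real data from the reproduction.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Hash a JSON-serializable structure so two environments can compare it."""
    # sort_keys and fixed separators keep the serialization stable across runs.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Stand-in values (assumption): shaped like the tokenized texts and the
# bag-of-words corpus built earlier in the reproduction.
texts = [["republican", "trump"], ["democrat", "biden"]]
bow = [[(0, 1), (1, 1)], [(2, 1), (3, 1)]]

report = {"texts": fingerprint(texts), "bow": fingerprint(bow)}
print(json.dumps(report, indent=2))
```

Running the same cell in both notebooks and diffing the printed report would show where the two first diverge. The trained LDA model itself is randomly initialized, so its topics would only be expected to match if the seed is fixed (gensim's `LdaModel` accepts a `random_state` argument).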

williamstein commented 3 years ago

One amusing side effect of trying this is that all of the icons on the page go away and the whole of cocalc is broken. So pyLDAvis, whatever it is doing, is mangling the DOM somehow, probably because it assumes it is running in Jupyter classic...

image

williamstein commented 3 months ago

This "just works" now:

image

It even automatically uses an iframe so there is no css leakage...

I did do !pip install gensim pyLDAvis first.
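
For readers wondering why an iframe prevents the CSS leakage mentioned above, here is a minimal sketch of the general technique. It is not CoCalc's actual implementation; `wrap_in_iframe` and the sample fragment are hypothetical. The idea is that HTML placed in an iframe's `srcdoc` attribute is rendered as a separate document, so its styles do not apply to the host page.

```python
import html


def wrap_in_iframe(fragment: str, width: str = "100%", height: str = "600px") -> str:
    """Return an <iframe> whose srcdoc holds `fragment` as its own document."""
    # Escape &, <, > and quotes so the fragment survives as an attribute value.
    srcdoc = html.escape(fragment, quote=True)
    return (
        f'<iframe srcdoc="{srcdoc}" '
        f'style="width:{width};height:{height};border:0"></iframe>'
    )


# Hypothetical output (assumption): a fragment whose <style> rule would
# otherwise hide every SVG icon on the host page.
fragment = '<style>svg { display: none }</style><div id="vis">topics</div>'
framed = wrap_in_iframe(fragment)
print(framed)
```

In a notebook, the resulting string could be shown with `IPython.display.HTML(framed)`; the stylesheet inside the frame then applies only to the framed document.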