Aligned Neural Topic Model (ANTM) for Exploring Evolving Topics: a dynamic neural topic model that uses document embeddings (data2vec) to compute clusters of semantically similar documents at different periods, and aligns document clusters to represent topic evolution.
MIT License
I cannot run the example code in Colab #2

Closed. hxtruong6 closed this issue 1 year ago

hxtruong6 commented 1 year ago

I tried to run the code in Colab, but I ran into an issue. This is the output:

contextual document embedding is initiated...
Pandas Apply: 100%
2000/2000 [23:34<00:00, 1.27it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (549 > 512). Running this sequence through the model will result in indexing errors
Summarizing a document with BART due to its Large length for Embedding...
Sliding Window Segmentation is initialized...
Aligned Dimension Reduction is initialized...
Sequential Document-cluster association is initialized...
Cluster Alignment Procedure is initialized...
I ran the code from the README, which is:

from antm import ANTM
import pandas as pd

# load data

# choosing the windows size and overlapping length for time frames
window_size = 6
overlap = 2

# initialize the model
model = ANTM(df, overlap, window_size, umap_n_neighbors=10, partioned_clusttering_size=5, mode="data2vec", num_words=10, path="./saved_data")

# learn the model and save it
topics_per_period = model.fit(save=True)    # <------- ERROR when saving the model
# the output is a list of time frames, each including all the topics associated with that period
hamedR96 commented 1 year ago

As the error says, you need to download the NLTK "punkt" tokenizer data with nltk.download('punkt').

Run this in a cell before the model code:

import nltk
nltk.download('punkt')

Note that the code runs much slower on Colab than on a local machine.
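
Since Colab sessions get restarted often, it may help to make the punkt download safe to re-run. Below is a minimal sketch, not part of ANTM: the helper name `ensure_nltk_resource` and its `nltk_module` parameter are hypothetical, added only so the logic can be checked without network access. It relies on `nltk.data.find` raising `LookupError` when a resource is missing. If the error message on your NLTK version names a different resource, pass that name instead.

```python
def ensure_nltk_resource(resource_path, package_name, nltk_module=None):
    """Download an NLTK data package only if it is not already available.

    Returns True if a download was triggered, False if the data was found.
    `nltk_module` is a hypothetical hook for testing; leave it as None
    in normal use so the real nltk package is imported.
    """
    if nltk_module is None:
        import nltk as nltk_module  # imported lazily, only when needed

    try:
        # nltk.data.find raises LookupError when the resource is missing
        nltk_module.data.find(resource_path)
        return False
    except LookupError:
        nltk_module.download(package_name)
        return True


# Typical use in a Colab cell, before creating the ANTM model:
# ensure_nltk_resource("tokenizers/punkt", "punkt")
```

Calling the helper a second time in the same environment should skip the download, so the cell can be re-run without waiting.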