juanrloaiza / latinamerican-philosophy-mining

Text mining philosophy journals in Latin America.

Verify that our results are replicable #27

miguelgondu closed this issue 2 months ago

miguelgondu commented 5 months ago

We should get the same model if we rerun the training process with the same seed.

miguelgondu commented 2 months ago

I trained two models with the same seed (stored in two separate folders, seed_1_example1 and seed_1_example2) and tested them with the following script:

from pathlib import Path

from notebooks.utils.topic import Topic
from notebooks.utils.corpus import Corpus
from notebooks.utils.model import Model

# Resolved relative to this script, so it must sit next to the notebooks/ folder.
NOTEBOOKS_DIR = Path(__file__).parent / "notebooks"

corpus = Corpus(registry_path=NOTEBOOKS_DIR / "utils" / "article_registry.json")

n_topics = 90
seed = 1

# Both models share the same corpus, number of topics, and seed.
model_1 = Model(corpus, n_topics, seed=seed)
model_2 = Model(corpus, n_topics, seed=seed)

# Point each model at one of the two separately trained runs.
model_1.path = NOTEBOOKS_DIR / "models" / "seed_1_example1"
model_2.path = NOTEBOOKS_DIR / "models" / "seed_1_example2"

# Load the topics without reading from or writing to the cache.
model_1.load_topics(num_workers=1, load_from_cache=False, save_cache=False)
model_2.load_topics(num_workers=1, load_from_cache=False, save_cache=False)

Indeed, the first two topics of both models were exactly the same.
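
The script loads both models but does not show the comparison itself. Below is a minimal sketch of how every topic, not only the first two, could be checked, assuming each topic can be exported as a list of (word, weight) pairs. That representation and the `first_mismatch` helper are hypothetical; they are not part of the repository's `Topic` or `Model` classes.

```python
from math import isclose
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a topic is a sequence of (word, weight) pairs.
WeightedTopic = Sequence[Tuple[str, float]]


def topics_match(topic_a: WeightedTopic, topic_b: WeightedTopic, abs_tol: float) -> bool:
    """Return True if both topics list the same words, in order, with equal weights."""
    if len(topic_a) != len(topic_b):
        return False
    for (word_a, weight_a), (word_b, weight_b) in zip(topic_a, topic_b):
        if word_a != word_b:
            return False
        # With abs_tol=0.0 this is an exact equality check on the weights.
        if not isclose(weight_a, weight_b, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True


def first_mismatch(
    topics_a: Sequence[WeightedTopic],
    topics_b: Sequence[WeightedTopic],
    abs_tol: float = 0.0,
) -> Optional[int]:
    """Return the index of the first differing topic, or None if the runs agree."""
    for index, (topic_a, topic_b) in enumerate(zip(topics_a, topics_b)):
        if not topics_match(topic_a, topic_b, abs_tol):
            return index
    if len(topics_a) != len(topics_b):
        # All shared positions agree, but one run has extra topics.
        return min(len(topics_a), len(topics_b))
    return None


# Toy data standing in for the topics of two runs trained with the same seed.
run_1 = [[("filosofía", 0.12), ("ciencia", 0.08)], [("ética", 0.20), ("moral", 0.15)]]
run_2 = [[("filosofía", 0.12), ("ciencia", 0.08)], [("ética", 0.20), ("moral", 0.15)]]
run_3 = [[("filosofía", 0.12), ("ciencia", 0.08)], [("ética", 0.20), ("deber", 0.15)]]

assert first_mismatch(run_1, run_2) is None  # identical runs
assert first_mismatch(run_1, run_3) == 1     # second topic differs
```

Returning the index of the first mismatch, rather than a plain boolean, makes it easier to see where two runs start to diverge. A small nonzero `abs_tol` may be appropriate if the weights go through floating-point operations whose order is not fixed, for example with more than one worker.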