Closed by miguelgondu 2 months ago
Trained two models with the same seed (stored in two different folders, seed_1_example1 and seed_1_example2) and tested them with the following script:
from pathlib import Path

from notebooks.utils.topic import Topic
from notebooks.utils.corpus import Corpus
from notebooks.utils.model import Model

NOTEBOOKS_DIR = Path(__file__).parent / "notebooks"

# N_TOPICS = 10
# base_model = Model(Corpus(registry_path="../utils/article_registry.json"), N_TOPICS)

# Both models share the same corpus, number of topics, and seed.
corpus = Corpus(registry_path=NOTEBOOKS_DIR / "utils" / "article_registry.json")
n_topics = 90
seed = 1
model_1 = Model(corpus, n_topics, seed=seed)
model_2 = Model(corpus, n_topics, seed=seed)

# Point each model at the folder of one of the two independent training runs.
model_1.path = NOTEBOOKS_DIR / "models" / "seed_1_example1"
model_2.path = NOTEBOOKS_DIR / "models" / "seed_1_example2"

# Load the topics from disk, bypassing the cache so nothing is shared between the runs.
model_1.load_topics(num_workers=1, load_from_cache=False, save_cache=False)
model_2.load_topics(num_workers=1, load_from_cache=False, save_cache=False)
Indeed, the first two topics of both models were exactly the same.
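The check above only looked at the first two topics. Comparing every topic could be sketched roughly as below. This is a minimal sketch, not the project's code: it assumes each topic can be reduced to a list of (word, weight) pairs, which may not match the actual Topic class. The helper `topics_match` and the toy data are hypothetical.

```python
from math import isclose

# Assumption: a topic is a list of (word, weight) pairs. The real Topic class
# is not shown in this thread, so this is only a stand-in representation.


def topics_match(topics_a, topics_b, rel_tol=1e-9):
    """Return True if both topic lists hold the same topics in the same order."""
    if len(topics_a) != len(topics_b):
        return False
    for topic_a, topic_b in zip(topics_a, topics_b):
        if len(topic_a) != len(topic_b):
            return False
        for (word_a, weight_a), (word_b, weight_b) in zip(topic_a, topic_b):
            # Words must be identical; weights are compared with a small tolerance.
            if word_a != word_b or not isclose(weight_a, weight_b, rel_tol=rel_tol):
                return False
    return True


# Toy data standing in for the topics loaded from each model folder.
topics_1 = [[("gene", 0.12), ("cell", 0.08)], [("market", 0.20), ("price", 0.05)]]
topics_2 = [[("gene", 0.12), ("cell", 0.08)], [("market", 0.20), ("price", 0.05)]]
topics_3 = [[("gene", 0.12), ("cell", 0.08)], [("market", 0.20), ("stock", 0.05)]]

print(topics_match(topics_1, topics_2))
print(topics_match(topics_1, topics_3))
```

The comparison is order-sensitive on purpose: if the seed fully determines training, the topics should come out in the same order as well as with the same content.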
We should be getting the same model if we run the training process again with the same seed.