huu4ontocord / MDEL

Multi-Domain Expert Learning

Investigate Expert Models Having High Perplexity #57

Open · mrcabbage972 opened this issue 1 year ago

mrcabbage972 commented 1 year ago

Our analysis in #53 has shown that the expert models we had previously trained actually have higher perplexity than the base model (a minimal sketch of that comparison follows below).

Here are some issues that may have caused this:

- The expert models were trained with an old version of the trainer, so we don't know which wandb run they belong to or what the Pile/domain data losses were during training. Re-doing the training of one of the experts should help clarify.
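
For reference, a minimal sketch of the kind of base-vs-expert perplexity comparison described above, roughly what a dedicated perplexity script would do: score both checkpoints on the same domain slice and compare token-weighted perplexities. The checkpoint and dataset names here are placeholders, not the actual MDEL artifacts, and the loop is an illustration rather than the project's own script.

```python
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer


def perplexity(model_name: str, texts: list[str],
               device: str = "cuda" if torch.cuda.is_available() else "cpu") -> float:
    """Token-weighted perplexity of a causal LM over a list of raw texts."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).to(device)
            # With labels == input_ids the model returns the mean cross-entropy
            # over the predicted tokens; weight it by (approximate) token count.
            out = model(**enc, labels=enc["input_ids"])
            n_tokens = enc["input_ids"].numel()
            total_nll += out.loss.item() * n_tokens
            total_tokens += n_tokens
    return math.exp(total_nll / total_tokens)


# Placeholder dataset/checkpoint names -- swap in the actual domain slice and the
# MDEL base/expert checkpoints being compared.
texts = [row["text"] for row in load_dataset("some/domain-dataset", split="validation").select(range(100))]
print("base  :", perplexity("base-model-checkpoint", texts))
print("expert:", perplexity("expert-arxiv-checkpoint", texts))
```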

Stillerman commented 1 year ago

Further investigation: train.py has a --do-eval option that also computes perplexity. After running both the base model and the arxiv model through it on the arxiv dataset, I see the same discrepancy as with the dedicated perplexity script. This rules out my concern that the gap was just due to a different data/tokenization pipeline in the perplexity script vs. the training script.
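
For context, a hedged sketch of what a --do-eval-style check boils down to: run the same tokenized slice through Trainer.evaluate() and take exp(eval_loss) as perplexity; if that agrees with the dedicated script, a data/tokenization mismatch between the two pipelines is unlikely. The checkpoint, dataset, and eval settings below are placeholders, not the actual train.py configuration.

```python
import math

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "expert-arxiv-checkpoint"      # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:             # GPT-style tokenizers often lack a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

raw = load_dataset("some/arxiv-slice", split="validation").select(range(200))  # placeholder dataset
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=raw.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eval-out", per_device_eval_batch_size=4),
    eval_dataset=tokenized,
    # mlm=False makes labels a copy of input_ids, i.e. standard causal-LM eval loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

metrics = trainer.evaluate()
print("eval_loss  :", metrics["eval_loss"])
print("perplexity :", math.exp(metrics["eval_loss"]))  # should line up with the dedicated script
```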