huu4ontocord opened this issue 1 year ago
Have you tried using the EleutherAI eval harness? It should give you a good picture of how well the model performs and can be used as an indicator.
I didn't understand the part about the 1000 training examples. Our datasets are much bigger than that!
Didn't we just train our models on 1000 examples only? Or did I misunderstand that?
We definitely should try the EleutherAI eval harness, but just checking validation loss will tell us something too: regular fine-tuning vs. expert fine-tuning + merge.
We have an issue for Eval Harness in the backlog.
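For reference, a minimal sketch of running the harness on one of our checkpoints via its Python API, assuming a recent `lm-eval` install (the model path, task list, and batch size below are placeholders, not agreed-upon settings, and older harness versions expose a `main.py` CLI instead):

```python
# Rough sketch: score a checkpoint with the EleutherAI lm-evaluation-harness.
# Model path, tasks and batch size are placeholders for illustration only.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",                                    # Hugging Face causal-LM backend
    model_args="pretrained=EleutherAI/pythia-1b",  # swap in the expert/merged checkpoint
    tasks=["lambada_openai", "piqa"],              # placeholder task list
    batch_size=8,
)
print(results["results"])
```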
So I am told that it seems they were trained on 1k batches. I think the batch size was 8 because of the number of GPUs, so that gives us 8k samples.
So the above 1000 examples should be 8K examples.
@ontocord For 2., do we want layers 9, 10, 11, 12, 13?
@jordiclive @ontocord We had used layers 9-13 when we trained the experts. See: https://github.com/ontocord/MDEL/blob/main/src/mdel/train.sh#L4
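For anyone following along, this is roughly what that layer selection amounts to: freeze everything except transformer layers 9-13 of the Pythia (GPTNeoX) model. This is only a sketch; the actual implementation lives in the training script linked above, and the model size here is a placeholder.

```python
# Sketch: fine-tune only GPTNeoX layers 9-13 and freeze the rest.
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-1b")  # placeholder size

TRAINABLE_LAYERS = {9, 10, 11, 12, 13}

for param in model.parameters():
    param.requires_grad = False                 # freeze everything by default

for idx, layer in enumerate(model.gpt_neox.layers):
    if idx in TRAINABLE_LAYERS:
        for param in layer.parameters():
            param.requires_grad = True          # unfreeze only the expert layers

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```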
@jordiclive Any updates on this issue?
@mrcabbage972 For 1., I trained a model (all layers) on the exact splits...https://wandb.ai/ontocord/jordi_testing/runs/hu8j9ta1?workspace=user-jordanclive (you can see the results if you toggle the evaluation).
But then I thought we had decided to automate the experiment again with more training data and less validation data, maybe with the same amount of final testing data (#47).
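To make those splits reproducible across the expert runs and the baseline, something like a fixed-seed shuffle-and-select per domain dataset should do it (the dataset name, seed, and split sizes below are placeholders, not our actual configuration):

```python
# Sketch: draw the same random train/validation subset from a domain dataset
# on every run by fixing the shuffle seed.
from datasets import load_dataset

SEED = 42        # placeholder seed
N_TRAIN = 8000   # per the 1k batches x batch size 8 estimate above
N_VAL = 500      # placeholder

ds = load_dataset("some_org/some_domain_dataset", split="train")  # hypothetical dataset
ds = ds.shuffle(seed=SEED)
train_split = ds.select(range(N_TRAIN))
val_split = ds.select(range(N_TRAIN, N_TRAIN + N_VAL))
```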
We need to evaluate the merged experts against a baseline where a 1B Pythia model was trained on all the data together.
To keep it fair, we would need to use the exact same 8000 random training examples for each of the 7 datasets we used in the other experiments. Then we merge the 6 experts with basic averaging (rough sketch below) and run the same eval from the 7 datasets on that model.
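A back-of-the-envelope sketch of that basic averaging step, i.e. an element-wise mean over the expert state dicts (the checkpoint paths are hypothetical, and a real merge would need to confirm all experts share the same architecture and handle non-float buffers sensibly):

```python
# Sketch: merge 6 expert checkpoints by averaging their weights, then save
# the merged model so the same eval can be run on it.
import torch
from transformers import GPTNeoXForCausalLM

expert_paths = [f"experts/expert_{i}" for i in range(6)]  # hypothetical paths

state_dicts = [
    GPTNeoXForCausalLM.from_pretrained(p).state_dict() for p in expert_paths
]

merged = {}
for key in state_dicts[0]:
    merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)

merged_model = GPTNeoXForCausalLM.from_pretrained(expert_paths[0])
merged_model.load_state_dict(merged)
merged_model.save_pretrained("experts/merged_average")
```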
This will give us a comparison of: