Say I finetune a model with your script without any pruning. Running the evaluation.py script then gives an accuracy matching the final accuracy of the finetuned model during training (so far so good).
However, when I try to train a new pruned model starting from that finetuned model, the accuracy of the first evaluation in trainer.py is much lower, e.g. 32%. Why is that the case? Shouldn't the initial evaluation in trainer.py match the result of evaluation.py?
EDIT: I think I have figured out what's happening: the first evaluation corresponds to a BERT-base model with untrained classification heads for the chosen task. Can you verify this?
If you load a finetuned model, it should be loaded properly via line 186 and line 199, so shouldn't it give the matching validation accuracy in the first evaluation?
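For what it's worth, this kind of mismatch usually happens when the checkpoint is loaded non-strictly and the task head's keys are missing from it: any parameter not found in the checkpoint keeps its random initialization, which would explain accuracy near chance. Here is a minimal stdlib sketch of that key-matching behavior (the parameter names like `classifier.weight` are illustrative, not taken from this repo):

```python
# Sketch: loading a checkpoint with strict=False leaves any parameter
# whose key is absent from the checkpoint at its random initialization.
# The model and checkpoint are modeled as plain dicts for illustration.

def load_state_dict(model, checkpoint, strict=True):
    """Mimic torch-style key matching between model and checkpoint."""
    missing = [k for k in model if k not in checkpoint]
    unexpected = [k for k in checkpoint if k not in model]
    if strict and (missing or unexpected):
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    for k in model:
        if k in checkpoint:
            model[k] = checkpoint[k]  # copy over only the matching weights
    return missing, unexpected

# A finetuned checkpoint that contains only encoder weights:
checkpoint = {"bert.encoder.layer.0.weight": "finetuned"}

# A model with an extra task head that the checkpoint lacks:
model = {
    "bert.encoder.layer.0.weight": "random-init",
    "classifier.weight": "random-init",
}

missing, unexpected = load_state_dict(model, checkpoint, strict=False)
print(missing)                     # ['classifier.weight']
print(model["classifier.weight"])  # 'random-init' -- the head stays untrained
```

Printing the `missing` keys after loading (or checking the "newly initialized" warning that loaders typically emit) should confirm or rule out the untrained-head hypothesis.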