Add a PEFT wrapper for the Sentiment training.
Works quite well on English, actually, even without splitting the optimizer or implementing any form of scheduling. With no finetuning, adding electra-large to the 3 class English dataset (SST plus a few other pieces) gets 70 macro F1. Base finetuning gets 74-75 macro F1 on sstplus, but frequently fails to train successfully, landing somewhere around 60 F1. Training with PEFT gets in the 74-75 F1 range each time, with no failures observed so far.
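For illustration, the idea behind the PEFT wrapper can be sketched as a minimal, hand-rolled version of LoRA adapters: freeze the pretrained transformer weights and train only small low-rank matrices. This is a sketch of the general technique, not Stanza's actual wrapper; the class names, rank, and alpha values here are illustrative, and a real setup would use the HF peft library instead.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update (LoRA sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights so only the adapters get gradients
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scale = alpha / rank
        # A is small random init, B starts at zero so the wrapped layer
        # initially computes exactly what the frozen base layer computes
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        # Frozen projection plus the trainable low-rank correction
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

def wrap_linears(model: nn.Module, rank: int = 8) -> nn.Module:
    """Recursively replace every nn.Linear in `model` with a LoRA-wrapped copy."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, LoRALinear(child, rank=rank))
        else:
            wrap_linears(child, rank=rank)
    return model
```

After wrapping, only a small fraction of the parameters are trainable, which is what makes the finetuning runs more stable and cheaper than updating the full model.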
Also adds a test to the sentiment training which starts the Pipeline with a PEFT-trained model.
Also adds a uses-charlm flag to the config, so that inadvertently passing a charlm (such as via the Pipeline) to a sentiment model trained without one doesn't blow up.