This is needed since we no longer want to instantiate multiple Trainer instances.
Also fixes a device-related bug in the process (see last commit):
This was previously masked by the old usage pattern, where we instantiated a trainer for each transformer (semantic/coarse/fine). Now that transformers must be loadable without their corresponding trainers (because we can only load one accelerator at a time), the trainer code that checks the wrapper's device no longer applies, and we get device mismatches. This commit fixes that.
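
For context, here is a minimal sketch of the kind of fix involved, assuming a PyTorch setup. The function name `load_transformer_standalone` and the checkpoint handling are hypothetical illustrations, not this repo's actual API; the point is that when a transformer is loaded without its Trainer, nothing moves it to the right device anymore, so we have to do it explicitly:

```python
import torch

def load_transformer_standalone(checkpoint_path, transformer):
    # Hypothetical helper: pick the device ourselves instead of relying on
    # the Trainer's accelerator wrapper, which is no longer instantiated
    # per transformer.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the checkpoint tensors directly onto the target device so they
    # don't end up on CPU while the model parameters sit on GPU (or vice
    # versa), which is the device mismatch described above.
    state_dict = torch.load(checkpoint_path, map_location=device)
    transformer.load_state_dict(state_dict)

    # Make sure the module itself lives on the target device before use.
    return transformer.to(device)
```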
--
I tested this code and was able to get a small training run of a few hundred steps working, so it works e2e!