mmatena / m251


First paper experiment plans #14

Open mmatena opened 3 years ago

mmatena commented 3 years ago

The idea is that I'll add a comment with a motivation and description for each experiment I plan to do. I'll keep adding details to the comment as the experiment gets more fleshed out. When I actually run an experiment, I'll add details and results of what I actually did to issue #13.

mmatena commented 3 years ago

Impact of fine-tuning steps on merging performance

Considerations

Summary

mmatena commented 3 years ago

Impact of number of examples used to compute the fine-tuned Fisher

Considerations

mmatena commented 3 years ago

"Catastrophic remembering"

More details to be added, but the idea is this: say we are given model 1, which is trained on task A. Then we train model 1 on task B to get model 2, which catastrophically forgets task A. We merge model 1 with model 2 to see whether the merged model can do well on both tasks. Traditional EWC would be a baseline. A rough sketch of the merging operation is below.
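For concreteness, here is a minimal sketch of the merging operation these experiments rely on: a diagonal Fisher-weighted average of two checkpoints. Function and argument names are illustrative, not the repo's actual API.

```python
import numpy as np

def fisher_merge(params_1, fisher_1, params_2, fisher_2, weight=0.5, eps=1e-8):
    """Merge two models by diagonal Fisher-weighted averaging.

    params_*: dict mapping parameter name -> np.ndarray of parameter values.
    fisher_*: dict mapping parameter name -> np.ndarray of diagonal Fisher
        estimates with the same shapes as the parameters.
    weight: scalar importance placed on model 1 (1 - weight goes to model 2).
    """
    merged = {}
    for name in params_1:
        f1 = weight * fisher_1[name]
        f2 = (1.0 - weight) * fisher_2[name]
        # Per-parameter weighted average; eps guards against zero Fisher values.
        merged[name] = (f1 * params_1[name] + f2 * params_2[name]) / (f1 + f2 + eps)
    return merged
```

EWC, by contrast, would keep training model 1 on task B while penalizing movement away from model 1's parameters weighted by model 1's Fisher, so both approaches use the same Fisher information but at different points in the pipeline.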

mmatena commented 3 years ago

Impact of number of examples and y_samples used to compute the pretrained (e.g. MLM) Fisher

Considerations
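For reference, here is a rough sketch of estimating a diagonal Fisher from unlabeled examples with multiple sampled labels per example. It assumes PyTorch and a generic classifier returning logits; the n_y_samples argument stands in for the y_samples mentioned above and is not the repo's API.

```python
import torch

def estimate_diagonal_fisher(model, examples, n_y_samples=1):
    """Estimate a diagonal Fisher by sampling labels from the model's own predictions.

    For each example, n_y_samples labels are drawn from the model's predictive
    distribution and the squared gradients of the log-likelihood are accumulated.
    """
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_total = 0
    for x in examples:
        log_probs = torch.log_softmax(model(x.unsqueeze(0)), dim=-1)
        probs = log_probs.exp().detach()
        for _ in range(n_y_samples):
            y = torch.multinomial(probs[0], num_samples=1).item()
            model.zero_grad()
            (-log_probs[0, y]).backward(retain_graph=True)
            for n, p in model.named_parameters():
                if p.grad is not None:
                    fisher[n] += p.grad.detach() ** 2
            n_total += 1
    return {n: f / max(n_total, 1) for n, f in fisher.items()}
```

The experiment would then vary both the number of examples and n_y_samples and measure how merging performance changes.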

mmatena commented 3 years ago

Automatic task weighting selection

TODO

mmatena commented 3 years ago

Compare robust Fisher computation from Task2Vec to direct computation

TODO

mmatena commented 3 years ago

Asynchronous distributed learning

More details to be added, but the idea is that we partition the training set into N disjoint subsets. Then we train a model on each partition separately and merge the resulting models to cheaply combine their work. There could be multiple rounds of training followed by merging. Using the empirical Fisher and computing it online during training would make this more efficient; a rough sketch is included below.

This federated learning paper might be relevant.
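As a sketch of computing the empirical Fisher online during training, maintaining a running diagonal Fisher inside an ordinary training loop could look like the following (PyTorch, with an exponential moving average; all names and the decay value are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def train_with_online_empirical_fisher(model, data_loader, optimizer, decay=0.99):
    """Train normally while maintaining a running diagonal empirical Fisher.

    The empirical Fisher reuses the gradients of the training loss on the observed
    labels, so no extra backward passes are required beyond normal training.
    """
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Accumulate squared gradients before the optimizer consumes them.
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] = decay * fisher[n] + (1.0 - decay) * p.grad.detach() ** 2
        optimizer.step()
    return fisher
```

Each of the N workers would train on its own partition this way and return its parameters together with its Fisher; the merge step would then Fisher-weighted-average across all N models before optionally starting another round of training.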

mmatena commented 3 years ago

Layer-specific merging proportions

TODO