What if there are two models with different input/output forms? Does this technique still work?
Regarding the first question: yes, but the technique only works this way when you start from a strong pre-trained model.
Regarding the second question: in my work I merge vision and language architectures. They have different embedding layers, so I didn't merge those; I only merged the transformer layers, since their configurations are identical. I found that in this case I need to fine-tune afterwards, or the performance will be poor.
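To make the idea concrete, here is a minimal PyTorch sketch of that kind of selective merge (the function name and the plain interpolation scheme are my own illustration, not the exact code from this work): only parameters that appear in both checkpoints with matching shapes, such as the shared transformer layers, are averaged, while modality-specific parameters like the embedding layers are kept from one model unchanged.

```python
import torch

def merge_matching_layers(state_a, state_b, alpha=0.5):
    """Selectively merge two state dicts: average parameters that exist
    in both models with identical shapes (e.g., shared transformer
    layers) and keep model A's version of everything else (e.g.,
    modality-specific embedding layers)."""
    merged = {}
    for name, param_a in state_a.items():
        param_b = state_b.get(name)
        if param_b is not None and param_b.shape == param_a.shape:
            # Same configuration in both models: linear interpolation.
            merged[name] = alpha * param_a + (1.0 - alpha) * param_b
        else:
            # Different input/output form (e.g., a vision patch embedding
            # vs. a text token embedding): leave it unmerged.
            merged[name] = param_a
    return merged

# Hypothetical usage: merge two checkpoints, then fine-tune the result,
# since (as noted above) the merged model performs poorly without it.
# merged_state = merge_matching_layers(vision_model.state_dict(),
#                                      language_model.state_dict())
# model.load_state_dict(merged_state)
```

With `alpha=0.5` this is plain weight averaging; other interpolation weights or per-layer schemes are possible, but in any case the merged model here would still need fine-tuning.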
Thanks. This work gives me new insights.
No problem!
Hi, model merging is a new term for me. Is it correct that model merging is a technique that can achieve a multi-task objective without needing to conduct multi-task training?