Question about model initialization

Do the reference model, proxy model, and main model have to be initialized with the same method? When continuing pretraining of Llama 2 with DoReMi, the weights of the main model are initialized from the Meta checkpoint. For the reference model and proxy model, however, there are no such checkpoints, so these models are initialized with other methods (e.g. Xavier initialization). In this scenario, will the domain weights learned by the proxy model still improve the performance of the main model?
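
To make the setup concrete, here is a minimal sketch of the initialization mismatch I mean. It is plain Python, not DoReMi's actual code: `xavier_uniform`, `init_model`, the layer name `proj`, and the toy shapes are all hypothetical, and the `pretrained` dict is only a stand-in for the Meta checkpoint.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in matrix from U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


def init_model(shapes, checkpoint=None, seed=0):
    """Return {layer_name: weights}: copy from a checkpoint if given, else Xavier-initialize."""
    if checkpoint is not None:
        return {name: [row[:] for row in checkpoint[name]] for name in shapes}
    rng = random.Random(seed)
    return {name: xavier_uniform(fan_in, fan_out, rng)
            for name, (fan_in, fan_out) in shapes.items()}


# Hypothetical toy shapes: the main model is larger than the reference/proxy models.
main_shapes = {"proj": (8, 8)}
small_shapes = {"proj": (4, 4)}

# Stand-in for the pretrained checkpoint (assumption: real weights would be loaded from disk).
pretrained = init_model(main_shapes, seed=123)

main_model = init_model(main_shapes, checkpoint=pretrained)  # starts from pretrained weights
reference_model = init_model(small_shapes, seed=1)           # starts from Xavier init
proxy_model = init_model(small_shapes, seed=2)               # starts from Xavier init
```

So the main model starts from pretrained weights, while the reference and proxy models start from scratch; the question is whether domain weights obtained this way still transfer.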

sangmichaelxie / doremi

Question about model initialization #30