[Open] ferreirafabio opened this issue 3 months ago
@ferreirafabio Hey Fabio, assuming the teacher is a large pretrained model, the optimizer is only required for the student model. The following snippet shows how ipex can be used:
import torch
import intel_extension_for_pytorch as ipex

# teacher, student and train_loader are assumed to be defined elsewhere
teacher.eval()           # teacher is used for inference only
student.train()
optimizer = ...          # optimizer over the student parameters only

# optimize the frozen teacher for inference, and the student together with its optimizer for training
teacher = ipex.optimize(model=teacher, dtype=torch.float32)
student, optimizer = ipex.optimize(model=student, dtype=torch.float32, optimizer=optimizer)

for train_data in train_loader:    # iterate over train data
    with torch.no_grad():
        teacher_probs = torch.softmax(teacher(train_data), dim=-1)
    student_probs = torch.softmax(student(train_data), dim=-1)
    loss = ...                     # compute combined loss
    loss.backward()                # backprop
    optimizer.step()               # optimizer update
    optimizer.zero_grad()
Other things to consider:
@vishnumadhu365 thank you for your reply and the code example. I tried that but get
AssertionError: The optimizer should be given for training mode
because the teacher is still in train mode: I do not call .eval() on it. AFAIK, I cannot put it in eval mode because I still want the BatchNorm statistics from training mode. The precise application I'm working on is training DINO; see here for the exact usage of teacher/student:
What can be done in such a case?
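For reference, a minimal sketch of the configuration that triggers the assertion (a torchvision ResNet-18 stands in for the actual DINO teacher here, which is an assumption on my side, not the model from the DINO repository):

import torch
import torchvision
import intel_extension_for_pytorch as ipex

# stand-in for the DINO teacher; the real teacher is built in the DINO repository
teacher = torchvision.models.resnet18()
teacher.train()                  # kept in train mode so BatchNorm uses/updates batch statistics
for p in teacher.parameters():
    p.requires_grad = False      # teacher receives no gradient updates

# raises "AssertionError: The optimizer should be given for training mode",
# since teacher.training is True and no optimizer is passed
teacher = ipex.optimize(model=teacher, dtype=torch.float32)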
Describe the issue
Hi, I'm wondering how to use ipex.optimize(...) when I have two models, for example a teacher and a student in model distillation, but only one optimizer. Would calls like the following work? Note that teacher and student are both in train mode, but the teacher parameters are set to requires_grad=False and the optimizer only operates over the student parameters. I'm unaware of the intricacies this way of calling it may entail and would like to get confirmation that this is okay to do. Thank you.
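A minimal sketch of the kind of calls I mean (the SGD optimizer below is just a placeholder; teacher and student are defined elsewhere):

import torch
import intel_extension_for_pytorch as ipex

teacher.train()
student.train()
for p in teacher.parameters():
    p.requires_grad = False      # teacher is frozen but stays in train mode

# placeholder optimizer over the student parameters only
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

# both models in train mode, but only the student call receives an optimizer: is this okay?
teacher = ipex.optimize(model=teacher, dtype=torch.float32)
student, optimizer = ipex.optimize(model=student, dtype=torch.float32, optimizer=optimizer)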