Hello @hobbitlzy, sorry about that. The distiller currently only supports `link` with single-output layers; we will add multi-output support in the future. As a workaround, you could add a `torch.nn.Identity` at the end of the encoder and pass the layer output through it before returning the multi-output:
```python
class Encoder(...):
    def __init__(...):
        ...
        self.idt = torch.nn.Identity()

    def forward(...):
        ...
        layer_outputs = self.idt(layer_outputs)
        return layer_outputs, second_output
```
Then modify your `config_list`:
```python
config_list = [{
    'op_names': [f'bert.encoder.layer.{i}.idt'],  # add idt
    'link': [f'bert.encoder.layer.{j}.idt' for j in range(i, layer_num)],  # add idt
    'lambda': 0.9,
    'apply_method': 'mse',
} for i in range(layer_num)]
```
Thanks, that solves the problem. :)
Hi @J-shang. One more comment: I find that if I pack the output as a tuple `output = (layer_outputs, second_output)`, the distiller seems to pick `output[0]` as the distillation target, which I checked in this method. This happens to work for me since I only need `layer_outputs` for distillation, but I have not dug into why the distiller does this, and it could cause problems elsewhere.
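For anyone else who hits this, here is a minimal sketch using plain PyTorch forward hooks (this is not the distiller's actual hook code, and `ToyLayer` is a made-up module for illustration). It shows why routing the target tensor through the `Identity` removes the ambiguity: a hook on the layer itself receives the whole output tuple, so anything that expects a single tensor has to pick one element such as `output[0]`, while a hook on the `idt` submodule receives exactly the tensor that was routed through it.

```python
import torch

class ToyLayer(torch.nn.Module):
    """Toy stand-in for an encoder layer that returns a tuple."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.idt = torch.nn.Identity()

    def forward(self, x):
        layer_outputs = self.idt(self.linear(x))  # route the distillation target through Identity
        second_output = x.sum()                   # extra output that is not distilled
        return layer_outputs, second_output

layer = ToyLayer()
captured = {}

# Hook on the whole layer: `output` is the tuple, so a single-tensor consumer
# would effectively end up using output[0].
layer.register_forward_hook(lambda m, i, o: captured.update(layer=o))

# Hook on the Identity: `output` is exactly the tensor we want to distill.
layer.idt.register_forward_hook(lambda m, i, o: captured.update(idt=o))

layer(torch.randn(2, 4))
print(type(captured['layer']))  # <class 'tuple'>
print(type(captured['idt']))    # <class 'torch.Tensor'>
```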
Describe the issue: I am using the script for pruning BERT on MNLI. I found a problem with distillation when I changed the output format of the encoder, which is as follows.
I changed the configuration of the distiller as well.
But the errors shown in the Log message occur. Could you help me with this problem?
Environment:
Log message: