Closed mpiorczynski closed 9 months ago
It seems to be a bug to me. The correct way to define existing layers should be existing_layers = (head_layer_z != 0) | (mlp_z != 0)
. We mostly used version 3 during our development and didn't test version 4 much. Thanks for pointing this out!
Thanks a lot!
Hi! I have a question, why are you checking if the MHA layer is not pruned in this line, unlike here where you check for the FFN layer? Since distillation happens at the outputs of FFN layers, shouldn't the check be for the presence of FFN instead of MHA? Is this intentional or potentially a bug?