Maybe part of the problem with continuing from the distilled checkpoint is that the final task doesn't allow much deviation from the L weights. To allow more freedom, we could use an alignment-style distillation loss in the lower layers of the network and only apply MSE(similarities) at the final layer.
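A minimal sketch of what that combined loss could look like, assuming a PyTorch setup where both models expose per-layer hidden states of matching shape (otherwise a learned projection per layer would be needed). "Alignment-style" is interpreted here as a cosine alignment on intermediate activations, and "MSE(similarities)" as an MSE between in-batch similarity matrices; all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_hiddens, teacher_hiddens, alpha=1.0, beta=1.0):
    """Alignment loss on lower layers + MSE(similarities) at the final layer.

    student_hiddens / teacher_hiddens: lists of [batch, dim] tensors, one
    per layer, with the final-layer embedding last.
    """
    # Alignment-style loss on lower layers: pull student hidden states
    # toward the teacher's in direction only (cosine), which constrains
    # the student less than an exact MSE on raw activations would.
    align = 0.0
    for s, t in zip(student_hiddens[:-1], teacher_hiddens[:-1]):
        align = align + (1.0 - F.cosine_similarity(s, t.detach(), dim=-1)).mean()
    align = align / max(len(student_hiddens) - 1, 1)

    # Final layer: match the in-batch similarity structure rather than the
    # raw embeddings, so the student is free to rotate/rescale its space.
    s_emb = F.normalize(student_hiddens[-1], dim=-1)
    t_emb = F.normalize(teacher_hiddens[-1], dim=-1)
    sim_loss = F.mse_loss(s_emb @ s_emb.T, (t_emb @ t_emb.T).detach())

    return alpha * align + beta * sim_loss
```

Detaching the teacher tensors keeps gradients from flowing into the L weights; the alpha/beta weighting would control how much freedom the lower layers actually get relative to the final-layer similarity constraint.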