iejMac / encoder-distill

Align embedding spaces of PyTorch encoders with common input types.
MIT License

Try to use similarity as a loss #14

Open · rom1504 opened this issue 2 years ago

rom1504 commented 2 years ago

MSE(sim(new_clip_image, new_clip_text), sim(original_clip_image, original_clip_text))

This could be used entirely instead of the alignment loss, or in addition to it.
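
A minimal PyTorch sketch of that objective (function and argument names are illustrative, not from this repo): match the student's image-text similarity matrix to the teacher's.

```python
import torch.nn.functional as F

def similarity_mse_loss(student_img, student_txt, teacher_img, teacher_txt):
    # Normalize so that dot products are cosine similarities, as in CLIP.
    student_img = F.normalize(student_img, dim=-1)
    student_txt = F.normalize(student_txt, dim=-1)
    teacher_img = F.normalize(teacher_img, dim=-1)
    teacher_txt = F.normalize(teacher_txt, dim=-1)

    # Pairwise image-text similarities for the new (student) CLIP
    # and the original (teacher) CLIP over the same batch.
    student_sim = student_img @ student_txt.t()
    teacher_sim = teacher_img @ teacher_txt.t()

    # MSE between the two similarity matrices.
    return F.mse_loss(student_sim, teacher_sim)
```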

iejMac commented 2 years ago

Pros:

Cons:

Perhaps this is something in between contrastive training from scratch and the standard distillation we're doing; it's like teacher-guided contrastive learning. But in that case this makes more sense to me:

total_loss = contrastive_loss(student_embeddings) + MSE(student_embeddings, teacher_embeddings)

That way, if the H can find better solutions than the L, the contrastive_loss(H) term will push it toward its own independent solutions instead of the teacher's solutions from the MSE term.

@rom1504
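
A minimal sketch of this combined objective in PyTorch, assuming a standard CLIP-style contrastive term (all names here are illustrative, not from this repo):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img, txt, temperature=0.07):
    # Standard CLIP-style contrastive loss with identity-matrix targets.
    img, txt = F.normalize(img, dim=-1), F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

def total_loss(student_img, student_txt, teacher_img, teacher_txt):
    # Contrastive term lets the student (H) find its own solutions...
    con = contrastive_loss(student_img, student_txt)
    # ...while the MSE term pulls it toward the teacher's (L) embedding space.
    # Assumes matching embedding dims (or a projection head on the student).
    mse = F.mse_loss(student_img, teacher_img) + F.mse_loss(student_txt, teacher_txt)
    return con + mse
```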

iejMac commented 2 years ago

Also, in the proposed method above you'd probably want to decay the influence of the teacher over time, so it would actually be:

total_loss = (1 - gamma) * contrastive_loss(student_embeddings) + gamma * MSE(student_embeddings, teacher_embeddings)
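
One way to implement the decay, sketched with a linear schedule for gamma (the schedule choice is an assumption, not specified in the thread; `contrastive_loss` is the function from the sketch above):

```python
import torch.nn.functional as F

def gamma_schedule(step, total_steps, gamma_start=1.0, gamma_end=0.0):
    # Linearly decay the teacher's influence from gamma_start to gamma_end.
    t = min(step / max(total_steps, 1), 1.0)
    return gamma_start + t * (gamma_end - gamma_start)

def total_loss(step, total_steps, student_img, student_txt, teacher_img, teacher_txt):
    gamma = gamma_schedule(step, total_steps)
    con = contrastive_loss(student_img, student_txt)  # as sketched above
    mse = F.mse_loss(student_img, teacher_img) + F.mse_loss(student_txt, teacher_txt)
    return (1 - gamma) * con + gamma * mse
```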

iejMac commented 2 years ago

Potential idea: instead of MSE between similarities, just do the contrastive loss, but instead of the identity matrix as the label, use the matrix of teacher scores.
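
A sketch of that variant: keep the contrastive cross-entropy, but replace the identity-matrix labels with soft targets derived from the teacher's similarity scores (the shared softmax temperature is an assumption):

```python
import torch.nn.functional as F

def soft_target_contrastive_loss(student_img, student_txt,
                                 teacher_img, teacher_txt, temperature=0.07):
    student_img = F.normalize(student_img, dim=-1)
    student_txt = F.normalize(student_txt, dim=-1)
    teacher_img = F.normalize(teacher_img, dim=-1)
    teacher_txt = F.normalize(teacher_txt, dim=-1)

    student_logits = student_img @ student_txt.t() / temperature
    teacher_sim = teacher_img @ teacher_txt.t() / temperature

    # Teacher similarity scores as soft labels instead of the identity matrix.
    targets_i2t = F.softmax(teacher_sim, dim=-1)
    targets_t2i = F.softmax(teacher_sim.t(), dim=-1)

    # Cross-entropy against soft targets, symmetrized over both directions.
    loss_i2t = -(targets_i2t * F.log_softmax(student_logits, dim=-1)).sum(-1).mean()
    loss_t2i = -(targets_t2i * F.log_softmax(student_logits.t(), dim=-1)).sum(-1).mean()
    return (loss_i2t + loss_t2i) / 2
```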