Contrastive loss details

kandorm / CLINE

Lexically Error Correction BERT.


Contrastive loss details #4

Open bernaljg opened 2 years ago

bernaljg commented 2 years ago

Hi! I found something a bit confusing that you might be able to clarify. The contrastive loss implemented in the "LecbertForPreTraining" class applies a sigmoid activation followed by a BCE loss, but the paper says a standard InfoNCE loss (with softmax activations) was used. Am I misunderstanding something?

HarBatt commented 1 year ago

Yeah, the author used a simple BCE loss to maximize similarity. Weird that it differs from what the paper describes.
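
For anyone comparing the two objectives discussed above, here is a minimal, dependency-free sketch of the difference. It is not the repository's code: the function names, the flat list of similarity scores, and the `temperature` parameter are all assumptions made for illustration. The sigmoid + BCE variant scores each candidate independently, while the InfoNCE variant normalizes all candidates with a softmax so they compete with each other.

```python
import math


def bce_contrastive_loss(scores, labels):
    """Sigmoid + binary cross-entropy (hypothetical helper, not from the repo).

    Each similarity score is treated as an independent binary decision:
    label 1 for a positive pair, 0 for a negative pair.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid activation
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(scores)


def info_nce_loss(scores, positive_index, temperature=1.0):
    """Softmax cross-entropy over all candidates (InfoNCE-style sketch).

    The positive competes with every negative for probability mass.
    """
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_z - scaled[positive_index]


if __name__ == "__main__":
    # One positive (index 0) and two negatives; the values are made up.
    scores = [2.0, 0.5, -1.0]
    labels = [1, 0, 0]
    print("BCE:    ", bce_contrastive_loss(scores, labels))
    print("InfoNCE:", info_nce_loss(scores, positive_index=0))
```

One practical consequence: adding the same constant to every score leaves the InfoNCE value unchanged, because only relative similarities matter, whereas the BCE value changes, because each score is judged against an absolute threshold.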