Draft retrofit loss

SGD (torch.optim.SGD) is used to optimize the retrofit loss, i.e. to learn the orthogonal matrix M.
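A minimal sketch of that setup. It assumes M is a square d x d nn.Parameter and that orthogonality is encouraged with a soft penalty ||M^T M - I||_F^2. The issue names an Orthogonalization_Loss but not its exact form, so the penalty, the identity initialisation, d and the learning rate are all placeholders, not the repo's code. Only the orthogonality term is optimized here, to show SGD pulling a perturbed M back toward an orthogonal matrix.

```python
import torch

torch.manual_seed(0)

# Hypothetical embedding size; kept small so the example runs instantly.
d = 8

# M is the learnable matrix the retrofit loss keeps (near-)orthogonal.
# Starting from the identity is an assumption.
M = torch.nn.Parameter(torch.eye(d))

# Plain SGD over M only; lr is a placeholder value.
optimizer = torch.optim.SGD([M], lr=0.1)


def orthogonalization_loss(M: torch.Tensor) -> torch.Tensor:
    """Assumed soft orthogonality penalty: squared Frobenius norm of M^T M - I."""
    eye = torch.eye(M.shape[0], dtype=M.dtype, device=M.device)
    return ((M.T @ M - eye) ** 2).sum()


# Knock M slightly away from orthogonal, then let SGD pull it back.
with torch.no_grad():
    M.add_(0.05 * torch.randn(d, d))

before = orthogonalization_loss(M).item()
for _ in range(50):
    optimizer.zero_grad()
    loss = orthogonalization_loss(M)
    loss.backward()
    optimizer.step()
after = orthogonalization_loss(M).item()

print(f"orthogonality penalty: {before:.6f} -> {after:.6f}")
```

A soft penalty keeps the update a plain SGD step; a hard constraint (e.g. re-orthogonalizing M after each step) would be an alternative design, not something the issue describes.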
ELMo weights are still frozen. To fine-tune them, edit RetrofitExperiment.__init__() and change the model's requires_grad argument from False to True.
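The effect of that flag can be illustrated with the generic PyTorch mechanism. This is a sketch only: ToyRetrofitExperiment is a hypothetical stand-in for RetrofitExperiment, an nn.Linear stands in for ELMo, and how the real requires_grad argument is wired through to the encoder is an assumption.

```python
import torch
from torch import nn


class ToyRetrofitExperiment(nn.Module):
    """Hypothetical stand-in for RetrofitExperiment (not the repo's code).

    `encoder` replaces ELMo with an nn.Linear; `requires_grad` mimics the
    flag described above.
    """

    def __init__(self, dim: int = 8, requires_grad: bool = False):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)  # placeholder for the ELMo encoder
        for p in self.encoder.parameters():
            # False => encoder weights receive no gradients (frozen).
            p.requires_grad_(requires_grad)
        # The retrofit matrix M is always trainable.
        self.M = nn.Parameter(torch.eye(dim))


def trainable(model: nn.Module):
    """Parameters that will actually be updated by the optimizer."""
    return [p for p in model.parameters() if p.requires_grad]


frozen = ToyRetrofitExperiment(requires_grad=False)
unfrozen = ToyRetrofitExperiment(requires_grad=True)

# Only parameters with requires_grad=True are handed to SGD.
opt = torch.optim.SGD(trainable(frozen), lr=0.1)

print(len(trainable(frozen)), len(trainable(unfrozen)))
```

With the flag at False only M is optimized; flipping it to True adds the encoder's weight and bias.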
Extra metrics are now logged:
- pos_pair_dist_mean - mean distance between target words used in a paraphrase context
- neg_pair_dist_mean - mean distance between target words used in a non-paraphrase context
- [Retrofit] Loss is split into Hinge_Loss and Orthogonalization_Loss
- Pre_Clamp_Hinge_Loss - the Hinge_Loss before loss.clamp(min=0) is applied
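A rough sketch of how these quantities could relate to each other. The function name, the (B, d) tensor layout, the L2 distance after mapping through M, the pairing of each paraphrase pair with one non-paraphrase pair, the margin and orth_weight are all assumptions; only the metric names and the clamp(min=0) step come from the list above.

```python
import torch


def retrofit_loss_and_metrics(M, pos_a, pos_b, neg_a, neg_b,
                              margin: float = 1.0, orth_weight: float = 1.0):
    """Hypothetical retrofit loss plus the logged metrics.

    pos_a, pos_b: (B, d) target-word embeddings from paraphrase pairs.
    neg_a, neg_b: (B, d) target-word embeddings from non-paraphrase pairs.
    """
    # Map row-vector embeddings through M, then take per-pair L2 distances.
    pos_dist = torch.linalg.vector_norm(pos_a @ M.T - pos_b @ M.T, dim=-1)
    neg_dist = torch.linalg.vector_norm(neg_a @ M.T - neg_b @ M.T, dim=-1)

    # Hinge: paraphrase pairs should be closer than non-paraphrase pairs
    # by at least `margin`. The raw mean is logged before clamping.
    raw = margin + pos_dist - neg_dist
    pre_clamp_hinge = raw.mean()
    hinge = raw.clamp(min=0).mean()

    # Assumed soft orthogonality penalty on M.
    eye = torch.eye(M.shape[0], dtype=M.dtype, device=M.device)
    orth = ((M.T @ M - eye) ** 2).sum()

    loss = hinge + orth_weight * orth
    metrics = {
        "pos_pair_dist_mean": pos_dist.mean().item(),
        "neg_pair_dist_mean": neg_dist.mean().item(),
        "Hinge_Loss": hinge.item(),
        "Orthogonalization_Loss": orth.item(),
        "Pre_Clamp_Hinge_Loss": pre_clamp_hinge.item(),
    }
    return loss, metrics


torch.manual_seed(0)
d, B = 8, 4
M = torch.nn.Parameter(torch.eye(d))
pos_a, pos_b, neg_a, neg_b = (torch.randn(B, d) for _ in range(4))

loss, metrics = retrofit_loss_and_metrics(M, pos_a, pos_b, neg_a, neg_b)
loss.backward()  # gradients flow into M only
print(metrics)
```

Because clamping each term at zero can only raise it, Hinge_Loss is never below Pre_Clamp_Hinge_Loss in this sketch; a large gap between them means many pairs already satisfy the margin.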

jscuds / rf-bert