Latest commit a671966 has the following elements worth highlighting for troubleshooting:
SGD (torch.optim.SGD) is being used for the retrofit loss (learning the orthogonal matrix M).
ELMo weights are still frozen; they can be unfrozen manually in RetrofitExperiment.__init__() by changing the model argument requires_grad from False to True.
Extra metrics are now being logged:
pos_pair_dist_mean - mean distance between target words used in a paraphrase context
neg_pair_dist_mean - mean distance between target words used in a non-paraphrase context
[Retrofit] Loss is split into two terms: Hinge_Loss and Orthogonalization_Loss
Pre_Clamp_Hinge_Loss is the Hinge_Loss value before loss.clamp(min=0) is applied
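For troubleshooting, it may help to see how the logged quantities could relate to each other. The following is only a minimal sketch, not the code in the repository: the function name retrofit_loss_terms, the margin and ortho_weight parameters, the Euclidean distance, the squared Frobenius penalty on M^T M - I, and the choice to clamp the mean rather than each pair are all assumptions made for illustration.

```python
import torch


def retrofit_loss_terms(pos_a, pos_b, neg_a, neg_b, M, margin=1.0, ortho_weight=1.0):
    """Hypothetical decomposition of a retrofit loss into the logged terms.

    pos_a/pos_b: embeddings of target words in paraphrase pairs, shape (N, d)
    neg_a/neg_b: embeddings of target words in non-paraphrase pairs, shape (K, d)
    M: the (d, d) matrix being learned, ideally orthogonal
    """
    # Map every embedding through M before measuring distances (assumption).
    pos_dist = (pos_a @ M - pos_b @ M).norm(dim=1)
    neg_dist = (neg_a @ M - neg_b @ M).norm(dim=1)

    pos_pair_dist_mean = pos_dist.mean()
    neg_pair_dist_mean = neg_dist.mean()

    # Hinge term: paraphrase pairs should be closer than non-paraphrase pairs
    # by at least `margin`. The pre-clamp value may be negative.
    pre_clamp_hinge = margin + pos_pair_dist_mean - neg_pair_dist_mean
    hinge = pre_clamp_hinge.clamp(min=0)

    # Orthogonalization term: squared Frobenius norm of (M^T M - I).
    eye = torch.eye(M.shape[0], dtype=M.dtype, device=M.device)
    ortho = ((M.T @ M - eye) ** 2).sum()

    return {
        "pos_pair_dist_mean": pos_pair_dist_mean,
        "neg_pair_dist_mean": neg_pair_dist_mean,
        "Pre_Clamp_Hinge_Loss": pre_clamp_hinge,
        "Hinge_Loss": hinge,
        "Orthogonalization_Loss": ortho,
        "Loss": hinge + ortho_weight * ortho,
    }
```

Under these assumptions, a Pre_Clamp_Hinge_Loss that stays negative means Hinge_Loss is clamped to zero and contributes no gradient, so only Orthogonalization_Loss would be driving updates to M.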
NOTE: The Retrofit Loss is still not behaving as intended.
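When debugging the loss, one quick check is to confirm which parameters are actually trainable and handed to the optimizer. The sketch below assumes a toy stand-in for the real setup: ToyRetrofit, set_requires_grad, trainable_parameter_names, the nn.Linear encoder standing in for ELMo, and the learning rate are all hypothetical and are not taken from RetrofitExperiment.

```python
import torch
from torch import nn


def set_requires_grad(module, flag):
    """Freeze (flag=False) or unfreeze (flag=True) every parameter of a module."""
    for param in module.parameters():
        param.requires_grad = flag


def trainable_parameter_names(module):
    """List the names of parameters that will receive gradients."""
    return [name for name, p in module.named_parameters() if p.requires_grad]


class ToyRetrofit(nn.Module):
    """Hypothetical stand-in: a frozen encoder plus a learnable matrix M."""

    def __init__(self, dim=4):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)       # placeholder for the ELMo model
        self.M = nn.Parameter(torch.eye(dim))    # matrix learned by the retrofit loss
        set_requires_grad(self.encoder, False)   # encoder weights stay frozen


model = ToyRetrofit()

# Only parameters with requires_grad=True are passed to SGD; lr is an assumed value.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01
)

print(trainable_parameter_names(model))
```

If the encoder is later unfrozen with set_requires_grad(model.encoder, True), the optimizer would need to be rebuilt, since it only tracks the parameters it was given at construction time.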