zomux opened this issue 4 years ago
Training
abcirun python lanmt/lm.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --train
Problem
After refining the vectors many times, the resulting sentence is still not meaningful:
<s> Gut@@ ach : Noch ach Sicherheit . . ger . .
Variations
Problem
loss = copy loss + noise correction loss
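The combined objective can be sketched as below. This is a toy pure-Python illustration, not the lanmt implementation: the names `copy_loss` and `noise_correction_loss`, the MSE form of both terms, and the Gaussian corruption are all assumptions about what "copy loss" and "noise correction loss" mean here.

```python
import random

def mse(a, b):
    # Mean squared error between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def copy_loss(refine, z_target):
    # Assumed: the refiner applied to clean latent vectors should return
    # them unchanged (identity / copy behaviour).
    return mse(refine(z_target), z_target)

def noise_correction_loss(refine, z_target, noise_std=0.1, seed=0):
    # Assumed: corrupt the latent vectors with Gaussian noise and ask the
    # refiner to map them back (a denoising objective).
    rng = random.Random(seed)
    z_noisy = [z + rng.gauss(0.0, noise_std) for z in z_target]
    return mse(refine(z_noisy), z_target)

def total_loss(refine, z_target):
    # loss = copy loss + noise correction loss
    return copy_loss(refine, z_target) + noise_correction_loss(refine, z_target)

# An identity refiner has zero copy loss but a small positive denoising loss.
identity = lambda z: list(z)
print(total_loss(identity, [1.0] * 8))
```

Under this reading, the copy term keeps the refiner stable on already-good vectors while the noise term teaches it to correct perturbed ones.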
qrsh -g gcb50249 -l rt_F=2 $HOME/research/abcirun.sh python lanmt/lm.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 4096 --opt_distill --opt_modeltype realgrad --opt_nrefine 1 --train
Idea:
directly optimize the cross-entropy of token prediction based on the refined vectors (i.e., maximize the likelihood of the target tokens given the refined vectors)
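A minimal sketch of that idea, with the gradient-flow intent written as comments. Everything here is illustrative: `refine`, `decode_logits`, and the toy decoder are hypothetical stand-ins, not functions from the lanmt code.

```python
import math

def softmax_cross_entropy(logits, target_idx):
    # Cross-entropy of one token prediction: -log softmax(logits)[target],
    # computed with the max-subtraction trick for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_idx]

def token_ce_after_refinement(z, refine, decode_logits, targets, n_refine=3):
    # Run the latent vectors through several refinement steps, then score
    # the token predictions made from the *refined* vectors. In a real
    # autodiff framework the gradient of this loss would flow back through
    # the whole refinement chain, which is the point of the idea above.
    for _ in range(n_refine):
        z = refine(z)
    logits_per_pos = decode_logits(z)
    return sum(softmax_cross_entropy(lg, t)
               for lg, t in zip(logits_per_pos, targets)) / len(targets)

# Toy example: identity refiner, fixed decoder that favours token 0.
identity = lambda z: z
decoder = lambda z: [[2.0, 0.0, 0.0] for _ in z]
loss = token_ce_after_refinement([0.0, 0.0], identity, decoder, [0, 0])
print(loss)
```

Training on this end-to-end loss would directly reward refinement steps that make the final token predictions more accurate, rather than only matching intermediate vectors.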