Closed weitaizhang closed 2 years ago
Hi! Thanks for your question! Here is some advice that may be useful:
Could you share more details about your implementation, so that we can give more specific recommendations?
By the way, we suggest using the architecture we propose in the paper as the NAT model.
@hemingkx Thanks for your quick reply. By "my own AT and NAT models", I mean I retrained the models on the WMT14 En-De dataset. The AT model is Transformer-base, and the NAT model is trained with your "train.sh" script. I will read your code carefully today. For the inference stage, I use the "inference.sh" script. Thanks.
Sure, thanks for the details! Did you use sequence-level knowledge distillation to train the NAT drafter? You can run the "pass_count.sh" script; it will report the mean number of accepted tokens and the average number of iterations.
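For anyone skimming this thread, the two statistics mentioned above can be computed from simple per-iteration logs. This is a hedged sketch (the function name and input format are my own, not the repo's `pass_count.sh`):

```python
def acceptance_stats(accepted_per_iter, iters_per_sentence):
    """Summarize a GAD decoding run (hypothetical log format).

    accepted_per_iter: tokens accepted at each GAD iteration, pooled
                       over all sentences, e.g. [3, 1, 2, 4].
    iters_per_sentence: number of iterations each sentence needed,
                        e.g. [2, 2].
    """
    # Mean accepted tokens per iteration: higher means fewer verifier calls.
    mean_accept = sum(accepted_per_iter) / len(accepted_per_iter)
    # Average iterations per sentence: lower means faster decoding.
    avg_iters = sum(iters_per_sentence) / len(iters_per_sentence)
    return mean_accept, avg_iters
```

A low mean-accepted-tokens value usually points at a weak drafter (e.g. trained without distillation), which would explain a slow "block" strategy.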
For your information, here are my experimental results:
Thanks for your re-implementation of our work. As discussed in our paper, the translation results of vanilla GAD should be exactly the same as those of the AT verifier with greedy decoding (i.e., beam=1). Can you provide more details about your inference process (the performance of the AT verifier and the inference script)? By the way, we have released our checkpoints here, which you can give a try ^_^
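To see why vanilla GAD must match greedy decoding exactly, here is a minimal sketch of its accept-and-verify step (the function names and the token-level interface are hypothetical, not the repo's actual `inference.py` code):

```python
def gad_verify_step(at_greedy_fn, draft, prefix):
    """One vanilla GAD iteration (simplified sketch).

    draft: the block of tokens proposed by the NAT drafter.
    at_greedy_fn(seq, i): the AT verifier's greedy token at position i
                          of seq (in the real model, all positions are
                          scored in a single parallel forward pass).

    Accepts the longest draft prefix that agrees with the verifier's own
    greedy choices; on the first mismatch, the verifier's token is taken
    instead. Hence the final output equals AT greedy decoding exactly.
    """
    seq = prefix + draft
    # Verifier's greedy prediction at every drafted position.
    greedy = [at_greedy_fn(seq, len(prefix) + k) for k in range(len(draft))]
    n_accept = 0
    while n_accept < len(draft) and draft[n_accept] == greedy[n_accept]:
        n_accept += 1
    # Accepted draft tokens, plus the verifier's correction at the first
    # mismatch (empty slice when the whole block is accepted).
    accepted = draft[:n_accept] + greedy[n_accept:n_accept + 1]
    return prefix + accepted
```

Since every emitted token is either verified against, or produced by, the verifier's greedy choice, any BLEU gap from greedy decoding indicates a bug in the verification path rather than in the drafter.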
Sorry, I would like to revise my experimental results: the "GAD" results above correspond to "GAD++" in your paper, and the vanilla GAD results are: BLEU = 26.65, decoding time = 13'25.
My results are as follows:
Great! It seems to work fine. Here are some suggestions:
Btw, have you tried the checkpoints we released? Maybe this can offer you some insights.
This issue was closed because it has been inactive for 30 days. If there are any other questions, please open a new issue or send me an email.
Hi, I have two questions about your code for GAD and GAD++. 1) GAD and GAD++ do not support setting batch size > 1 in inference.py. 2) When I set batch size = 1, strategy="block" is much slower than strategy="fairseq". Is there something wrong with my experiments? I ran the provided inference.sh with my own AT and NAT models.