hfxunlp / transformer

Neutron: A PyTorch-based implementation of Transformer and its variants.
https://github.com/hfxunlp/transformer
GNU General Public License v3.0

fix-some-bugs-in-LD #4

Closed shrango closed 1 year ago

shrango commented 1 year ago

Hi, I came across some errors when reproducing the 'LD' (Learning Source Phrase Representation for NMT) approach. They turned out to be parameter-passing errors in these functions, probably introduced while keeping the code compatible with multiple methods. After correcting the parameters, I was able to run the LD training code successfully.

hfxunlp commented 1 year ago

@shrango many thanks for pointing this out, I will fix this in the next commit.

shrango commented 1 year ago

Hi, a few days ago I opened a pull request to fix the incompatibility in the training stage. However, I have since found incompatibilities in the inference stage as well: the code I submitted only ensures that training runs correctly, while testing still fails due to the version mismatch.

The newest commit (8c2b6eef192da319328cdce346e9f802b4272269) lets the inference stage run. Yet I have no idea what the last argument (True) means in the calls at transformer/LD/Decoder.py line 173 and line 207: `out, _state = net(inputu, inputhu, (None, None,), src_pad_mask, chk_pad_mask, None, out, True)`. It looks like a redundant argument, and I am not sure whether it carries any specific meaning.

As you can see, the implementation of Decoderlayer.forward at line 36 originally takes only 8 parameters, yet 9 are passed when the forward function is called; the extra one is the trailing True. To get past this, I simply added a formal parameter (which I named unknown_meaning) to the function definition so the script runs smoothly. A sketch of the mismatch follows below.
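To make the mismatch concrete, here is a minimal sketch of the situation as I understand it. The argument values are taken from the call site quoted above, but the signature and parameter names are only illustrative and may differ from the actual code in transformer/LD/Decoder.py:

```python
# Illustrative sketch only, not the actual repository code.
from torch import nn

class DecoderLayer(nn.Module):
    # Original definition: 8 parameters including self, so a call passing
    # 8 positional arguments (9 including self) raises
    # "TypeError: forward() takes 8 positional arguments but 9 were given".
    def forward(self, inpute, inputh, states, src_pad_mask, chk_pad_mask,
                tgt_pad_mask, query_unit):
        ...

    # Workaround I applied: add one trailing formal parameter so the extra
    # True at the call site is accepted; its meaning is still unclear to me.
    # def forward(self, inpute, inputh, states, src_pad_mask, chk_pad_mask,
    #             tgt_pad_mask, query_unit, unknown_meaning=None):
    #     ...
```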

However, the result is not ideal: it yields a SacreBLEU score of 25.4623 on the WMT14 EN-DE task (using the 'LD' model but without the attentive phrase-representation setting). So I am wondering whether something important has been overlooked, possibly related to that parameter whose meaning I do not know.
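For reference, this is roughly the kind of evaluation call used to obtain the score above; the file names are placeholders rather than the actual paths, and detokenization details are omitted:

```python
# Rough sketch of the SacreBLEU evaluation; file names are placeholders.
import sacrebleu

with open("hyp.detok.de", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("ref.de", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# corpus_bleu expects the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(bleu.score)  # 25.4623 in my run
```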