A question about this work:

scofield7419 commented 6 years ago

While studying your paper, I ran into a question: your model/method breaks the original distribution of the training set and the testing set.

Other RL work fits the data by changing the model parameters, so it never modifies the training data or the testing data. That preserves the original distribution of both sets.

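The core of this paper, however, is to use RL to remove noisy bags from the original training data, which means the input data is changed based on the label Y. That is fine in the training phase, and it does reduce the interference of noisy data with the classification model. But can the same thing be done in the test phase? The test set has no labels, so how would a reward be fed back to the policy module to filter the bags in the test set? How does the method work in the test phase, then?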

I looked at the code and found that in the test phase it does indeed just run the CNN directly on the test set for relation classification.

Thanks.

xuyanfu commented 6 years ago

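Hello, I only reproduced the original authors' work in TensorFlow. As I understand it, the original authors' idea is to use reinforcement learning during training to select better data to train on, so that the resulting model performs better at test time. Data selection happens only in the training phase; the test phase does not need it.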
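To make the train/test asymmetry concrete, here is a minimal sketch of the idea. It is not the code in this repository and not the authors' implementation: `keep_prob`, `classifier_update`, and `classify` are hypothetical callables standing in for the policy module and the CNN classifier, and the reward computation that trains the policy is left out entirely.

```python
import random


def select_sentences(bag, keep_prob, rng):
    """Training-time only: sample a keep/drop action for each sentence.

    keep_prob(sentence) is a hypothetical stand-in for the policy module
    and returns the probability of keeping that sentence.
    """
    return [s for s in bag if rng.random() < keep_prob(s)]


def train_step(bag, label, keep_prob, classifier_update, rng):
    """Filter a labeled bag with the policy, then train on what is kept."""
    kept = select_sentences(bag, keep_prob, rng)
    for sentence in kept:
        # classifier_update is a hypothetical hook for one training update.
        classifier_update(sentence, label)
    return kept


def predict(sentence, classify):
    """Test time: no label, no reward, no selection; classify directly."""
    return classify(sentence)
```

In this sketch the policy only decides which labeled training sentences reach the classifier. At test time `predict` never calls `select_sentences`, so no reward, and therefore no test label, is needed.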

ghost commented 5 years ago

The authors address this question in the paper. Here is the relevant passage:

Evaluation settings. We predicted a relation label for each sentence, instead of for each bag. For example, the task in Figure 1 needs to map the first sentence to relation “BornIn” and the second sentence to “EmployedBy”. Since the data obtained from distant supervision are noisy, we randomly chose 300 sentences and manually labeled the relation type for each sentence to evaluate the classification performance. We adopted accuracy and macro-averaged F1 as the evaluation metric.
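For reference, the two metrics named in the passage can be computed at the sentence level roughly as follows. This is a plain-Python sketch under my own assumptions, not the authors' evaluation script. In particular, averaging over the union of gold and predicted labels, and scoring a class with no support as 0, are choices made here for illustration.

```python
def accuracy(gold, pred):
    """Fraction of sentences whose predicted relation matches the gold one."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-relation F1 scores.

    Assumption: the label set is the union of gold and predicted labels.
    """
    labels = sorted(set(gold) | set(pred))
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only, not data from the paper.
gold = ["BornIn", "BornIn", "EmployedBy", "NA"]
pred = ["BornIn", "EmployedBy", "EmployedBy", "NA"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

Because each manually labeled sentence is scored on its own, the evaluation needs no bag-level selection step.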

xuyanfu / TensorFlow_RLRE

A question about this work: #19