Open SmildWind opened 2 weeks ago
Hello! Great work! While reproducing it, the code runs normally, but in `models/rule_detection.py`, inside the `class RFND` definition, the line `E_ = torch.stack(E_, dim=0).contiguous() * values` produces a large number of NaNs, which in turn makes all the losses NaN. The cause appears to be that `torch.stack(E_, dim=0).contiguous()` contains zeros while `values` contains `-inf`, and `0 * inf = NaN`. We tried the following code to replace the NaNs:
```python
E_stacked = torch.stack(E_, dim=0).contiguous()
# expand values to the same shape as E_stacked
values_expanded = values.unsqueeze(-1)
zero_mask = E_stacked == 0
inf_mask = values_expanded == -float('inf')
# replace -inf with 0 to avoid NaN
values_expanded[inf_mask] = 0
result = E_stacked * values_expanded
E_ = result
```
We still do not understand why `0 * inf = NaN` occurs in the first place. We are using the Twitter dataset (`Twitter_Set`), with `text_path='dataset/twitter/texts/twitter_final2.json'` and `img_path=dataset/twitter/embedding.pt`.
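For context, `0 * (±inf) = NaN` is standard IEEE 754 floating-point behavior rather than a PyTorch quirk. A minimal plain-Python sketch of the failure mode and of the masking idea described above (the `masked_mul` helper is a hypothetical illustration, not code from the repository):

```python
import math

# IEEE 754: zero times infinity is an indeterminate form and yields NaN
print(math.isnan(0.0 * float('-inf')))  # True

def masked_mul(e, v):
    # Hypothetical helper: treat 0 * -inf as 0, mirroring the
    # "replace -inf with 0 where E_ is zero" workaround above.
    if e == 0.0 and math.isinf(v):
        return 0.0
    return e * v

print(masked_mul(0.0, float('-inf')))  # 0.0 instead of NaN
print(masked_mul(2.0, 3.0))            # 6.0, ordinary products unchanged
```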
Besides `values_expanded[inf_mask] = 0`, we also tried `values_expanded[inf_mask] = -3276800` to approximate `-inf`, but the results are still far from the 0.911 reported in the paper. We do not know the exact cause; if there is a problem somewhere in our implementation, please point it out!
The training results are as follows:
```
# Training results with values_expanded[inf_mask] = 0
Train Epoch 0: Time 472.6773, Acc: 0.8900, Loss: 1.4030, Rumor_R: 0.8729, Rumor_P: 0.9233, Rumor_F: 0.8974, Non_Rumor_R: 0.9110, Non_Rumor_P: 0.8537, Non_Rumor_F1: 0.8814
Test: Time: 17.2103, Acc: 0.3280, Loss: 0.7011, Rumor_R: 0.6040, Rumor_P: 0.2672, Rumor_F: 0.3705, Non_Rumor_R: 0.1936, Non_Rumor_P: 0.5011, Non_Rumor_F1: 0.2793
Train Epoch 1: Time 497.7716, Acc: 0.9885, Loss: 1.0146, Rumor_R: 0.9863, Rumor_P: 0.9928, Rumor_F: 0.9895, Non_Rumor_R: 0.9912, Non_Rumor_P: 0.9833, Non_Rumor_F1: 0.9872
Test: Time: 14.9006, Acc: 0.3309, Loss: 0.7000, Rumor_R: 0.6113, Rumor_P: 0.2697, Rumor_F: 0.3743, Non_Rumor_R: 0.1945, Non_Rumor_P: 0.5069, Non_Rumor_F1: 0.2811
Train Epoch 2: Time 519.4532, Acc: 0.9884, Loss: 1.0179, Rumor_R: 0.9861, Rumor_P: 0.9928, Rumor_F: 0.9894, Non_Rumor_R: 0.9912, Non_Rumor_P: 0.9831, Non_Rumor_F1: 0.9871
Test: Time: 15.9966, Acc: 0.3202, Loss: 0.6990, Rumor_R: 0.5055, Rumor_P: 0.2421, Rumor_F: 0.3274, Non_Rumor_R: 0.2300, Non_Rumor_P: 0.4887, Non_Rumor_F1: 0.3128
Train Epoch 3: Time 513.4600, Acc: 0.9884, Loss: 1.0060, Rumor_R: 0.9865, Rumor_P: 0.9924, Rumor_F: 0.9894, Non_Rumor_R: 0.9907, Non_Rumor_P: 0.9835, Non_Rumor_F1: 0.9871
Test: Time: 15.7953, Acc: 0.2820, Loss: 0.7016, Rumor_R: 0.5949, Rumor_P: 0.2496, Rumor_F: 0.3517, Non_Rumor_R: 0.1297, Non_Rumor_P: 0.3967, Non_Rumor_F1: 0.1954
Train Epoch 4: Time 520.3489, Acc: 0.9901, Loss: 1.0017, Rumor_R: 0.9850, Rumor_P: 0.9970, Rumor_F: 0.9910, Non_Rumor_R: 0.9964, Non_Rumor_P: 0.9819, Non_Rumor_F1: 0.9891
Test: Time: 15.7334, Acc: 0.2885, Loss: 0.7014, Rumor_R: 0.6095, Rumor_P: 0.2548, Rumor_F: 0.3593, Non_Rumor_R: 0.1323, Non_Rumor_P: 0.4105, Non_Rumor_F1: 0.2001
```
```
# Training results with values_expanded[inf_mask] = -327800
Train Epoch 0: Time 528.8097, Acc: 0.8892, Loss: 1.3516, Rumor_R: 0.9241, Rumor_P: 0.8807, Rumor_F: 0.9019, Non_Rumor_R: 0.8463, Non_Rumor_P: 0.9009, Non_Rumor_F1: 0.8727
Test: Time: 16.1896, Acc: 0.3566, Loss: 0.6983, Rumor_R: 0.6204, Rumor_P: 0.2812, Rumor_F: 0.3870, Non_Rumor_R: 0.2282, Non_Rumor_P: 0.5527, Non_Rumor_F1: 0.3231
Train Epoch 1: Time 518.5146, Acc: 0.9955, Loss: 1.0037, Rumor_R: 0.9949, Rumor_P: 0.9968, Rumor_F: 0.9959, Non_Rumor_R: 0.9961, Non_Rumor_P: 0.9938, Non_Rumor_F1: 0.9950
Test: Time: 15.6606, Acc: 0.2963, Loss: 0.6963, Rumor_R: 0.4526, Rumor_P: 0.2202, Rumor_F: 0.2963, Non_Rumor_R: 0.2202, Non_Rumor_P: 0.4526, Non_Rumor_F1: 0.2963
Train Epoch 2: Time 515.2467, Acc: 0.9977, Loss: 0.9867, Rumor_R: 0.9977, Rumor_P: 0.9981, Rumor_F: 0.9979, Non_Rumor_R: 0.9977, Non_Rumor_P: 0.9972, Non_Rumor_F1: 0.9974
Test: Time: 15.6808, Acc: 0.2963, Loss: 0.6985, Rumor_R: 0.5036, Rumor_P: 0.2335, Rumor_F: 0.3191, Non_Rumor_R: 0.1954, Non_Rumor_P: 0.4472, Non_Rumor_F1: 0.2719
```
Another question: the paper describes the Twitter dataset as follows:
Twitter (Boididou et al., 2018) contains 7334 rumors and 5599 non-rumors for training and 564 rumors and 427 non-rumors for testing.
These train/test counts do not seem to match splitting `dataset/twitter/texts/twitter_final2.json` into `data[:8617]` and `data[8617:]`. Is this because we used the wrong file, or did we get something else wrong? Could you kindly clarify?
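To compare the provided JSON against the counts reported in the paper, one can tally the labels on each side of the split point. A minimal sketch, assuming each record carries a label field (the key name `label` and the helper name are assumptions for illustration, not taken from the repository):

```python
import json
from collections import Counter

def split_label_counts(data, split_idx, label_key="label"):
    """Count labels in the data[:split_idx] / data[split_idx:] train/test slices."""
    train, test = data[:split_idx], data[split_idx:]
    return (Counter(r[label_key] for r in train),
            Counter(r[label_key] for r in test))

# Usage against the repository's split point (field name is an assumption):
# with open('dataset/twitter/texts/twitter_final2.json') as f:
#     data = json.load(f)
# print(split_label_counts(data, 8617))
```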
This is most likely caused by exploding gradients, possibly due to the hyperparameter settings or some other reason; you would need to print the gradients to check. Did you encounter this problem on the Weibo and Sarcasm datasets? And are you using the latest code?
Use the files in the `dataset` directory as the reference. The paper likely reports the size of the original dataset.
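As a concrete diagnostic for the exploding-gradient hypothesis above, the usual quantity to print is the global L2 norm of all gradients, and the usual mitigation is clip-by-global-norm (the idea behind PyTorch's `torch.nn.utils.clip_grad_norm_`). A minimal plain-Python sketch with hypothetical helper names:

```python
import math

def global_grad_norm(grads):
    # Global L2 norm over all gradient values (flattened across parameters)
    return math.sqrt(sum(g * g for g in grads))

def clip_by_global_norm(grads, max_norm):
    # Scale every gradient down uniformly if the global norm exceeds max_norm,
    # preserving the direction of the overall update
    norm = global_grad_norm(grads)
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return list(grads)

grads = [3.0, 4.0]                 # global norm = 5.0
clipped = clip_by_global_norm(grads, 1.0)
print(global_grad_norm(clipped))   # ~1.0 after clipping
```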