nju-websoft / SPARQA

SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases (AAAI 2020)

MIT License


How do I get the F1 score reported in the paper? #12

Open g1kyne opened 2 years ago

g1kyne commented 2 years ago

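The README says: "See running/freebase/pipeline_cwq.py if run CWQ 1.1. See running/freebase/pipeline_grapqh.py if run GraphQuestions. Below, an example on GraphQuestions."

Hello, how do I get the F1 score reported in the paper? Is it enough to run the two files above, or do I also need to carry out the series of steps listed after them? The run procedure is not very clear to me.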

g1kyne commented 2 years ago

Running pipeline_grapqh.py gives this result:

all_f1_score: 561.5324373390596

count_number: 1839

How do I get the 21 reported in the paper from this?

simba0626 commented 2 years ago

Hello, and thank you for your interest. 561.5324/2608 = 21.53, where 2608 is the total number of questions in the test set.
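The arithmetic in this reply can be reproduced with a few lines of Python. This is only a sketch: `average_f1` is a hypothetical helper, not a function from the SPARQA repository, and the two inputs are the `all_f1_score` printed by the pipeline and the test-set size given by the maintainer.

```python
def average_f1(all_f1_score: float, total_questions: int) -> float:
    """Average F1 as a percentage over the whole test set (hypothetical helper)."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    return 100.0 * all_f1_score / total_questions


# all_f1_score as printed by pipeline_grapqh.py in this thread
ALL_F1_SCORE = 561.5324373390596
# Test-set size stated by the maintainer; note that this is not count_number (1839)
TOTAL_QUESTIONS = 2608

graphq_f1 = average_f1(ALL_F1_SCORE, TOTAL_QUESTIONS)
print(f"GraphQuestions F1: {graphq_f1:.2f}")
```

The denominator is the full test-set size rather than the printed `count_number`, so dividing by 1839 instead would give a larger, non-comparable figure.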

g1kyne commented 2 years ago

Thank you for the answer. When I run the other dataset, CWQ, the output is

module: 3_evaluation

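end

Is the CWQ zip package you provided missing the corresponding trained files? Do I need to train it myself, starting from module=1.0? And do I need to load the knowledge graph into the Virtuoso database first?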

g1kyne commented 2 years ago

Hello, after running pipeline_cwq.py I get the following result:

all_f1_score: 930.9517663221876

count_number: 2225

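end

The paper says that one tenth of the 34,689 questions is used for testing, so the test set should contain 3468 questions. But 930.9517/3468 = 26.84%, which falls short of the expected 31%. What could be the reason?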

simba0626 commented 2 years ago

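Hello, and thank you for your interest.

(1) The test set contains 3531 questions.

(2) The 930.95176 you obtained is probably the ablation result for SPARQA w/o sentence-level scorer: 930.95176/3531 = 26.36 (but I cannot be sure).

(3) The true all_f1_score should be a little over 1111: 1111/3531 = 31.46.

Try switching between the following two lines in evaluation/kbcqa_evaluation.py, using either score or total_score, and see what you get: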

score_to_queryid_sparql[grounded_graph.score].append(grounded_graph.grounded_query_id) #word level matcher

 score_to_queryid_sparql[grounded_graph.total_score].append(grounded_graph.grounded_query_id)

thanks
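To illustrate why swapping `score` for `total_score` can change the result, here is a minimal sketch of grouping candidate queries by a chosen score field. Everything apart from the attribute names `score`, `total_score`, and `grounded_query_id` and the dictionary name `score_to_queryid_sparql` is an assumption: the `GroundedGraph` dataclass, the `top_query_ids` helper, and the idea that the highest-scoring group is the one selected are illustrative stand-ins, not the code in evaluation/kbcqa_evaluation.py.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GroundedGraph:
    """Hypothetical stand-in for the repository's grounded graph object."""
    grounded_query_id: int
    score: float        # described in the thread as the word-level matcher score
    total_score: float  # assumed to be the alternative, combined score


def top_query_ids(graphs: List[GroundedGraph], use_total_score: bool) -> List[int]:
    """Group query ids by the chosen score and return the ids with the best score."""
    score_to_queryid_sparql: Dict[float, List[int]] = defaultdict(list)
    for grounded_graph in graphs:
        key = grounded_graph.total_score if use_total_score else grounded_graph.score
        score_to_queryid_sparql[key].append(grounded_graph.grounded_query_id)
    if not score_to_queryid_sparql:
        return []
    # Assumption: the candidates sharing the highest score are the ones evaluated.
    return score_to_queryid_sparql[max(score_to_queryid_sparql)]


# Made-up candidates, only to show that the two keys can rank differently.
candidates = [
    GroundedGraph(grounded_query_id=1, score=0.9, total_score=1.2),
    GroundedGraph(grounded_query_id=2, score=0.7, total_score=1.5),
    GroundedGraph(grounded_query_id=3, score=0.9, total_score=1.0),
]

by_score = top_query_ids(candidates, use_total_score=False)
by_total_score = top_query_ids(candidates, use_total_score=True)
print(by_score, by_total_score)
```

With these made-up numbers the two keys select different candidates, which is the kind of difference that could move the summed F1 between the two figures discussed above.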

g1kyne commented 2 years ago

This is resolved now. Thank you very much!