Open suamin opened 3 years ago
@gaotianyu1350 Thanks for the great work. I have the same questions.
@suamin Did you find the answers to your questions? As for NYT10m, I trained BERT with the sentence-level framework and then tested it with the bag-level framework and with multi-label evaluation separately. The results show that testing at bag level (60.6, 35.32) is better than multi-label (58.39, 31.98). However, I still cannot reproduce the results from the paper.
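For clarity, this is roughly what I do to test the sentence-level model under the bag-level protocol -- a minimal numpy sketch with AVG aggregation; the variable names and the NYT10m field layout (`h`/`t` dicts with an `id`) are my assumptions, not code from the repo:

```python
from collections import defaultdict
import numpy as np

def bag_level_scores(instances, sent_probs):
    """Aggregate sentence-level probabilities into bag-level scores.

    instances: list of dicts parsed from NYT10m JSON lines, each with
        'h' and 't' entity dicts (assumed to carry an 'id' field).
    sent_probs: (num_sentences, num_relations) array of per-sentence
        probabilities from the sentence-level model.
    Returns a dict mapping (head_id, tail_id) -> averaged probability vector.
    """
    bags = defaultdict(list)
    for i, ins in enumerate(instances):
        key = (ins['h']['id'], ins['t']['id'])  # one bag per entity pair
        bags[key].append(sent_probs[i])
    # AVG aggregation: mean over all sentences in the bag, no bag truncation
    return {key: np.mean(rows, axis=0) for key, rows in bags.items()}
```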
@HenryPaik1 thanks for your input. I've not been able to find answers to the questions, and I still struggle to reproduce the paper's numbers. For BERT+sent+AVG, I get AUC=55.45, macro-F1=21.12 on `val` and AUC=47.49, macro-F1=11.23 on `test` with bag-level evaluation.
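In case the gap comes from the metric computation rather than the model, here is how I compute those two numbers -- a sketch with sklearn, assuming AUC means area under the precision-recall curve over all non-NA (bag, relation) scores and macro-F1 uses a 0.5 threshold; both assumptions are mine:

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, f1_score

def bag_metrics(scores, labels):
    """scores, labels: (num_bags, num_relations) arrays; column 0 is NA.

    AUC: area under the precision-recall curve over all non-NA cells.
    Macro-F1: per-relation F1 (scores thresholded at 0.5), averaged.
    """
    s = scores[:, 1:].ravel()            # drop NA column, flatten
    y = labels[:, 1:].ravel()
    prec, rec, _ = precision_recall_curve(y, s)
    pr_auc = auc(rec, prec)
    macro_f1 = f1_score(labels[:, 1:], scores[:, 1:] >= 0.5, average='macro')
    return pr_auc, macro_f1
```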
Hi,
Thank you for the latest contribution, "Manual Evaluation Matters: Reviewing Test Protocols of Distantly Supervised Relation Extraction"; having a manually annotated test set significantly improves our understanding of DS-RE models. I have a few questions regarding the paper's experiments:
Q1: Is it possible to provide the pre-trained checkpoints for the BERT+sent/bag+AVG models?
Q2: Regarding evaluation, it is mentioned in the paper:
Can you elaborate on this further? Is it the same as the current eval part of the `BagRELoader` code? Unfortunately, I cannot find `anno_relation_list` in the manually created test set; does this require additional pre-processing?
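To make Q2 concrete, this is the pre-processing I would guess is needed -- grouping the gold relations of the manual test set per entity pair into a multi-label list. The field name `anno_relation_list` is taken from the eval code; the file path and everything else here is just my guess:

```python
import json
from collections import defaultdict

# Group gold relations per entity pair to build the multi-label annotation
# that the bag-level eval seems to expect (my guess at the pre-processing).
pair2rels = defaultdict(set)
with open('nyt10m_test.txt') as f:          # path is just an example
    for line in f:
        ins = json.loads(line)
        pair2rels[(ins['h']['id'], ins['t']['id'])].add(ins['relation'])

anno = [
    {'entpair': list(pair), 'anno_relation_list': sorted(rels)}
    for pair, rels in pair2rels.items()
]
```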
Q3: At evaluation (valid, test) time, should the `bag_size` parameter be set to `0` (so we consider all sentences in the bag, as also reported in the paper -- but this is not handled in the current `BagRE` framework) and `entpair_as_bag` to `True`?
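In other words, I imagine the eval loader would be built like this -- `entpair_as_bag` and `bag_size` are the parameters I see in `BagRELoader`, but whether this combination is actually supported end-to-end is exactly my question, and I may have misread the signature:

```python
import json
from opennre.framework.data_loader import BagRELoader

# rel2id and the trained model come from earlier in the pipeline; the paths
# here are just examples.
rel2id = json.load(open('nyt10m_rel2id.json'))

test_loader = BagRELoader(
    'nyt10m_test.txt',
    rel2id,
    model.sentence_encoder.tokenize,  # model = trained BERT bag-level model
    32,                               # batch size
    False,                            # shuffle off for evaluation
    entpair_as_bag=True,              # bags keyed by entity pair
    bag_size=0,                       # 0 = keep *all* sentences in each bag
)
```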
Q4: Can you provide the scores on the NYT10m `val` set for the models reported in Table 4 of the paper? Do you also plan to provide P@k metrics and PR curves for the models reported in Table 4?
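For P@k I assume the usual DS-RE definition, i.e. precision among the top-k highest-scored non-NA predictions over all bags; a tiny sketch of what I would compute, in case we end up comparing different things:

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """scores, labels: flat arrays over all non-NA (bag, relation) cells.

    Sort predictions by score and measure how many of the top k are correct.
    """
    order = np.argsort(-scores)
    return labels[order[:k]].mean()

# e.g. P@100 / P@200 / P@300 as usually reported:
# p100, p200, p300 = (precision_at_k(s, y, k) for k in (100, 200, 300))
```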
Q5: Is BERT+sent-level training performed with `MultiLabelSentenceRE` or simple `SentenceRE`?
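To clarify what I mean in Q5: the difference I care about is an independent sigmoid + binary cross-entropy per relation (multi-label) versus a single softmax over all relations. A minimal sketch of the two losses, which is just my understanding of the distinction, not the actual classes from the repo:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 25)                 # (batch, num_relations), dummy

# Single-label (plain SentenceRE-style): one gold relation per sentence,
# trained with softmax cross-entropy.
single_labels = torch.randint(0, 25, (8,))
loss_single = F.cross_entropy(logits, single_labels)

# Multi-label (what I assume MultiLabelSentenceRE does): independent sigmoid
# per relation with binary cross-entropy, allowing several gold relations.
multi_labels = torch.zeros(8, 25).scatter_(1, single_labels.unsqueeze(1), 1.0)
loss_multi = F.binary_cross_entropy_with_logits(logits, multi_labels)
```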
Thank you in advance!