oudalab / StructuredEventExtraction


performance comparison #6

Open YanLiang1102 opened 3 years ago

YanLiang1102 commented 3 years ago

Test performance using the topic as a multi-task learning signal. The topic only enhances the encoder; it is not used as an input for event extraction.

| Epoch | Micro F1 | Micro P | Micro R | Macro F1 | Macro P | Macro R |
|------:|---------:|--------:|--------:|---------:|--------:|--------:|
| 1 | 67.78 | 66.69 | 68.90 | 50.96 | 55.19 | 50.81 |
| 2 | 67.06 | 67.72 | 66.42 | 58.71 | 63.48 | 58.70 |
| 3 | 68.21 | 65.83 | 70.77 | 60.24 | 61.15 | 62.25 |
| 10 | 64.95 | 63.13 | 66.88 | 59.76 | 59.00 | 61.68 |

Mentions dropped during evaluation (logged as `lose Mention`): epoch 1: `cfcec8e30722564d5bf43fcb5f739cd8`, `04cdc0c45303f7024417e4d94f9a7c13`, `66b0e5014943dde6c74c048086b7a0a3`; epoch 2: `395e263d21484d8998d853b0b1b6ec5a`, `063e9a5b7265bc3ae0ea731c5f06545b`, `362cbeafe8c757195425faa40a88ff61`; epoch 3: `f0ae798aa5b9014e3c8e47aefb281330`, `d3f3a2e2f335d8c50f7612c29aa0cb2a`, `1418f6cb3c97797a7542e7ec6ac04427`, `913c2d91f695ccb637386f211cd8d94d`, `f697f918553fc11d73d709bf3029603d`.
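For reference, a minimal sketch of what this multi-task setup could look like in PyTorch. The class name, the `bert-base-cased` checkpoint, the loss weight, and the plain cross-entropy tagging loss are all placeholder assumptions, not the repo's actual code (which may use a CRF decoder and different hyperparameters):

```python
import torch.nn as nn
from transformers import BertModel


class MultiTaskTagger(nn.Module):
    """Shared BERT encoder with (1) a token-level event-tagging head and
    (2) an auxiliary sentence-level topic-classification head.
    The topic only shapes the encoder via the auxiliary loss; it is not
    fed to the tagger as an input."""

    def __init__(self, num_event_tags, num_topics, topic_loss_weight=0.1):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-cased")
        hidden = self.encoder.config.hidden_size
        self.tag_head = nn.Linear(hidden, num_event_tags)   # per-token emission scores
        self.topic_head = nn.Linear(hidden, num_topics)     # sentence-level topic logits
        self.topic_loss_weight = topic_loss_weight

    def forward(self, input_ids, attention_mask, tag_labels=None, topic_ids=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_repr = out.last_hidden_state                  # (B, T, H)
        emissions = self.tag_head(token_repr)               # (B, T, num_event_tags)
        topic_logits = self.topic_head(token_repr[:, 0])    # [CLS] -> (B, num_topics)

        loss = None
        if tag_labels is not None and topic_ids is not None:
            # Plain token-level cross-entropy here; a CRF layer over the
            # emissions would match the BERT-CRF baseline more closely.
            tag_loss = nn.functional.cross_entropy(
                emissions.reshape(-1, emissions.size(-1)),
                tag_labels.reshape(-1),
                ignore_index=-100,                           # padding / subword positions
            )
            topic_loss = nn.functional.cross_entropy(topic_logits, topic_ids)
            loss = tag_loss + self.topic_loss_weight * topic_loss
        return emissions, topic_logits, loss
```

Dropping `topic_head` and the auxiliary loss recovers a plain BERT tagger, which is roughly the BERT-CRF baseline reported in the next comment (modulo the CRF decoder).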

YanLiang1102 commented 3 years ago

Test performance of the BERT-CRF baseline:

| Epoch | Micro F1 | Micro P | Micro R | Macro F1 | Macro P | Macro R |
|------:|---------:|--------:|--------:|---------:|--------:|--------:|
| 1 | 67.46 | 65.98 | 69.01 | 54.73 | 58.04 | 55.79 |
| 3 | 68.22 | 65.92 | 70.68 | 60.44 | 61.68 | 61.82 |

Mentions dropped at epoch 3 (logged as `lose Mention`): `df0a542b1da02933d7ec99db1d242f88`, `35cc76d3b1b331cf496dc13451da70db`, `aca274ee083d94192aa9d58832f1a258`, `49d6dc789d08db61efd0fbc46a55570a`, `b30bf4393b0cc55add7ae3e7380aaac3`, `24345d37f8e66bd7fcb3362bdc861942`.

YanLiang1102 commented 3 years ago

The two results above show that using the topic_id only as a multi-task learning signal does not improve performance. The next thing to try is to make the event extraction conditioned on the topic while also keeping the multi-task learning.

YanLiang1102 commented 3 years ago

Conditioned on the topic by adding the topic embedding directly to the sentence embedding:

| Epoch | Micro F1 | Micro P | Micro R | Macro F1 | Macro P | Macro R |
|------:|---------:|--------:|--------:|---------:|--------:|--------:|
| 3 | 67.49 | 65.56 | 69.54 | 60.34 | 60.75 | 62.33 |
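A minimal sketch of this conditioning variant. The names and the broadcast-add over all token representations are assumptions; the actual implementation may inject the topic embedding at a different point in the model:

```python
import torch.nn as nn
from transformers import BertModel


class TopicConditionedTagger(nn.Module):
    """Condition the event extraction on the topic by adding a learned
    topic embedding directly to the encoder output before tagging."""

    def __init__(self, num_event_tags, num_topics):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-cased")
        hidden = self.encoder.config.hidden_size
        self.topic_emb = nn.Embedding(num_topics, hidden)
        self.tag_head = nn.Linear(hidden, num_event_tags)

    def forward(self, input_ids, attention_mask, topic_ids):
        token_repr = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                                # (B, T, H)
        topic = self.topic_emb(topic_ids).unsqueeze(1)     # (B, 1, H)
        conditioned = token_repr + topic                   # broadcast over all tokens
        return self.tag_head(conditioned)                  # per-token emission scores
```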

YanLiang1102 commented 3 years ago

The topic information might not help because of this: most of the topics are tail topics, each with very few training examples.

| Topic token | test_count | train_count |
|---|---:|---:|
| military conflict | 258 | 981 |
| rail accident | 18 | 36 |
| limited overs final | 2 | 6 |
| concert tour | 49 | 167 |
| event | 16 | 36 |
| news event | 21 | 61 |
| flood | 6 | 30 |
| aircraft occurrence | 24 | 76 |
| military operation | 1 | 1 |
| nuclear weapons test | 4 | 12 |
| civil conflict | 31 | 106 |
| civilian attack | 49 | 196 |
| concert | 6 | 13 |
| historical event | 12 | 59 |
| terrorist attack | 18 | 50 |
| olympic event | 4 | 10 |
| operational plan | 3 | 7 |
| hurricane | 97 | 314 |
| recurring event | 34 | 94 |
| football match | 8 | 67 |
| music festival | 41 | 101 |
| wildfire | 12 | 10 |
| wrestling event | 23 | 55 |
| airliner accident | 10 | 33 |
| international football competition | 9 | 43 |
| cricket tournament | 11 | 56 |
| aircraft accident | 7 | 27 |
| athleticrace | 8 | 12 |
| cycling championship | 1 | 1 |
| rugby match | 2 | 2 |
| games | 6 | 23 |
| earthquake | 17 | 31 |
| legislative session | 2 | 2 |
| winter storm | 7 | 19 |
| canadian football game | 2 | 0 |
| horse race | 2 | 16 |
| badminton event | 1 | 0 |
| military attack | 1 | 3 |
| summit | 1 | 3 |
| international ice hockey competition | 7 | 16 |
| summit meeting | 1 | 10 |
| mma event | 3 | 7 |
| athletics competition | 1 | 4 |
| international handball competition | 1 | 4 |
| cycling championships | 1 | 0 |
| individual golf tournament | 3 | 16 |
| pro bowl | 1 | 7 |
| u.s. federal election campaign | 1 | 2 |
| commonwealth games event | 1 | 0 |
| swimming event | 1 | 1 |
| athletics race | 1 | 6 |
| university boat race | 4 | 2 |
| hurling championship | 1 | 1 |
| field hockey | 1 | 4 |
| australian rules football grand final | 2 | 2 |
| international baseball tournament | 1 | 1 |
| tennis event | 1 | 4 |
| rugby tournament | 1 | 8 |
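To back up the tail-topic claim, one quick check is to count how many topics fall below a small training-count threshold. The snippet below is purely illustrative: it hard-codes only a few rows from the table above, and the threshold of 20 is an arbitrary choice.

```python
# topic -> (test_count, train_count); a few rows copied from the table above,
# the full dict would contain every topic listed there
counts = {
    "military conflict": (258, 981),
    "hurricane": (97, 314),
    "civilian attack": (49, 196),
    "rugby match": (2, 2),
    "badminton event": (1, 0),
    "swimming event": (1, 1),
}

threshold = 20  # arbitrary cut-off for calling a topic "tail"
tail = [t for t, (_, train) in counts.items() if train < threshold]
print(f"{len(tail)}/{len(counts)} topics have fewer than {threshold} training examples")
```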

YanLiang1102 commented 3 years ago

Add the topic-to-event-type distribution as a prior (similar to adding a per-topic vocabulary into the extraction). Will this work? Not sure.
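One possible way to realize this, sketched below: estimate P(event type | topic) from training counts with smoothing, then add its log as a bias to the tagger's emission scores at decoding time. The function and variable names, the smoothing, and the weighting factor are assumptions, not this repo's implementation.

```python
import numpy as np


def estimate_topic_event_prior(train_examples, num_topics, num_event_types, alpha=1.0):
    """Estimate P(event_type | topic) from training counts with add-alpha smoothing.
    train_examples: iterable of (topic_id, [event_type_ids in that sentence])."""
    counts = np.full((num_topics, num_event_types), alpha)
    for topic_id, event_type_ids in train_examples:
        for e in event_type_ids:
            counts[topic_id, e] += 1
    return counts / counts.sum(axis=1, keepdims=True)


# At decoding time, bias the tagger's emission scores toward event types
# that are frequent under the sentence's topic (lam controls the trust):
#   biased_emissions = emissions + lam * np.log(prior[topic_id])
```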

YanLiang1102 commented 3 years ago

The topic will help the extraction only if two assumptions hold:

1. Given a topic, certain event types happen more often than others.
2. Similar topic embeddings imply similar event types in the sentences.

First of all, are these assumptions actually true?
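Assumption 1 can be checked directly from training counts before investing in more modeling: if the entropy of the event-type distribution drops substantially once conditioned on the topic, the topics carry usable signal. A sketch (the `joint_counts` matrix is a hypothetical precomputed count table, not something that exists in the repo):

```python
import numpy as np


def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()


def topic_informativeness(joint_counts):
    """joint_counts[t, e] = number of (topic t, event type e) pairs in training.
    Returns (marginal entropy of event types, conditional entropy given topic).
    A conditional entropy well below the marginal supports assumption 1."""
    joint = joint_counts / joint_counts.sum()
    p_event = joint.sum(axis=0)          # P(e)
    p_topic = joint.sum(axis=1)          # P(t)
    h_marginal = entropy(p_event)
    h_conditional = sum(
        p_topic[t] * entropy(joint[t] / p_topic[t])
        for t in range(joint.shape[0])
        if p_topic[t] > 0
    )
    return h_marginal, h_conditional
```

Assumption 2 could be checked in a similar spirit by testing whether topics with similar embeddings also have similar rows in `joint_counts` (e.g. cosine similarity of topic embeddings vs. similarity of their event-type distributions).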