oudalab / StructuredEventExtraction


Labels mismatched after BERT tokenizer #5

Closed YanLiang1102 closed 3 years ago

YanLiang1102 commented 3 years ago

' token:[CLS], label:538 token:as, label:0 token:an, label:0 token:east, label:0 token:in, label:0 token:##dia, label:538 token:company, label:0 token:rescue, label:0 token:force, label:0 token:from, label:0 token:all, label:0 token:##ah, label:538 token:##abad, label:538 token:approached, label:0 token:ca, label:0 token:##wn, label:538 token:##pore, label:538 token:120, label:3 token:br, label:0 token:##itis, label:0 token:##h, label:538 token:women, label:538 token:and, label:0 token:children, label:0 token:captured, label:0 token:by, label:36 token:the, label:0 token:se, label:0 token:##po, label:0 token:##y, label:538 token:forces, label:538 token:were, label:0 token:killed, label:0 token:in, label:33 token:what, label:0 token:came, label:0 token:to, label:0 token:be, label:0 token:known, label:0 token:as, label:0 token:the, label:0 token:bi, label:0 token:##bi, label:0 token:##gh, label:538 token:##ar, label:538 token:massacre, label:538 token:their, label:0 token:remains, label:3 token:being, label:0 token:thrown, label:0 token:down, label:0 token:a, label:6 token:nearby, label:0 token:well, label:0 token:in, label:0 token:an, label:0 token:attempt, label:0 token:to, label:0 token:hide, label:0 token:the, label:0 token:evidence, label:114 token:., label:0 token:[SEP], label:0'

YanLiang1102 commented 3 years ago

This is resolved by using | as the delimiter instead of , when preparing the topic-related training data.

YanLiang1102 commented 3 years ago

The sentence tokens themselves can contain ",", so splitting on commas produced extra fields and shifted some tokens away from their labels.
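For anyone hitting the same thing, here is a minimal sketch of the failure mode. It is not the repo's actual data-preparation code (that is not shown in this thread); `serialize`, `deserialize`, and the sample tokens and labels are hypothetical and only illustrate how a "," token breaks a comma-delimited line while a "|"-delimited line survives the round trip.

```python
# Hypothetical helpers for illustration only; the real preprocessing code is not shown in this issue.
def serialize(items, delimiter):
    """Join tokens into one line using the given delimiter."""
    return delimiter.join(str(item) for item in items)


def deserialize(line, delimiter):
    """Split a line back into tokens using the given delimiter."""
    return line.split(delimiter)


# Made-up sample: the second token is a literal comma from the sentence.
tokens = ["killed", ",", "their", "remains"]
labels = [33, 0, 0, 3]

# Comma delimiter: the "," token is indistinguishable from the separators,
# so the round trip yields extra empty fields and the list grows.
comma_tokens = deserialize(serialize(tokens, ","), ",")

# Pipe delimiter: no token in this sample contains "|", so the round trip is lossless.
pipe_tokens = deserialize(serialize(tokens, "|"), "|")

# Pairing the corrupted tokens with the labels shifts labels onto the wrong tokens.
comma_pairs = list(zip(comma_tokens, labels))
pipe_pairs = list(zip(pipe_tokens, labels))

print(len(tokens), len(comma_tokens), len(pipe_tokens))
print(comma_pairs)
print(pipe_pairs)
```

Note that "|" only helps as long as no token contains "|" itself. If that cannot be guaranteed, writing the data with Python's `csv` module (which quotes fields containing the delimiter) or as JSON would be a safer choice.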