wtangdev / UniRel

released code for our EMNLP22 paper: UniRel: Unified Representation and Interaction for Joint Relational Triple Extraction

Apache License 2.0


modify_bert #7

Closed zzhinlp closed 1 year ago

zzhinlp commented 1 year ago

Which part of the BERT model is modified?

wtangdev commented 1 year ago

The modification makes BertModel output the original (unpooled) attention scores.
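To illustrate the distinction, here is a minimal pure-Python sketch, assuming "original (unpooled)" refers to the scaled dot-product scores before softmax normalization. That reading is an assumption, and the helper names `raw_attention_scores` and `softmax` are hypothetical, not taken from the repository. For comparison, the stock `BertModel` in transformers returns attention weights after softmax when `output_attentions=True`.

```python
import math

def raw_attention_scores(queries, keys):
    """Scaled dot-product scores Q K^T / sqrt(d), before any normalization."""
    d = len(queries[0])
    return [
        [sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d) for k in keys]
        for q in queries
    ]

def softmax(row):
    """Normalize one row of scores into probabilities that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: two tokens with 2-dimensional query/key vectors.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]

scores = raw_attention_scores(queries, keys)   # unnormalized values
weights = [softmax(row) for row in scores]     # what softmax would produce
```

The raw scores are unbounded real values, while each row of `weights` is a probability distribution over the keys.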

zzhinlp commented 1 year ago

# Initialize configurations and tokenizer.

added_token = [f"[unused{i}]" for i in range(1, 17)]
# If use unused to do ablation, should uncomment this
# added_token = [f"[unused{i}]" for i in range(1, 399)]
tokenizer = BertTokenizerFast.from_pretrained(
    "bert-base-cased",
    additional_special_tokens=added_token,
    do_basic_tokenize=False)

1. What do the second and fourth lines mean? Why are 17 and 399 used? Looking forward to your reply.

wtangdev commented 1 year ago

These lines enable the tokenizer to handle the special tokens [unused*]. Please refer to the transformers documentation. Whether to use 17 or 399 depends on the number of relation types in your schema. In my case, 399 is large enough, and 17 is an arbitrary number with no specific meaning.
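As a quick check of what the two list comprehensions in the quoted snippet produce, the sketch below rebuilds them without loading a tokenizer. The name `ablation_token` is introduced here only to keep both lists side by side. Note that Python's `range` excludes its upper bound.

```python
# Rebuild the placeholder token lists from the quoted snippet.
added_token = [f"[unused{i}]" for i in range(1, 17)]
ablation_token = [f"[unused{i}]" for i in range(1, 399)]

# range(start, stop) stops before `stop`, so the lists hold 16 and 398 tokens.
print(len(added_token), added_token[0], added_token[-1])   # 16 [unused1] [unused16]
print(len(ablation_token), ablation_token[-1])             # 398 [unused398]
```

So `range(1, 17)` yields `[unused1]` through `[unused16]`, and `range(1, 399)` yields `[unused1]` through `[unused398]`.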

zzhinlp commented 1 year ago

[Screenshot: Snipaste_2023-03-15_13-48-04]

Why do some relations map to numbers?

wtangdev commented 1 year ago

I struggled to find a proper description for those relations, so I used [unused*] tokens to represent them. That reminds me: the "17" is not an arbitrary number but the number of relations for which I was unable to find a proper description.
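The idea can be sketched as follows, under stated assumptions: the relation names, the descriptions, and the helper `build_relation_tokens` are all hypothetical and only illustrate the fallback. They are not the repository's actual schema or code.

```python
# Hypothetical schema: relation name -> natural-language description,
# or None when no proper description could be found.
relation_descriptions = {
    "born_in": "born in",
    "works_for": "works for",
    "rel_a": None,
    "rel_b": None,
}

def build_relation_tokens(descriptions):
    """Map each relation to its description, or to the next free [unused*] token."""
    mapping = {}
    next_unused = 1
    for relation, description in descriptions.items():
        if description is None:
            mapping[relation] = f"[unused{next_unused}]"
            next_unused += 1
        else:
            mapping[relation] = description
    return mapping

relation_to_token = build_relation_tokens(relation_descriptions)

# Only the placeholder tokens need to be registered with the tokenizer
# (for example through additional_special_tokens, as in the quoted snippet).
added_token = [t for t in relation_to_token.values() if t.startswith("[unused")]
```

With this layout, the number of placeholder tokens equals the number of relations without a description, which is why the size of the list depends on the schema.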