Initialization of Type Words

anushkasw commented 1 month ago

I had a question regarding the initialization of type words. According to the code:

    if self.args.init_type_words:
        so_word = [a[0] for a in self.tokenizer(["[obj]","[sub]"], add_special_tokens=False)['input_ids']]
        meaning_word = [a[0] for a in self.tokenizer(["person","organization", "location", "date", "country"], add_special_tokens=False)['input_ids']]

The meaning words are initialized with certain entity types. While these are the probable entity types for the TACRED dataset, the same is not true for the SemEval dataset.

I wanted to know how this initialization affects the working of the algorithm on other datasets like SemEval. Should we change this initialization based on the dataset?
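For context on the quoted snippet: `a[0]` keeps only the first token id of each tokenized word, so a type word that a subword tokenizer splits into several pieces is represented by its first piece alone. A minimal sketch of that behavior follows, using a toy tokenizer rather than the repository's real one; `TOY_VOCAB`, `toy_tokenize`, `first_subword_ids`, and all ids are invented for illustration.

```python
# Toy stand-in for a subword tokenizer. The vocabulary and ids are invented
# for illustration and do not match any real tokenizer.
TOY_VOCAB = {
    "person": 10,
    "organ": 11,
    "##ization": 12,
    "location": 13,
    "date": 14,
    "country": 15,
}


def toy_tokenize(word):
    """Greedy longest-match split; continuation pieces carry a '##' prefix."""
    ids, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                start = end
                break
        else:
            # No vocabulary entry matches at this position.
            raise KeyError(f"cannot tokenize {word!r}")
    return ids


def first_subword_ids(words):
    """Mirror the `a[0]` pattern: keep only the first id of each word."""
    return [toy_tokenize(w)[0] for w in words]


type_words = ["person", "organization", "location", "date", "country"]
meaning_word = first_subword_ids(type_words)
```

In this toy vocabulary "organization" splits into two pieces, and only the id of "organ" ends up in `meaning_word`. With a real tokenizer it may be worth checking how each candidate type word is split before relying on its first piece.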

njcx-ai commented 1 month ago

Thanks for your attention. The choice of initialization has some influence on the final model performance, but the impact is not significant. To establish a stronger correspondence between the type words and the task at hand, adapt the initialization to the characteristics of the dataset.
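One way to adapt the initialization per dataset might look like the sketch below. It assumes the `[sub]` and `[obj]` marker embeddings are set to the mean of the type-word embeddings; that step is not quoted in this thread, so treat it as an assumption. The names `TYPE_WORDS`, `type_words_for`, and `init_marker_embeddings`, the `my_dataset` entry, and the toy embedding table are all hypothetical.

```python
# Hypothetical per-dataset type words. Only the "tacred" list comes from the
# snippet quoted in this thread; "my_dataset" is a placeholder to replace.
TYPE_WORDS = {
    "tacred": ["person", "organization", "location", "date", "country"],
    "my_dataset": ["chemical", "disease"],
}


def type_words_for(dataset, default="tacred"):
    """Return the type words for a dataset, falling back to the default list."""
    return TYPE_WORDS.get(dataset, TYPE_WORDS[default])


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def init_marker_embeddings(embeddings, marker_ids, type_word_ids):
    """Overwrite each marker row with the mean of the type-word rows.

    Assumption: this mean initialization is an illustration, not a
    confirmed description of the repository's code.
    """
    mean = mean_vector([embeddings[i] for i in type_word_ids])
    for idx in marker_ids:
        embeddings[idx] = list(mean)  # copy so the rows stay independent
    return embeddings


# Toy embedding table: row index = token id, 2-dimensional vectors.
toy_embeddings = [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
init_marker_embeddings(toy_embeddings, marker_ids=[0, 1], type_word_ids=[2, 3, 4])
```

Under this assumption, switching datasets only changes which rows are averaged, which is consistent with the impact being small when the default words are kept.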

anushkasw commented 1 month ago

Got it. So, did you use a different set of initialization words when training on the SemEval dataset, or the same ones?

njcx-ai commented 1 month ago

The same one.

zxlzr commented 1 month ago

Hi buddy, do you have any further questions?

yccckid commented 3 weeks ago

If I switch to a different dataset, should I change the entity types, i.e., replace ["person","organization", "location", "date", "country"] with my own entity types? Is that better than leaving them unchanged?

zjunlp / KnowPrompt, issue #30