How to split train set and test set

thunlp / JointNRE

Joint Neural Relation Extraction with Text and KGs

MIT License

How to split train set and test set #4

Closed Cauchyzhou closed 6 years ago

Cauchyzhou commented 6 years ago

I am using your model to extract relations from my custom dataset. I have triples, plus sentences that contain the head entity mention and the tail entity mention. Should I split the dataset by triple? For example, if I have 10 triples, 8 triples and the sentences that refer to them would form the training set, and the other 2 triples and their sentences would form the test set. Should I make sure every entity occurs in the training set? If not, I think the representations of h and t are just randomly initialized at test time.
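To make the proposed split concrete, here is a minimal sketch of splitting by triple, so that all sentences for a given (h, r, t) land on the same side. The `(head, relation, tail, sentence)` tuple layout and the names `split_by_triple` and `unseen_entities` are assumptions for illustration, not part of JointNRE.

```python
import random
from collections import defaultdict


def split_by_triple(instances, test_ratio=0.2, seed=0):
    """Split (head, relation, tail, sentence) tuples so that no triple
    appears in both the training set and the test set.

    The tuple layout is a hypothetical one chosen for this sketch.
    """
    # Group sentences by the triple they refer to.
    bags = defaultdict(list)
    for h, r, t, sentence in instances:
        bags[(h, r, t)].append(sentence)

    # Shuffle the triples (not the sentences) reproducibly.
    triples = sorted(bags)
    random.Random(seed).shuffle(triples)

    n_test = max(1, round(len(triples) * test_ratio))
    test_triples = set(triples[:n_test])

    train, test = [], []
    for triple in triples:
        target = test if triple in test_triples else train
        for sentence in bags[triple]:
            target.append((*triple, sentence))
    return train, test


def unseen_entities(train, test):
    """Entities that occur in the test set but never in the training set."""
    train_entities = {e for h, _, t, _ in train for e in (h, t)}
    return {e for h, _, t, _ in test for e in (h, t)} - train_entities
```

With 10 triples and `test_ratio=0.2`, this puts 8 triples in the training set and 2 in the test set. `unseen_entities(train, test)` then lists the entities whose representations would not be learned from the training sentences.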

THUCSTHanxu13 commented 6 years ago

The dataset is a benchmark dataset; you can find it at https://github.com/thunlp/NRE or http://iesl.cs.umass.edu/riedel/ecml/. We simply align KGs to this dataset. If an entity occurs only in the test set and does not occur in the KGs either, its representation is indeed randomly initialized at test time. In fact, our KGs contain all entities in the training set and the test set, and we filter out the triples that appear in the test set.
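For a custom dataset, the two points above (drop test-set triples from the KG, and check which test entities are covered) could be sketched roughly as follows. This is not the repository's preprocessing code; the `(head, relation, tail)` tuple layout and the function names `filter_kg` and `uncovered_entities` are assumptions for illustration.

```python
def filter_kg(kg_triples, test_triples):
    """Return the KG triples that do not appear in the test set."""
    held_out = set(test_triples)
    return [triple for triple in kg_triples if triple not in held_out]


def uncovered_entities(test_triples, train_triples, kg_triples):
    """Entities in the test set that occur in neither the training set
    nor the KG, i.e. those that would keep a random initialization."""
    seen = set()
    for h, _, t in list(train_triples) + list(kg_triples):
        seen.update((h, t))
    return {e for h, _, t in test_triples for e in (h, t) if e not in seen}


# Toy data, made up for this example.
kg = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r1", "d")]
train = [("a", "r1", "b")]
test = [("b", "r2", "c"), ("c", "r1", "e")]

filtered_kg = filter_kg(kg, test)
missing = uncovered_entities(test, train, filtered_kg)
```

Here `filtered_kg` no longer contains `("b", "r2", "c")`, and `missing` contains only `"e"`, because `"b"` and `"c"` still occur in other KG or training triples.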

THUCSTHanxu13 commented 6 years ago

Neural models extract relations mainly by relying on the semantics of the whole sentence. If neither h nor t occurs in the training set, the randomly initialized representations for h and t will give all sentences containing (h, t) the same weights, and these sentence embeddings can still be used to extract relations.

Cauchyzhou commented 6 years ago

Understood. Thank you.