vered1986 / HypeNET

Integrated path-based and distributional method for hypernymy detection

False Negatives in the dataset #8

Open avi-jain opened 3 years ago

avi-jain commented 3 years ago

Hello, while experimenting with the dataset I came across several pairs where a hypernymy relation holds but the pair is labelled False (mostly novels). Here are a few examples from the test set (lexical split):

saraswatichandra novel False
pollyanna novel False
jurassic park novel False
makamisa novel False
the hunger games novel False
the secret novel False
...
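More candidates of this kind could be surfaced with a small script. The following is only a sketch: it assumes each line of the split holds a term, a candidate hypernym, and a `True`/`False` label separated by tabs (inferred from the examples above, not confirmed against the files), and the function name `negatives_by_hypernym` is hypothetical.

```python
import io
from collections import Counter


def negatives_by_hypernym(lines, delimiter="\t"):
    """Count the pairs labelled False for each candidate hypernym.

    Assumes three delimiter-separated columns per line:
    term, candidate hypernym, label ("True" or "False").
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _term, hypernym, label = fields
        if label.strip() == "False":
            counts[hypernym] += 1
    return counts


# Tiny in-memory sample; the negative rows are copied from the examples above,
# and the last row is an invented positive pair for contrast.
sample = io.StringIO(
    "pollyanna\tnovel\tFalse\n"
    "jurassic park\tnovel\tFalse\n"
    "the hunger games\tnovel\tFalse\n"
    "cat\tanimal\tTrue\n"
)
counts = negatives_by_hypernym(sample)
print(counts.most_common(5))
```

Hypernyms with many negatives, such as `novel` here, are reasonable places to start a manual audit. A high count by itself does not show that the labels are wrong.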

You mention in the paper that the dataset was created via distant supervision and that only the positives were manually audited. Would it be fair to say that the dataset is noisy and needs some cleanup? Or do you consider these to be truly False annotations? Thank you.

vered1986 commented 3 years ago

Hi Avi,

Yes, the dataset was built using distant supervision (with no human validation), so it contains some percentage of errors. We err more on the side of false negatives because we only considered indisputable hypernymy relations from the KBs (as opposed to opting for a more inclusive definition of hypernymy). For the examples you mentioned, I assume the pairs are connected by a property we did not treat as hypernymy (either because we overlooked it or because it had a high rate of false positives).

I hope this answers your question!