se4u opened this issue 5 years ago
I'm getting the same issue; for example, contrastive_trans.t7 says that (adventure.n.01, cognition.n.01) is a valid hypernym pair, which makes no sense. It also says that (ballplayer.n.01, wrongdoer.n.01) is a valid hypernym pair, which I really hope isn't true because I used to be a ballplayer.
In case this is helpful to anyone, a few things I noticed:
a) Lua is 1-indexed, so if you are using Python you definitely need to offset the indices by 1 to match them with the correct labels (see the sketch below).
b) In the data, the hypo and hyper attributes contain negative samples as well, as shown by the target field. The hypernyms field contains the actual hypo and hyper pairs, stacked together as two columns.
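To make point a) concrete, here is a minimal sketch, assuming a vocabulary file that lists one synset name per line in dataset order ('words.txt' is a hypothetical name, not something shipped with the repo):

```python
import torchfile

tf = torchfile.load('dataset/contrastive_trans.t7')
# 'words.txt' is an assumed vocabulary file: one synset name per line,
# in the same order used when the dataset was created.
with open('words.txt') as f:
    vocab = [line.strip() for line in f]

# hypernyms stacks the positive pairs as two columns:
# column 0 = hyponym index, column 1 = hypernym index (both 1-based)
hypernyms = tf['train']['hypernyms']
for hypo_idx, hyper_idx in hypernyms[:5]:
    # Lua is 1-indexed, so subtract 1 before indexing a Python list
    print(vocab[int(hypo_idx) - 1], '->', vocab[int(hyper_idx) - 1])
```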
I believe that part a) of what @andreasgrv mentioned was the source of the errors I was getting. I had a brief correspondence with one of the authors of the paper and he suggested the same possibility as well.
I also noticed the 1-based indexing in createDatasets.lua.
@andreasgrv what do you mean by "hypo and hyper attributes have negative samples as well, as shown by the target field. The hypernyms field contains the actual hypo and hyper but as 2 columns stacked together"? Can you please explain? /cc @ivendrov
@nandana The code below demonstrates what I mean; it has a dependency on the torchfile package.
import torchfile

if __name__ == "__main__":
    tf = torchfile.load('dataset/contrastive_trans.t7')
    for part in ['train', 'val', 'test']:
        print('========== %s shapes ===========' % part)
        print(tf[part]['hypernyms'].shape)
        # Below also contain negative samples, as explained in the paper
        print(tf[part]['hypo'].shape)
        print(tf[part]['hyper'].shape)
        print(tf[part]['target'].shape)
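As a hedged follow-up sketch, you can split the pairs into positives and negatives with the target field. I am assuming target == 1 marks a true hypernym pair; double-check that against your copy of the data:

```python
import numpy as np
import torchfile

tf = torchfile.load('dataset/contrastive_trans.t7')
train = tf['train']
# Assumption: target == 1 marks a positive (true hypernym) pair
target = np.asarray(train['target']).ravel()
pos = target == 1
print('positive pairs:', int(pos.sum()))
print('negative pairs:', int((~pos).sum()))
```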
Got it, thanks a lot @andreasgrv! This really helped me explore the generated datasets!
Hi,
I am working with the wordnet dataset from this repo and I noticed something odd with the data.
Fitting the transitive closure baseline on the training data
dataset/contrastive_trans.t7
also results in false positives on the test data!! My understanding is that the WordNet train/test datasets were generated by just splitting the transitive closure of WordNet, so the transitive closure baseline should never generate any false positives. As far as I can tell the training data does not contain any noise, so false positives should not arise from taking the closure of the training data. Can you give some reasons why the transitive closure of the training data might generate false positives?
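For reference, here is a minimal sketch of the check I am running. The naive fixed-point closure and the way the (hypo, hyper) index pairs are pulled out of the t7 file are my own assumptions, not code from the repo:

```python
from collections import defaultdict

def transitive_closure(pairs):
    # Naive fixed-point transitive closure over (hypo, hyper) index pairs.
    succ = defaultdict(set)
    for a, b in pairs:
        succ[a].add(b)
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        new_pairs = set()
        for a, b in closure:
            for c in succ.get(b, ()):
                if (a, c) not in closure:
                    new_pairs.add((a, c))
        if new_pairs:
            closure |= new_pairs
            for a, c in new_pairs:
                succ[a].add(c)
            changed = True
    return closure

# train_pairs, test_pairs and test_targets are assumed to be extracted from
# the t7 file as in the snippets above (lists of (hypo, hyper) index tuples
# plus the corresponding target values).
# closure = transitive_closure(train_pairs)
# false_positives = [(a, b) for (a, b), t in zip(test_pairs, test_targets)
#                    if t != 1 and (a, b) in closure]
# If the splits really partition one transitive closure, false_positives
# should be empty.
```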