zhouhoo opened this issue 7 years ago
The WN18.bin dataset is a WordNet subset. I'm not sure at all, but I bet that @mnick uses the same datasets as A. Bordes (https://everest.hds.utc.fr/doku.php?id=en:transe). You can get the two datasets used in the experiments there. Try generating WN18.bin (or another binary) with Python's pickle module.
It seems like some preprocessing was done before pickling the file. During training, the data is unpickled like this:
```python
import pickle
import numpy as np

with open(self.args.fin, 'rb') as fin:
    data = pickle.load(fin)

N = len(data['entities'])    # number of entities
M = len(data['relations'])   # number of relations
sz = (N, N, M)               # shape of the (subject, object, relation) tensor

# all known triples, used to filter candidates when ranking
true_triples = data['train_subs'] + data['test_subs'] + data['valid_subs']
if self.args.mode == 'rank':
    self.ev_test = self.evaluator(data['test_subs'], true_triples, self.neval)
    self.ev_valid = self.evaluator(data['valid_subs'], true_triples, self.neval)
elif self.args.mode == 'lp':
    self.ev_test = self.evaluator(data['test_subs'], data['test_labels'])
    self.ev_valid = self.evaluator(data['valid_subs'], data['valid_labels'])

xs = data['train_subs']      # training triples
ys = np.ones(len(xs))        # all training triples are positive examples
```
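To see the expected structure concretely, you can inspect the pickle directly. This is just a quick sketch: the keys are the ones the training code above reads, and the (subject, object, relation) tuple order is my assumption based on sz = (N, N, M).

```python
import pickle

with open('WN18.bin', 'rb') as fin:
    data = pickle.load(fin)

print(data.keys())             # expect entities, relations, train_subs, ...
print(len(data['entities']), len(data['relations']))
print(data['train_subs'][:3])  # presumably [(s, o, p), ...] index triples
```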
Now, if I want to train it on, for example, the FB15k-237 dataset (which only contains the three files train.txt, test.txt, and valid.txt), I first have to generate an object containing test_subs, train_subs, valid_subs, entities, and relations, and then pickle it. I wish this preprocessing code were available; otherwise we will need to write it ourselves before testing on any new dataset. A rough attempt follows below.
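For what it's worth, here is a minimal sketch of such a preprocessing script. It assumes tab-separated head/relation/tail lines (the format of the FB15k-237 text files) and stores triples as (subject, object, relation) index tuples, which is my reading of sz = (N, N, M) in the training code; the key names are taken from the unpickling snippet above, but I haven't verified the result against the original WN18.bin.

```python
import pickle

def load_triples(path, ent2id, rel2id):
    # Each line is assumed to be "head<TAB>relation<TAB>tail", as in the
    # FB15k-237 text files; ids are assigned on first appearance.
    triples = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            head, rel, tail = line.strip().split('\t')
            s = ent2id.setdefault(head, len(ent2id))
            o = ent2id.setdefault(tail, len(ent2id))
            p = rel2id.setdefault(rel, len(rel2id))
            # (subject, object, relation) order -- my assumption,
            # based on sz = (N, N, M) in the training code above
            triples.append((s, o, p))
    return triples

ent2id, rel2id = {}, {}
data = {
    'train_subs': load_triples('train.txt', ent2id, rel2id),
    'valid_subs': load_triples('valid.txt', ent2id, rel2id),
    'test_subs': load_triples('test.txt', ent2id, rel2id),
}
# entity/relation name lists, ordered by their assigned ids
data['entities'] = sorted(ent2id, key=ent2id.get)
data['relations'] = sorted(rel2id, key=rel2id.get)

with open('FB15k-237.bin', 'wb') as fout:
    pickle.dump(data, fout)
```

Note that the lp mode additionally reads test_labels and valid_labels, so this sketch only covers the rank mode.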
Hi, thanks for your explanation of how the WN18 dataset is unpickled in the code. Were you able to preprocess the FB15k-237 dataset and generate HolE embeddings for it? I am not able to relate the entity numbers in this repository's WN18 example to the ones in the WN18 dataset here: https://everest.hds.utc.fr/doku.php?id=en:transe
Thank you for your great work. While studying your code, I was confused by the WN18.bin dataset: why are the entities all numbers, and what do they stand for?