dataset explanation - GitHub Issues

zhouhoo commented 7 years ago

Thank you for your great work. While studying your code, I was confused by the WN18.bin dataset: why are all the entities in it numbers? What do they stand for?

vfrico commented 7 years ago

The WN18.bin dataset is a WordNet subset. I'm not entirely sure, but I bet that @mnick uses the same datasets as A. Bordes (https://everest.hds.utc.fr/doku.php?id=en:transe), where you can download the two datasets used in the experiments. Try generating WN18.bin (or another binary file) from them with Python's pickle module.

sharifza commented 6 years ago

It seems that some preprocessing was done before the file was pickled. During training, the data is unpickled like this:

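    import pickle
    import numpy as np

    # Load the preprocessed dataset dictionary from the input file
    with open(self.args.fin, 'rb') as fin:
        data = pickle.load(fin)

    # N entities and M relations; the knowledge graph is an N x N x M tensor
    N = len(data['entities'])
    M = len(data['relations'])
    sz = (N, N, M)

    # All known true triples from every split, handed to the ranking evaluator
    true_triples = data['train_subs'] + data['test_subs'] + data['valid_subs']
    if self.args.mode == 'rank':
        self.ev_test = self.evaluator(data['test_subs'], true_triples, self.neval)
        self.ev_valid = self.evaluator(data['valid_subs'], true_triples, self.neval)
    elif self.args.mode == 'lp':
        # Link-prediction mode also needs explicit labels for test/valid triples
        self.ev_test = self.evaluator(data['test_subs'], data['test_labels'])
        self.ev_valid = self.evaluator(data['valid_subs'], data['valid_labels'])

    # Training triples are all positive examples (label 1)
    xs = data['train_subs']
    ys = np.ones(len(xs))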

Now, if I want to train it on, for example, the FB15k-237 dataset (which only contains three files: "train.txt", "test.txt" and "valid.txt"), I first have to build a structure containing train_subs, valid_subs, test_subs, entities and relations, and then pickle it. I wish this preprocessing code were available; otherwise, I believe we have to write it ourselves before testing on any new dataset.
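A minimal sketch of that preprocessing step, under several assumptions: each line of train.txt, valid.txt and test.txt is a tab-separated head, relation, tail triple; data['entities'] and data['relations'] are lists whose positions act as integer ids; and triples are stored as (subject, object, predicate) index tuples, which matches sz = (N, N, M) but should be checked against the repository. The helpers read_triples, build_dataset and convert are hypothetical, not part of holographic-embeddings, and the 'test_labels' / 'valid_labels' keys needed by the 'lp' mode are not produced.

```python
import pickle
from pathlib import Path


def read_triples(path):
    """Read one tab-separated (head, relation, tail) triple per line."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                head, rel, tail = line.split("\t")
                triples.append((head, rel, tail))
    return triples


def build_dataset(train, valid, test):
    """Assign integer ids to names and build the (assumed) pickle layout."""
    entities, relations = [], []
    ent_id, rel_id = {}, {}

    def index(name, table, names):
        # The first occurrence of a name gets the next free id
        if name not in table:
            table[name] = len(names)
            names.append(name)
        return table[name]

    def encode(triples):
        # Assumed tuple order: (subject, object, predicate)
        return [
            (index(h, ent_id, entities), index(t, ent_id, entities),
             index(r, rel_id, relations))
            for h, r, t in triples
        ]

    # Encode train first so training entities get the lowest ids
    train_subs, valid_subs, test_subs = encode(train), encode(valid), encode(test)
    return {
        "entities": entities,
        "relations": relations,
        "train_subs": train_subs,
        "valid_subs": valid_subs,
        "test_subs": test_subs,
    }


def convert(data_dir, out_path):
    """Read train/valid/test .txt files from data_dir and pickle the result."""
    d = Path(data_dir)
    splits = [read_triples(d / f"{name}.txt") for name in ("train", "valid", "test")]
    data = build_dataset(*splits)
    with open(out_path, "wb") as fout:
        pickle.dump(data, fout)
    return data
```

For example, convert("FB15k-237", "FB15k-237.bin") would write a file that could then be passed as the input file (self.args.fin); both paths are placeholders.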

aayushee commented 5 years ago

Hi, thanks for your explanation of how the WN18 dataset is unpickled in the code. Were you able to preprocess the FB15k-237 dataset and generate HolE embeddings for it? I am not able to match the WN18 entity numbers in this repository with the ones in the WN18 dataset here: https://everest.hds.utc.fr/doku.php?id=en:transe
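To compare WN18.bin against a text version of the dataset, a rough sketch like the following could decode a few training triples back into names. It assumes data['entities'][i] and data['relations'][j] hold the original names for ids i and j, and that triples are (subject, object, predicate) tuples; decode_triples and show_triples are hypothetical helpers. If the stored names are themselves numeric identifiers, this only recovers those identifiers.

```python
import pickle


def decode_triples(data, split="train_subs", limit=5):
    """Map the first `limit` index triples of a split back to names.

    Assumes each triple is (subject, object, predicate) and that
    data['entities'] / data['relations'] are indexed by id.
    """
    ents, rels = data["entities"], data["relations"]
    return [(ents[s], rels[p], ents[o]) for s, o, p in data[split][:limit]]


def show_triples(path, limit=5):
    """Load a pickled dataset, print a few decoded triples and return them."""
    with open(path, "rb") as fin:
        # encoding="latin1" is an assumption: it helps if the file was pickled under Python 2
        data = pickle.load(fin, encoding="latin1")
    decoded = decode_triples(data, limit=limit)
    for head, rel, tail in decoded:
        print(head, rel, tail)
    return decoded
```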

mnick / holographic-embeddings

dataset explanation #3