ryankiros / skip-thoughts

Sent2Vec encoder and training code from the paper "Skip-Thought Vectors"

What are 'utable' and 'btable'? #12


amirj commented 8 years ago

I trained a model following the instructions here. I can load the model using the following commands:

import tools
embed_map = tools.load_googlenews_vectors()  # GoogleNews word2vec vectors, used for vocabulary expansion
model = tools.load_model(embed_map)          # load the trained model components

After that, I want to run an experiment (for example, semantic relatedness):

import eval_sick
eval_sick.evaluate(model, evaltest=True)

Here is the output:

/Users/AmirHJ/projects/skip-thoughts/skipthoughts.pyc in encode(model, X, use_norm, verbose, batch_size, use_eos)
     97     # word dictionary and init
     98     d = defaultdict(lambda : 0)
---> 99     for w in model['utable'].keys():
    100         d[w] = 1
    101     ufeatures = numpy.zeros((len(X), model['uoptions']['dim']), dtype='float32')

KeyError: 'utable'

Could you please help me run experiments on the trained model?
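
A quick way to confirm the mismatch described in the answer below is to inspect which keys the loaded model dict actually carries (a minimal sketch; the key names other than 'table' come from load_model in training/tools.py and may differ in your copy):

import tools

embed_map = tools.load_googlenews_vectors()
model = tools.load_model(embed_map)

# A self-trained model stores a single lookup table under 'table';
# skipthoughts.encode instead expects 'utable' and 'btable'.
print(sorted(model.keys()))  # e.g. ['f_w2v', 'options', 'table']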

danielricks commented 7 years ago

The pretrained BookCorpus model is structured differently from a model you train yourself, at least using the code in training/tools.py. The released model contains both 'utable' and 'btable' (the lookup tables for the uni-skip and bi-skip encoders), but a self-trained model contains only a single 'table'. I'm not sure why it was coded that way, and it is frustrating, because all the other methods depend on both tables, so it takes some acrobatics to make things work. I didn't test any of their experiments, but I did write interface code that mostly works with the functionality the embedding space provides: https://github.com/danielricks/penseur
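
For what it's worth, training/tools.py also ships its own encode function that reads the single 'table', so you can get sentence vectors from a self-trained model without going through skipthoughts.encode at all. A minimal sketch (untested against the eval scripts; the example sentences and printed shape are illustrative):

import tools  # training/tools.py, not the eval-side skipthoughts.py

embed_map = tools.load_googlenews_vectors()
model = tools.load_model(embed_map)

# tools.encode reads model['table'] directly, so it never performs the
# 'utable'/'btable' lookups that raise the KeyError above.
sentences = ['the quick brown fox jumped', 'a cat sat on the mat']
vectors = tools.encode(model, sentences)
print(vectors.shape)  # (2, dim), with dim set by your training options

Running eval_sick.evaluate on a self-trained model would need the same substitution: the traceback above shows it calls skipthoughts.encode internally, and those calls would have to be swapped for tools.encode.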