sehsanm / embedding-benchmark

Word Embedding benchmark project by Shahid Beheshti University NLP Lab
GNU General Public License v3.0

Write code to run Analogy task over data sets #16

Open sehsanm opened 5 years ago

sehsanm commented 5 years ago

There must be an option to set the following options:

abb4s commented 5 years ago

Hi, I created a sample format for the test script: analogy_test.py.zip. However, I need a sample corpus, its corresponding semantic vector model, and an analogy dataset. If you define and create the project architecture, it would be clearer. I assumed that we have a datasets package for loading corpora and datasets, and a models package for loading models.
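A minimal sketch of the two packages being assumed here (the module layout, function names, and signatures are placeholders for illustration, not an agreed project API) could look like this:

# Hypothetical package layout (all names are placeholders, not a decided API):
#
#   datasets/__init__.py  -- load corpora and analogy datasets
#   models/__init__.py    -- load trained embedding models into memory

# datasets/__init__.py
def loadAnalogyDataset(path):
    """Return a list of analogy datasets found under `path`.

    Each dataset is expected to be an iterable of rows with fields
    a, b, c, d and category; the file format is still to be defined.
    """
    raise NotImplementedError

# models/__init__.py
def loadmodel(path):
    """Return an in-memory embedding model exposing getVec and getKNear (see #15)."""
    raise NotImplementedError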

sehsanm commented 5 years ago

Hi. Some points here: I'm assuming what you have sent is pseudo-code for what needs to be done. What you have to do is create a package containing the base methods for the analogy task.

The final output, from my perspective, is that you load all the analogy datasets (there may be multiple), run them, and finally create a CSV file containing the results as well as per-category scores for each dataset. So you are not dependent on a corpus; you depend on a memory-loaded model (see #15).

So the pseudo-code will be something like:


import datasets
import models

# load the analogy datasets (there may be more than one)
analogy_datasets = datasets.loadAnalogyDataset('/data/analogy')
# load the in-memory model (see #15)
model = models.loadmodel('/data/models/model_khafan.bin')

threshold = 10  # how many nearest neighbours to check (placeholder value)

for dataset in analogy_datasets:
    totals = {}    # questions seen per category
    corrects = {}  # questions answered correctly per category
    for row in dataset:
        r1 = model.getVec(row.a)
        r2 = model.getVec(row.b)
        r3 = model.getVec(row.c)
        # the analogy target d should be near c + b - a
        words = model.getKNear(r3 + r2 - r1, threshold, 'Cosine_Distance')
        totals[row.category] = totals.get(row.category, 0) + 1
        if row.d in words:
            corrects[row.category] = corrects.get(row.category, 0) + 1
    write_result_to_file(dataset, totals, corrects)
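
The write_result_to_file call at the end of the pseudo-code is not defined anywhere yet. One possible sketch, assuming the totals/corrects dictionaries are keyed by category and one CSV is written per dataset (the signature, file name, and column layout are assumptions, not a decided format):

import csv

def write_result_to_file(dataset_name, totals, corrects, out_path='analogy_results.csv'):
    """Write per-category counts and accuracy, plus an overall row, to a CSV file."""
    with open(out_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['dataset', 'category', 'total', 'correct', 'accuracy'])
        for category in sorted(totals):
            total = totals[category]
            correct = corrects.get(category, 0)
            writer.writerow([dataset_name, category, total, correct,
                             correct / total if total else 0.0])
        # overall score across all categories of this dataset
        all_total = sum(totals.values())
        all_correct = sum(corrects.values())
        writer.writerow([dataset_name, 'ALL', all_total, all_correct,
                         all_correct / all_total if all_total else 0.0])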

So what you have to do:

abb4s commented 5 years ago

Hi, thank you for the instructions. I tried to implement the requirements, but I can't test it completely because we don't have a model yet. The result file is attached: scripts.zip