inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

How to run the example? #88

Open lfoppiano opened 8 years ago

lfoppiano commented 8 years ago

Hi, I'm opening a new issue for this problem, since I could not find any information.

I've managed to run sampling.py with the wang dataset; it runs and generates the pairs. So far I've just copied and pasted the entry from the example's README.

Now I would like to run distance.py. According to the documentation, the command is:

python distance.py \
    --distance_pairs 1M_nysiis_balanced.json \
    --distance_model linkage.dat \
    --input_signatures input/signatures.json \
    --input_records input/records.json \
    --input_ethnicity_estimator ethnicity_estimator.pickle \
    --verbose 3

What should I use as ethnicity_estimator.pickle?

MSusik commented 8 years ago

What should I use as ethnicity_estimator.pickle

The result of: https://github.com/inspirehep/beard/blob/master/examples/applications/author-disambiguation/ethnicity.py

Please note that the data needed by the ethnicity estimator is not easy to get. However, a pretty good disambiguation can be run without it, simply by omitting the --input_ethnicity_estimator parameter.
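
For reference, skipping the parameter could look roughly like the sketch below. It only assembles the distance.py command line from the flags quoted in the question and leaves out --input_ethnicity_estimator when no pickle is available. The helper name build_distance_command is hypothetical (it is not part of beard), and the file names are just the ones from the README command.

```python
def build_distance_command(pairs, model, signatures, records,
                           ethnicity_estimator=None, verbose=3):
    """Assemble the argument list for distance.py.

    Hypothetical helper for illustration; not part of beard itself.
    """
    cmd = [
        "python", "distance.py",
        "--distance_pairs", pairs,
        "--distance_model", model,
        "--input_signatures", signatures,
        "--input_records", records,
    ]
    # Pass the estimator only when a trained pickle is actually available.
    if ethnicity_estimator is not None:
        cmd += ["--input_ethnicity_estimator", ethnicity_estimator]
    cmd += ["--verbose", str(verbose)]
    return cmd


# Same inputs as the README command, but without the ethnicity estimator.
cmd = build_distance_command(
    "1M_nysiis_balanced.json",
    "linkage.dat",
    "input/signatures.json",
    "input/records.json",
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the example directory, or the printed line can be pasted into a shell.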

SeekPoint commented 5 years ago

Where is ethnicity_estimator.pickle?

MSusik commented 5 years ago

The ethnicity estimator was trained on data that is not publicly available, and thus we could not make the trained estimator publicly available in the repo.