inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

Script for building a balanced pair dataset #55

Closed by natsheh 9 years ago

glouppe commented 9 years ago

So that my comments are not lost in the folded diffs above:

I would change the API so that the sampling function accepts a list of signatures, which are assumed to come from the training set already. That way, the train/test split happens in only one place (i.e. not inside the function), and the function can focus on what it should do: drawing pairs.
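
For illustration only, here is a minimal sketch of what such an API could look like. The function name `sample_balanced_pairs`, the `(signature_id, cluster_id)` tuple layout, and the `n_pairs` and `seed` parameters are all hypothetical and not taken from the beard code base. The sketch enumerates every pair, which is only reasonable for small inputs.

```python
import random
from itertools import combinations


def sample_balanced_pairs(signatures, n_pairs, seed=0):
    """Draw up to n_pairs pairs, half same-cluster and half different-cluster.

    `signatures` is a list of (signature_id, cluster_id) tuples (a hypothetical
    layout). They are assumed to come from the training set already, so no
    train/test split happens inside this function.
    """
    rng = random.Random(seed)
    positives, negatives = [], []

    # Label every unordered pair: 1 if both signatures share a cluster, else 0.
    for (id_a, cluster_a), (id_b, cluster_b) in combinations(signatures, 2):
        label = int(cluster_a == cluster_b)
        (positives if label else negatives).append((id_a, id_b, label))

    # Cap both classes at the same size so the result stays balanced.
    per_class = min(n_pairs // 2, len(positives), len(negatives))
    pairs = rng.sample(positives, per_class) + rng.sample(negatives, per_class)
    rng.shuffle(pairs)
    return pairs


# The caller performs the split once, then passes only training signatures.
train_signatures = [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "c")]
pairs = sample_balanced_pairs(train_signatures, n_pairs=4)
```

With this shape, the caller owns the train/test split and the function only draws pairs, so the same function could be reused on any subset of signatures.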

MSusik commented 9 years ago

Continued in https://github.com/inveniosoftware/beard/pull/73