Honor $COUNT_TEST, don't test on training data.

nytimes / ingredient-phrase-tagger

Extract structured data from ingredient phrases using conditional random fields

I ran this script, and the generated stats from evaluate.py were unbelievably good:

Sentence-Level Stats:
    correct:  91
    total:  100
    % correct:  91.0

Word-Level Stats:
    correct: 572
    total: 588
    % correct: 97.2789115646

Looking closer, it seems the model is being tested on a small subset of the data it was trained on. Passing the additional arguments --count=$COUNT_TEST --offset=$COUNT_TRAIN on line 9 fixes that and yields a more reasonable assessment of accuracy:

Sentence-Level Stats:
    correct:  1489
    total:  1999
    % correct:  74.4872436218

Word-Level Stats:
    correct: 10399
    total: 11450
    % correct: 90.8209606987
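
To illustrate why the offset matters, here is a minimal Python sketch of the windowing idea. It is not the repository's script: the helper take_rows, the stand-in rows, and the small counts are all hypothetical, chosen only to show that a test window without an offset overlaps the training window, while one starting at COUNT_TRAIN does not.

```python
def take_rows(rows, count, offset=0):
    """Return `count` rows starting at `offset` (hypothetical helper)."""
    return rows[offset:offset + count]


# Stand-in for the labelled phrases; real data would come from a file.
rows = [f"phrase-{i}" for i in range(10)]

COUNT_TRAIN = 7  # illustrative values, not the ones used by the script
COUNT_TEST = 3

train = take_rows(rows, COUNT_TRAIN)

# Without an offset, the test window starts at row 0 and overlaps training.
leaky_test = take_rows(rows, COUNT_TEST)

# With offset=COUNT_TRAIN, the test window starts right after the training rows.
held_out_test = take_rows(rows, COUNT_TEST, offset=COUNT_TRAIN)

overlap_leaky = set(train) & set(leaky_test)
overlap_held_out = set(train) & set(held_out_test)
print(len(overlap_leaky), len(overlap_held_out))  # prints "3 0"
```

In this toy setup every leaky test row was also seen during training, which is the kind of overlap that would inflate the first set of stats.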

Honor $COUNT_TEST, don't test on training data. #5