Issue replicating accuracy of 0.98

lbozarth commented 4 years ago

Hi, per Table 1 of https://arxiv.org/pdf/1802.01021.pdf, the reported accuracy is 0.98. The type classifier I trained with the provided code has an F1 score of only 0.88.

cmd:
python3 learning/train_type.py my_config_v2.json --cudnn --fused --hidden_sizes 200 200 --batch_size 256 --max_epochs 10000 --name TypeClassifier --weight_noise 1e-6 --load_dir en_model --save_dir en_model --anneal_rate 0.9999
echo 'done generating model'

Metrics:
precision 88.472
recall 88.369
sentence_correct 86.35% (43497 correct / 50372)
time_sentence_correct 89.41% (11260 correct / 12593)
time_token_correct 91.14% (36461 correct / 40006)
token_correct 88.37% (141411 correct / 160024)
type_sentence_correct 80.66% (10158 correct / 12593)
type_token_correct 83.88% (33557 correct / 40006)
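For reference, the F1 implied by the precision and recall above can be checked with a few lines of Python. This is only the standard harmonic-mean formula, not code from the deeptype repository, and the function name is made up for illustration.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics reported above, in percent.
precision = 88.472
recall = 88.369

f1 = f1_from_precision_recall(precision, recall)
print(f"F1 = {f1:.2f}%")  # about 88.42%

# Sanity check on one of the reported ratios (sentence_correct).
print(f"sentence_correct = {43497 / 50372:.2%}")
```

This gives roughly 88.4%, which matches the 0.88 figure quoted at the top of the issue.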

Config file used:

{
  "datasets": [
    {
      "type": "train",
      "path": "en_train.h5",
      "x": 0,
      "ignore": "other",
      "y": [
        { "column": 1, "objective": "type", "classification": "type_classification" },
        { "column": 1, "objective": "location", "classification": "location_classification" },
        { "column": 1, "objective": "country", "classification": "country_classification" },
        { "column": 1, "objective": "time", "classification": "time_classification" }
      ]
    },
    {
      "type": "dev",
      "path": "en_dev.h5",
      "x": 0,
      "ignore": "other",
      "y": [
        { "column": 1, "objective": "type", "classification": "type_classification" },
        { "column": 1, "objective": "location", "classification": "location_classification" },
        { "column": 1, "objective": "country", "classification": "country_classification" },
        { "column": 1, "objective": "time", "classification": "time_classification" }
      ],
      "ignore": "other",
      "comment": "#//#"
    }
  ],
  "features": [
    { "type": "word", "dimension": 200, "max_vocab": 2000000 },
    { "type": "suffix", "length": 2, "dimension": 6, "max_vocab": 1000000 },
    { "type": "suffix", "length": 3, "dimension": 6, "max_vocab": 1000000 },
    { "type": "prefix", "length": 2, "dimension": 6, "max_vocab": 1000000 },
    { "type": "prefix", "length": 3, "dimension": 6 },
    { "type": "digit" },
    { "type": "uppercase" },
    { "type": "punctuation_count" }
  ],
  "objectives": [
    { "name": "type", "type": "softmax", "vocab": "type_classification/classes.txt" },
    { "name": "location", "type": "softmax", "vocab": "location_classification/classes.txt" },
    { "name": "country", "type": "softmax", "vocab": "country_classification/classes.txt" },
    { "name": "time", "type": "softmax", "vocab": "time_classification/classes.txt" }
  ],
  "wikidata_path": "wikidata",
  "classification_path": "classifications"
}

lbozarth commented 4 years ago

I found someone else with a similar issue (F1 score of 0.83 for the type classifier): https://github.com/openai/deeptype/issues/31

JonathanRaiman commented 2 years ago

Several items matter for high F1:

Imbalanced classes in the type classification cause issues. The code supports dynamic weight balancing to up-weight rare classes and down-weight frequent ones.
A "topic" classification (e.g. "politics", "science", "fashion", etc.) will help by adding more granularity and acting as a catch-all when too many other dimensions are set to "other".
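To make the first point concrete, here is a minimal sketch of one common way to weight classes by inverse frequency. It is not the deeptype implementation (I have not reproduced how the repository does its balancing); the function name and the "person"/"location" labels are hypothetical, and the formula is the usual "balanced" heuristic n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of that class).

    Rare classes get weights above 1, frequent classes get weights below 1.
    Illustrative heuristic only, not the scheme used by deeptype.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up label distribution dominated by the "other" class.
labels = ["other"] * 90 + ["person"] * 8 + ["location"] * 2
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls:10s} weight={weight:.3f}")
```

In a training loop these weights would typically multiply each example's loss, so that errors on the rare classes count for more than errors on "other".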

I also recommend checking out our follow-up work, DeepType 2, where we forgo human-made classifications and instead use a contrastive loss and embed the local Wikidata neighborhood of each entity. This skips type classification altogether and enables directly learning how to disambiguate entities. Code is here: https://github.com/deep-type/deeptype2
