oudalab / Arabic-NER


Training Result #20

Open YanLiang1102 opened 6 years ago

YanLiang1102 commented 6 years ago

@khaledJabr @ahalterman 1. With the pretrained pruned vectors and the spaCy-trained NER model, updating the model only with the Prodigy-labeled data (~800 tokens), we get this (no merged NER classes yet): [screenshot]

2. With no pretrained model and everything else the same as case 1, we got this (no merged NER classes yet; so yes, the pretrained model does help): [screenshot]

3. Trained only on the LDC data with Prodigy, with the pretrained spaCy NER model; otherwise like case 1. [screenshot]

4. Prodigy data + rehearsal data at 23x the Prodigy size; otherwise like case 3. Since we have 18670 OntoNotes tokens and 801 Prodigy-labeled tokens, we use 23 as the multiplier (18670/801 ≈ 23) so that all of the data gets used; of the 18670 training samples, 4122 with empty spans were removed (see the mixing sketch after this list). [screenshot]

5. With merged NER classes; other conditions like case 4. [screenshot]

6. With Khaled's cleaned data; other conditions like case 5. [screenshot]
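
A minimal sketch of the case-4 data mixing, assuming both sources are exported as Prodigy-style JSONL (the file names are placeholders, and the sketch counts examples where the thread counts tokens):

```python
import json
import random

def load_jsonl(path):
    """Load (text, annotations) pairs from a Prodigy-style JSONL export,
    where each record has "text" and "spans" keys."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            spans = [(s["start"], s["end"], s["label"]) for s in rec.get("spans", [])]
            examples.append((rec["text"], {"entities": spans}))
    return examples

prodigy = load_jsonl("prodigy_labeled.jsonl")    # ~801 Prodigy-labeled tokens in the thread
rehearsal = load_jsonl("onto_rehearsal.jsonl")   # ~18670 OntoNotes tokens in the thread

# The thread mentions 4122 training samples with empty spans being removed.
rehearsal = [ex for ex in rehearsal if ex[1]["entities"]]

# Rehearsal data sized at 23x the Prodigy set: 18670 / 801 ≈ 23, so a
# multiplier of 23 pulls in essentially the whole rehearsal set.
multiplier = 23
mixed = prodigy + rehearsal[:multiplier * len(prodigy)]
random.shuffle(mixed)
```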

YanLiang1102 commented 6 years ago

Before merging the NER labels, this is the label distribution:

{'CARDINAL': 336,
 'CARDINAL" E_OFF="1': 2,
 'DATE': 1066,
 'DATE" E_OFF="1': 16,
 'DATE" E_OFF="5': 2,
 'DATE" S_OFF="1': 2,
 'EVENT': 224,
 'FAC': 244,
 'GPE': 1806,
 'GPE" S_OFF="1': 2,
 'LANGUAGE': 18,
 'LAW': 112,
 'LOC': 172,
 'MONEY': 106,
 'NORP': 2164,
 'ORDINAL': 598,
 'ORDINAL" E_OFF="1': 2,
 'ORDINAL" E_OFF="3': 2,
 'ORG': 3424,
 'ORG" E_OFF="1': 8,
 'ORG" S_OFF="1': 16,
 'PERCENT': 74,
 'PERSON': 3586,
 'PERSON" S_OFF="1': 30,
 'PRODUCT': 36,
 'PRODUCT" S_OFF="1': 2,
 'QUANTITY': 198,
 'QUANTITY" E_OFF="1': 2,
 'TIME': 170,
 'TIME" E_OFF="1': 2,
 'WORK_OF_ART': 124,
 'WORK_OF_ART" E_OFF="1': 2}
// Prodigy-labeled distribution
{'GPE': 439, 'ORG': 133, 'PERSON': 229}
// after updating the LDC tags we have:
{'GPE': 2224, 'MISC': 5260, 'ORG': 3448, 'PERSON': 3616}
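
The quoted suffixes in keys like 'ORG" E_OFF="1' look like LDC XML offset attributes (E_OFF/S_OFF) leaking into the label strings, and the before/after counts are consistent with GPE absorbing LOC and FAC while every other type collapses to MISC (1808 + 172 + 244 = 2224 for GPE; 5260 for MISC). A sketch of that cleanup and merge, assuming labels arrive as raw strings:

```python
from collections import Counter

def clean_label(raw):
    """Strip leaked XML attributes such as '" E_OFF="1' from a label string."""
    return raw.split('"')[0]

# Merge map inferred from the counts above: GPE/LOC/FAC -> GPE, ORG and
# PERSON stay, everything else becomes MISC.
MERGE = {"GPE": "GPE", "LOC": "GPE", "FAC": "GPE", "ORG": "ORG", "PERSON": "PERSON"}

def merge_label(raw):
    return MERGE.get(clean_label(raw), "MISC")

# e.g. Counter(merge_label(l) for l in raw_labels) should reproduce
# {'GPE': 2224, 'MISC': 5260, 'ORG': 3448, 'PERSON': 3616}.
```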
ahalterman commented 6 years ago

Good work! So it looks like around 70% is where we're going to be for now. Can you get per-class accuracy, too? We don't really care so much about MISC, and it could be that MISC is harder than the rest.
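
One version-independent way to get per-class numbers is to count TP/FP/FN per label over exact span matches (a sketch; `nlp` is the trained pipeline, and the examples use the (text, annotations) shape from the mixing sketch above):

```python
from collections import defaultdict

def per_class_prf(nlp, examples):
    """Return {label: (precision, recall, f1)} using exact span matching."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for text, ann in examples:
        gold = set(ann["entities"])  # {(start, end, label), ...}
        pred = {(e.start_char, e.end_char, e.label_) for e in nlp(text).ents}
        for _, _, label in pred & gold:
            tp[label] += 1
        for _, _, label in pred - gold:
            fp[label] += 1
        for _, _, label in gold - pred:
            fn[label] += 1
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        scores[label] = (p, r, 2 * p * r / (p + r) if p + r else 0.0)
    return scores
```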

YanLiang1102 commented 6 years ago

@ahalterman Training and evaluating without MISC gives a similar result: [screenshot]
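
A sketch of dropping MISC before training, reusing the (text, annotations) shape from above (whether the original run also discarded examples left with no spans is an assumption):

```python
def drop_label(examples, label="MISC"):
    """Remove all spans with the given label; keep examples that still
    have at least one remaining span (assumption, see above)."""
    kept = []
    for text, ann in examples:
        spans = [s for s in ann["entities"] if s[2] != label]
        if spans:
            kept.append((text, {"entities": spans}))
    return kept
```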

YanLiang1102 commented 6 years ago

1. augmented_for_training_1 (MISC filtered out; trained on PERSON, ORG, and GPE), evaluated only on GPE. Eval data set count: [screenshot]

Training result: [screenshot]

2. augmented_for_training_2 (MISC filtered out; trained on PERSON, ORG, and GPE), evaluated only on PERSON:

[screenshot]

Training result: [screenshot]

3. augmented_for_training_3 (MISC filtered out; trained on PERSON, ORG, and GPE), evaluated only on ORG: [screenshot]

[screenshot]

This is the overall accuracy, including all classes: [screenshot]

@ahalterman Hey Andy, check these training results: all models are trained on data without MISC and evaluated on an individual tag class. The accuracy is pretty even across classes, and you can see how many records were evaluated in each case (the first picture for each training). One way to build those single-class eval sets is sketched below.
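
For the single-class evaluations, one way to build each eval set is the mirror image of `drop_label` above: keep only spans of the target label, and only examples that still contain one (a hypothetical helper, not necessarily how the original sets were built):

```python
def eval_set_for(examples, target):
    """Keep only `target` spans so precision/recall reflect that class alone."""
    kept = []
    for text, ann in examples:
        spans = [s for s in ann["entities"] if s[2] == target]
        if spans:
            kept.append((text, {"entities": spans}))
    return kept

# e.g. gpe_eval = eval_set_for(dev_examples, "GPE")
```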