kermitt2 / delft

a Deep Learning Framework for Text https://delft.readthedocs.io/
Apache License 2.0

NER performance with Ontonotes and number-related ELMo embeddings #7

Closed (kermitt2 closed this issue 5 years ago)

kermitt2 commented 6 years ago

See allenai/bilm-tf#59

We don't apply any special formatting to numbers, and we use the same tokenization as the one provided by the CoNLL-2012 dataset, so no clue for the moment.

kermitt2 commented 5 years ago

So it's simply that my batch size with ELMo was too small: the less frequent classes had too few labels per batch to be learned properly (e.g. a single label per batch, the usual thing to avoid!).
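As a rough illustration of the effect described above, the sketch below counts how often a rare label shows up in each batch for two batch sizes. The data is a made-up toy set (not the Ontonotes corpus), and `label_counts_per_batch` is a hypothetical helper written for this example, not part of DeLFT.

```python
from collections import Counter

def label_counts_per_batch(sentence_labels, batch_size):
    """Return one Counter of entity labels per consecutive batch of sentences."""
    batches = []
    for start in range(0, len(sentence_labels), batch_size):
        counts = Counter()
        for labels in sentence_labels[start:start + batch_size]:
            # Ignore the "outside" tag, which dominates every batch anyway.
            counts.update(label for label in labels if label != "O")
        batches.append(counts)
    return batches

# Toy data (assumption): 8 sentences, the rare class B-CARDINAL occurs in only two.
toy = [
    ["B-PERSON", "O"], ["O", "B-ORG"], ["B-PERSON", "O"], ["B-CARDINAL", "O"],
    ["B-ORG", "O"], ["B-PERSON", "O"], ["O", "O"], ["B-CARDINAL", "O"],
]

for size in (2, 8):
    rare = [c["B-CARDINAL"] for c in label_counts_per_batch(toy, size)]
    print(size, rare)
# prints:
# 2 [0, 1, 0, 1]
# 8 [2]
```

With the small batch size, half of the batches contain no example of the rare class and the others contain a single one, which is the situation the comment warns against.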

The batch size and the number of multiprocessing/parallel workers were adapted to ELMo, to keep the memory usage under 11GB (for training with a GTX 1080Ti). For something more generic, it might be necessary to review how the batches are created to ensure that rare classes are well represented, with automatic over-sampling techniques for instance.
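One possible form of the over-sampling idea mentioned above is sketched below, under stated assumptions: it duplicates training sentences that contain rare labels until every label appears in at least `min_count` sentences, then shuffles. This is only an illustration of the general technique; `oversample_rare`, its parameters, and the `(tokens, labels)` sentence format are assumptions for this example and do not describe how DeLFT builds its batches.

```python
import random

def oversample_rare(sentences, min_count, seed=0):
    """Duplicate sentences with rare labels so that each label occurs in
    at least `min_count` sentences of the returned training set.

    `sentences` is a list of (tokens, labels) pairs (assumed format).
    """
    rng = random.Random(seed)

    # Index the original sentences by the entity labels they contain.
    by_label = {}
    for sent in sentences:
        _, labels = sent
        for label in set(labels) - {"O"}:
            by_label.setdefault(label, []).append(sent)

    augmented = list(sentences)
    for label in sorted(by_label):
        pool = by_label[label]
        missing = min_count - len(pool)
        if missing > 0:
            # Sample with replacement from the sentences carrying this label.
            augmented.extend(rng.choices(pool, k=missing))

    # Shuffle so the duplicates are spread across batches.
    rng.shuffle(augmented)
    return augmented
```

The trade-off of this naive approach is that duplicated sentences also repeat their frequent labels and lengthen each epoch, so `min_count` would need to be tuned against training time and the risk of overfitting the duplicated examples.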

However, for the time being, simply increasing the batch size is enough for Ontonotes to reach an f-score > 88.0, as expected.