ybracke / transnormer

A lexical normalizer for historical spelling variants using a transformer architecture.
GNU General Public License v3.0

Handling Named Entities #38

Open ybracke opened 1 year ago

ybracke commented 1 year ago

Learn about and experiment with methods for proper handling of named entities.

Named entities (NE) may be distorted during normalization, e.g. das Samoliſche Kraut -> das Salomischen Kraut, Cejonius -> Ce Kostenius.

To improve this, we need to learn more about existing methods for handling NEs in neural machine translation. Check out these papers and look for more:

ybracke commented 1 year ago

Software:

ybracke commented 1 year ago

Conceptually: How can a pre-trained encoder (decoder) deal with previously recognized NE? (This question extends to previously recognized foreign material, etc.)
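One simple baseline to compare against, sketched here under assumptions rather than proposed as the answer: keep recognized NE spans out of the model entirely and copy them through verbatim, normalizing only the text between them. The function name `normalize_with_protected_spans` and the stand-in `toy_normalize` are hypothetical and not part of transnormer. The sketch also assumes the NE spans arrive as character offsets from some upstream recognizer.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def normalize_with_protected_spans(
    text: str,
    ne_spans: List[Span],
    normalize: Callable[[str], str],
) -> str:
    """Normalize only the text outside ne_spans; copy the spans unchanged."""
    pieces = []
    cursor = 0
    for start, end in sorted(ne_spans):
        if start < cursor:
            raise ValueError("NE spans must not overlap")
        if start > cursor:
            # Text between the previous span and this one goes to the normalizer.
            pieces.append(normalize(text[cursor:start]))
        # The recognized NE is copied through without modification.
        pieces.append(text[start:end])
        cursor = end
    if cursor < len(text):
        pieces.append(normalize(text[cursor:]))
    return "".join(pieces)


def toy_normalize(segment: str) -> str:
    # Hypothetical stand-in for the real model: only maps long s to round s.
    return segment.replace("ſ", "s")


if __name__ == "__main__":
    # "Cejonius" occupies characters 0..8 and is left untouched.
    print(normalize_with_protected_spans("Cejonius iſt hier", [(0, 8)], toy_normalize))
```

The obvious cost is that the model no longer sees the NE as context for the surrounding words, and a real model may not preserve the leading or trailing whitespace of each segment, so this is only a reference point for experiments with placeholder tokens or constrained decoding.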