ybracke / transnormer

A lexical normalizer for historical spelling variants using a transformer architecture.
GNU General Public License v3.0

Hyphens #64

Open ybracke opened 1 year ago

ybracke commented 1 year ago

There can also be "word-internal" quotation marks with hyphenated words:

Original:

an- "
ruffet

This should be normalized as:

anruft "

See the discussion with Susanne on Mattermost.
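A minimal sketch of the rejoining step for the case above, not the project's actual implementation. The function name `rejoin_interrupted`, the set of quotation marks, and the rule that a continuation must start with a lowercase letter are all assumptions made for illustration. The sketch only merges the two halves and moves the quotation mark behind the word. It does not do the spelling normalization itself (`anruffet` to `anruft`), which would presumably be left to the model.

```python
import re

# Assumption: an interrupted token is "<head>-", an optional quotation mark,
# a line break, and then a "<tail>" that starts with a lowercase letter.
# The quotation marks listed here are illustrative, not taken from the DTA.
_INTERRUPTED = re.compile(
    r'(?P<head>\w+)-[ \t]*(?P<quote>["“”„»«]?)[ \t]*\n[ \t]*(?P<tail>[a-zäöüßſ]\w*)'
)


def rejoin_interrupted(text: str) -> str:
    """Merge tokens split by a line break, moving an inner quote after the word."""

    def _join(match: re.Match) -> str:
        word = match.group("head") + match.group("tail")
        quote = match.group("quote")
        # Keep the quotation mark, but place it after the rejoined word.
        return f"{word} {quote}" if quote else word

    return _INTERRUPTED.sub(_join, text)


print(rejoin_interrupted('an- "\nruffet'))  # anruffet "
```

Requiring a lowercase continuation is a cautious heuristic: it leaves cases like `Nord-` followed by `See` untouched, because the hyphen there may be a real one. It will still wrongly join some genuine hyphens, so the output would need checking against the corpus.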

ybracke commented 11 months ago

Tokens in the DTA may be interrupted. The interruption can be (1) a line break, (2) a line break plus a quotation mark, (3) ...?

Here is an example