Some abbreviations are spelled with inconsistent whitespace, for example "e. g." written with a space. The tokenizer should have a way to remove the spaces in such abbreviations, driven by a list kept in a file, and possibly produce an annotation that records the original spelling (maybe sic+hi@rend="x-space"):
e.g.
Alternatively, it could add an attribute holding the original spelling (could do , though that is not really TEI)
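
A minimal Python sketch of the idea, not the tokenizer's actual code. It assumes a hypothetical list format (one canonical abbreviation per line, each part ending in a period, `#` for comments), and the names `load_abbreviations`, `build_pattern` and `normalize_abbreviations` are made up for illustration. It returns the original spellings alongside the normalized text so that a later step can emit whatever annotation is chosen.

```python
import re


def load_abbreviations(lines):
    """Read canonical abbreviations from an iterable of lines (e.g. an open file).

    Assumed format: one abbreviation per line, blank lines and '#' comments ignored.
    """
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s and not s.startswith("#")]


def build_pattern(abbreviations):
    """Build a regex that matches each abbreviation with optional spaces after inner periods."""
    alternatives = []
    # Longest first, so a longer abbreviation wins over a shorter one with the same start.
    for abbr in sorted(abbreviations, key=len, reverse=True):
        parts = [p for p in abbr.split(".") if p]
        # "e.g." becomes e\.[ \t]*g\.
        alternatives.append(r"\.[ \t]*".join(re.escape(p) for p in parts) + r"\.")
    # The lookbehind stops matches that start in the middle of a word.
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + ")")


def normalize_abbreviations(text, abbreviations):
    """Return (normalized_text, changes); changes lists (normalized, original) pairs."""
    pattern = build_pattern(abbreviations)
    changes = []

    def collapse(match):
        original = match.group(0)
        normalized = re.sub(r"[ \t]+", "", original)
        if normalized != original:
            # Keep the original spelling so it can be annotated later.
            changes.append((normalized, original))
        return normalized

    return pattern.sub(collapse, text), changes


abbrs = load_abbreviations(["# sample list", "e.g.", "i.e.", ""])
result, changes = normalize_abbreviations("See e. g. the list, i.e. this one.", abbrs)
print(result)   # See e.g. the list, i.e. this one.
print(changes)  # [('e.g.', 'e. g.')]
```

Only spellings that were actually changed are reported, so abbreviations already written without a space need no annotation.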