Motivation

Consider:

(1) I saw a fish in the water.
(2) I saw fish-∅ in the water.

In (2), many traditional item-and-arrangement (IA) accounts of English pluralization would posit a zero allomorph of the plural morpheme after "fish", as shown. Null allomorphy is common cross-linguistically, and many documentary linguists use similar IA accounts.

This poses a problem, since we require that tokens be anchored in a textual substring, and null morphs have no textual representation.

Proposed Feature

Following a discussion of how to handle this, we decided it was probably best to allow zero-length tokens. Tokens currently must contain at least one character. Lifting this restriction would let you, for example, identify a zero-length substring beginning just after "fish" in (2), giving you a null token that could host the annotations for the plural morpheme.
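As a rough illustration of the idea, here is a minimal Python sketch that models a token as a pair of character offsets into the text. The names `Token`, `is_valid`, and `allow_zero_length` are hypothetical and are not taken from glam's codebase; the only thing the sketch assumes is that a token is anchored by a begin and an end offset, so that a zero-length token is one where the two coincide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token anchored to source[begin:end]."""
    begin: int
    end: int

    def text(self, source: str) -> str:
        return source[self.begin:self.end]


def is_valid(token: Token, source: str, allow_zero_length: bool = False) -> bool:
    """Check that the token's offsets fall inside the source text."""
    in_bounds = 0 <= token.begin <= token.end <= len(source)
    if allow_zero_length:
        # Proposed rule: an empty span (begin == end) is acceptable.
        return in_bounds
    # Current rule as described above: at least one character.
    return in_bounds and token.end > token.begin


sentence = "I saw fish in the water."
fish_begin = sentence.index("fish")
fish_end = fish_begin + len("fish")

fish = Token(fish_begin, fish_end)
null_plural = Token(fish_end, fish_end)  # empty span just after "fish"
```

Under the current rule `is_valid(null_plural, sentence)` is false, while with `allow_zero_length=True` it is true, and `null_plural.text(sentence)` is the empty string. The null token still has a definite position in the text, which is what lets it host annotations.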

One issue with this is that if multiple null tokens share the same position, there will be no way to tell their order from their offsets. You could band-aid fix this in several ways, but for now this seems niche enough that it's not worth handling.
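To make the ambiguity concrete, here is a small Python sketch. The `(begin, end, label)` triples and the `by_offsets` helper are hypothetical and only for illustration; the offsets 6 and 10 correspond to "fish" in the string "I saw fish in the water.".

```python
# Hypothetical (begin, end, label) triples; both null tokens sit at offset 10.
tokens = [(6, 10, "fish"), (10, 10, "NULL-A"), (10, 10, "NULL-B")]
reordered = [tokens[2], tokens[1], tokens[0]]


def by_offsets(token):
    """Sort key that uses only the textual anchor of a token."""
    return (token[0], token[1])


# sorted() is stable, so tokens with identical offsets keep their input order.
# The relative order of the two null tokens therefore comes from how they
# happened to be stored, not from anything in the offsets themselves.
first = [label for _, _, label in sorted(tokens, key=by_offsets)]
second = [label for _, _, label in sorted(reordered, key=by_offsets)]
```

Here `first` and `second` agree that "fish" comes before both null tokens but disagree on which null token comes first, since the offsets alone cannot break the tie.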

lgessler / glam

Zero-length tokens #42
