lgessler / glam

(WIP) a webapp for language documentation
Eclipse Public License 2.0

Sentence Boundary Structure #41

Open · lgessler opened this issue 6 months ago


Sentences can in principle be represented by a Span Layer. There are a couple of ways to do this:

  1. Sentences are delimited by single spans, each of which marks the beginning (or end) of a sentence.
  2. Sentences are identified by single spans, each of which contains every token of the sentence.
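A minimal sketch of how each span-based encoding could be turned back into sentences, assuming tokens are available in document order and a span is reduced to the ordered ids of the tokens it covers. `Span`, `sentences_from_boundary_spans`, and `sentences_from_covering_spans` are hypothetical names for illustration, not glam's actual data model; option 1 is shown in its "beginning of sentence" variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    # Hypothetical span: the ids of the tokens it covers, in order.
    token_ids: tuple[str, ...]


def sentences_from_boundary_spans(token_ids, boundary_spans):
    """Option 1: each span covers the single token that begins a sentence."""
    starts = {span.token_ids[0] for span in boundary_spans}
    sentences = []
    for tid in token_ids:
        # Start a new sentence at each marked token; tokens before the first
        # marker are kept as their own sentence rather than dropped.
        if tid in starts or not sentences:
            sentences.append([])
        sentences[-1].append(tid)
    return sentences


def sentences_from_covering_spans(token_ids, sentence_spans):
    """Option 2: each span covers every token of its sentence."""
    position = {tid: i for i, tid in enumerate(token_ids)}
    # Order sentences by their earliest token, and tokens within each sentence
    # by document position, since spans may be stored in any order.
    ordered = sorted(sentence_spans, key=lambda s: min(position[t] for t in s.token_ids))
    return [sorted(s.token_ids, key=position.__getitem__) for s in ordered]


tokens = ["t1", "t2", "t3", "t4", "t5"]
print(sentences_from_boundary_spans(tokens, [Span(("t1",)), Span(("t4",))]))
print(sentences_from_covering_spans(tokens, [Span(("t4", "t5")), Span(("t1", "t2", "t3"))]))
```

Both calls print `[['t1', 't2', 't3'], ['t4', 't5']]`: the two encodings carry the same information, but each requires a lookup into a span layer before the sentences can be recovered.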

With this in mind, we didn't include an explicit structure for sentences in the core model. However, UI programmers might find it more ergonomic if we did. One way to do this would be a boolean `:token/start-of-sentence` attribute: any token with this attribute set to true begins a sentence.
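A minimal sketch of how a UI programmer might recover sentences from such a flag, assuming tokens arrive in document order. `Token`, its `start_of_sentence` field (a stand-in for the proposed `:token/start-of-sentence`), and `group_sentences` are hypothetical names, written in Python purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: str
    form: str
    # Hypothetical stand-in for the proposed :token/start-of-sentence attribute.
    start_of_sentence: bool = False


def group_sentences(tokens):
    """Split an ordered list of tokens into sentences at flagged tokens."""
    sentences = []
    for tok in tokens:
        # Open a new sentence at each flagged token; also open one for the
        # first token so nothing before the first flag is dropped.
        if tok.start_of_sentence or not sentences:
            sentences.append([])
        sentences[-1].append(tok)
    return sentences


tokens = [
    Token("t1", "The", start_of_sentence=True),
    Token("t2", "dog"),
    Token("t3", "barked"),
    Token("t4", "It", start_of_sentence=True),
    Token("t5", "left"),
]
print([[t.form for t in s] for s in group_sentences(tokens)])
```

This prints `[['The', 'dog', 'barked'], ['It', 'left']]`. The grouping reads directly off the token list with no span layer to consult, though the flag can only ever encode one segmentation of the text.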

The upside is that this would (I imagine) make sentences a bit easier to work with when you're given a document tree, and it adds no configuration overhead. The downside is that in the (unusual, I expect) cases where you want multiple sentence-like groupings of tokens, it may be confusing to have this option alongside span layers. There is also the more remote concern that it would be inconsistent with our goal of providing only the layers that are strictly necessary structurally.