alopezlago closed this issue 5 years ago.
What about a BuzzPosition that includes a character index, a word index, and/or a time index?
Also, you wouldn't default to -1; you'd just leave an optional field out.
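A minimal sketch of the optional-field approach, assuming a Python representation of the proposed BuzzPosition. The field names character_index, word_index, and time_index are hypothetical, not taken from the schema. Untracked positions are simply omitted from the serialized JSON rather than encoded as -1.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BuzzPosition:
    # Hypothetical field names; every field is optional and defaults to "not tracked".
    character_index: Optional[int] = None
    word_index: Optional[int] = None
    time_index: Optional[float] = None

    def to_json(self) -> str:
        # Leave untracked fields out entirely instead of writing a -1 sentinel.
        present = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(present)


print(BuzzPosition(word_index=42).to_json())  # {"word_index": 42}
print(BuzzPosition().to_json())               # {}
```

Consumers then test whether a key is present instead of comparing against a magic value, and a legitimate index of 0 is never confused with "missing".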
How should the time index be formatted, and what should it be relative to?
Should we allow the specification of a clue number? Obviously this would have to be well-defined elsewhere, but it seems plausible.
We should allow (but not require) the actual word being buzzed upon to be specified.
Should we distinguish between "buzzed upon hearing the last word" and "buzzed after the question ended, but not immediately"? (If so, how?)
For a time index, I'd put a start time on the MatchQuestion and a timestamp on the MatchQuestionBuzz.
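Under that proposal, a buzz's offset into the question could be derived rather than stored. The sketch below assumes both timestamps are ISO 8601 strings with explicit UTC offsets (an assumption; the thread leaves the format open) and reports elapsed seconds. The helper name is hypothetical.

```python
from datetime import datetime


def buzz_offset_seconds(question_start: str, buzz_timestamp: str) -> float:
    # Hypothetical helper: the MatchQuestion start time and the MatchQuestionBuzz
    # timestamp, both assumed to be ISO 8601 strings with UTC offsets.
    start = datetime.fromisoformat(question_start)
    buzz = datetime.fromisoformat(buzz_timestamp)
    if buzz < start:
        raise ValueError("buzz timestamp precedes question start")
    return (buzz - start).total_seconds()


print(buzz_offset_seconds("2019-03-02T14:05:00+00:00", "2019-03-02T14:05:12.500+00:00"))  # 12.5
```

Storing absolute timestamps keeps the raw data intact; any relative offset can be recomputed from it.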
For a word index, I'd do just a simple word index, as a character index is too granular and a clue index is rightly defined elsewhere.
(I could see future fields for question content defining clues by word index and being able to cross reference that.)
I'm dubious about the utility of tracking how long after the end of a question a buzz happened.
Although the definition of clue boundaries is not in scope, I think it would help if a standard definition of word (or maybe a non-normative recommendation) were included in the schema.
Example implementation (after a few years of iteration): https://github.com/hftf/oligodendrocytes/blob/ada0a231cbff5ac121af2aef00fe4d6fa4aef837/transformers/f-html-to-w-html.js#L61
I'm fine with a further definition of a "word" as a "run of non-whitespace characters".
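The "run of non-whitespace characters" definition fits in a one-line regular expression. A sketch (the function name is hypothetical); note that under this rule a hyphenated name stays a single word and a power mark counts as a word:

```python
import re
from typing import List


def words_nonwhitespace(text: str) -> List[str]:
    # A "word" is any maximal run of non-whitespace characters.
    return re.findall(r"\S+", text)


print(words_nonwhitespace("name this Johnson-Corey-Chaykovsky reaction (*) which"))
# ['name', 'this', 'Johnson-Corey-Chaykovsky', 'reaction', '(*)', 'which']
```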
How about "run of consecutive characters that are neither whitespace, nor any sort of dash, nor a power mark"? I realize this is getting pretty finicky, but I feel like we want to be able to distinguish, e.g., which name people buzz on in the Johnson-Corey-Chaykovsky reaction. (And a power mark just doesn't seem like a word to me.)
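One way to implement that stricter rule, assuming the power mark is written as the literal "(*)" and taking "any sort of dash" to mean Unicode dash punctuation (general category Pd, which covers the hyphen-minus, en dash, and em dash but not the minus sign U+2212). Function names are hypothetical.

```python
import unicodedata
from itertools import groupby
from typing import List

POWER_MARK = "(*)"  # assumed textual form of the power mark


def is_separator(ch: str) -> bool:
    # Whitespace or any Unicode dash punctuation (category "Pd").
    return ch.isspace() or unicodedata.category(ch) == "Pd"


def words_no_dashes(text: str) -> List[str]:
    # Blank out power marks first, then keep maximal runs of non-separator characters.
    text = text.replace(POWER_MARK, " ")
    return ["".join(run) for sep, run in groupby(text, key=is_separator) if not sep]


print(words_no_dashes("name this Johnson-Corey-Chaykovsky reaction (*) which"))
# ['name', 'this', 'Johnson', 'Corey', 'Chaykovsky', 'reaction', 'which']
```

Each surname now gets its own word index, so a buzz on "Corey" is distinguishable from one on "Chaykovsky".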
I don’t have a strong opinion, but I would recommend the simpler implementation I linked above:
words are delimited by whitespace, but words that are just punctuation symbols are not words (including power marks, slashes between lines of poetry, etc.)
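A sketch of that simpler rule, written independently here rather than ported from the linked code: split on whitespace, then discard any token containing no letters or digits. Reading "just punctuation symbols" as "no alphanumeric characters" is an assumption, and the function name is hypothetical.

```python
import re
from typing import List


def words_drop_punctuation_only(text: str) -> List[str]:
    # Whitespace-delimited tokens, minus tokens with no letters or digits
    # (power marks, slashes between lines of poetry, stray dashes, ...).
    return [tok for tok in re.findall(r"\S+", text) if any(ch.isalnum() for ch in tok)]


print(words_drop_punctuation_only("Shall I compare thee / to a summer's day? (*) For 10 points"))
# ['Shall', 'I', 'compare', 'thee', 'to', 'a', "summer's", 'day?', 'For', '10', 'points']
```

Punctuation attached to a word (as in "day?") stays part of that word, so word counts match a plain whitespace split except for the discarded symbol-only tokens.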
Fair enough, though my reaction to those examples is "Okay, maybe it should be 'run of consecutive alphanumeric characters.'" (I know that still has part of the hyphen/dash problem you mention.)
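The alphanumeric-run variant can be sketched with a Unicode-aware regular expression (function name hypothetical). Power marks and dashes drop out automatically, but apostrophes also split words, so a possessive becomes two words:

```python
import re
from typing import List


def words_alnum(text: str) -> List[str]:
    # Maximal runs of Unicode letters and digits; [^\W_] is \w without the underscore.
    return re.findall(r"[^\W_]+", text)


print(words_alnum("name this Johnson-Corey-Chaykovsky reaction (*) in summer's"))
# ['name', 'this', 'Johnson', 'Corey', 'Chaykovsky', 'reaction', 'in', 'summer', 's']
```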
I'm also very okay with not defining "word" here.
I think a definition is within the scope of this schema “about the necessary data to pass back and forth and get a full recounting of what happened.” If there isn’t a somewhat stable definition of “word,” then that could make interchange more difficult.
Clue boundaries aren’t systematic so it makes sense to encode such a field separately (such as an array of substrings or indexes). But word boundaries are mostly systematic, so it would be redundant to store an array of words or encode slight behavior differences in how applications split words.
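If clue boundaries were encoded separately as word indexes (a hypothetical clue_starts array giving the word index at which each clue begins), a buzz's word index could be cross-referenced to its clue with a binary search:

```python
from bisect import bisect_right
from typing import List


def clue_for_word(clue_starts: List[int], word_index: int) -> int:
    # clue_starts: sorted word indexes at which each clue begins (hypothetical field).
    # Returns the 0-based index of the clue containing word_index.
    if not clue_starts or word_index < clue_starts[0]:
        raise ValueError("word index precedes the first clue")
    return bisect_right(clue_starts, word_index) - 1


clue_starts = [0, 14, 30]
print([clue_for_word(clue_starts, w) for w in (0, 13, 14, 45)])  # [0, 0, 1, 2]
```

This cross-reference is only meaningful if every application splits words the same way.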
A normative definition may be premature as buzz tracking is a rather new technology, but it could also encourage more applications (to use the schema, to use buzz tracking).
It may also discourage the rare practice of putting power marks in the middle of words:
aggregations of the protein alpha-(*)synuclein are a marker for this disease. (2018 EFT, P4)
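A small check for that case, assuming whitespace-delimited words and a literal "(*)" power mark (function name hypothetical). It reports which word holds the power mark and whether that word is the mark alone; when the mark sits inside a word, a buzz recorded only by word index on that word cannot be classified as before or after power.

```python
import re
from typing import Optional, Tuple

POWER_MARK = "(*)"  # assumed textual form of the power mark


def power_mark_word(text: str) -> Optional[Tuple[int, bool]]:
    # Returns (index of the word containing the power mark, whether the mark stands alone),
    # or None if the question has no power mark.
    for i, tok in enumerate(re.findall(r"\S+", text)):
        if POWER_MARK in tok:
            return i, tok == POWER_MARK
    return None


print(power_mark_word("aggregations of the protein alpha-(*)synuclein are a marker"))  # (4, False)
print(power_mark_word("aggregations of the protein (*) alpha-synuclein are a marker"))  # (4, True)
```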
I think we should have an optional index or position field in MatchQuestionBuzz to track where a buzz occurred. This could be based either on the index of the character at which the player buzzed or on the word number. The default for this value could be -1, indicating that the position was not tracked for this buzz.
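If both kinds of position are allowed, a character index can be converted to a word index after the fact. A sketch under the whitespace-delimited definition of word (an assumption; the function name is hypothetical); an offset that lands on whitespace is attributed to the following word:

```python
import re
from typing import Optional


def char_to_word_index(text: str, char_index: int) -> Optional[int]:
    # Map a character offset to the 0-based index of the whitespace-delimited word
    # containing it, or of the next word if the offset falls on whitespace.
    if not 0 <= char_index < len(text):
        raise IndexError("character index out of range")
    for i, match in enumerate(re.finditer(r"\S+", text)):
        if char_index < match.end():
            return i
    return None  # offset falls in trailing whitespace after the last word


text = "For 10 points, name this element."
print([char_to_word_index(text, c) for c in (0, 3, 4, 7)])  # [0, 1, 1, 2]
```

The conversion only goes one way cleanly: a word index maps back to a range of characters, not a single offset.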