Optional index or position field in MatchQuestionBuzz

alopezlago commented 8 years ago

I think we should have an optional index or position field in MatchQuestionBuzz to track where a buzz occurred. This could either be based on the position of the first character buzzed in, or the word number. The default for this value could be -1, indicating that the position was not tracked for this buzz.

puls commented 7 years ago

What about a BuzzPosition that includes a character index, a word index, and/or a time index?

Also, you wouldn't default to -1, you'd just leave an optional field out.
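A minimal sketch of that shape, assuming a Python consumer of the schema. The class name follows the comment above, but the field names (`character_index`, `word_index`, `time_index`) and the `to_json_dict` helper are hypothetical and not defined by the schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class BuzzPosition:
    # Hypothetical field names; each one is optional and independent.
    character_index: Optional[int] = None
    word_index: Optional[int] = None
    time_index: Optional[float] = None

    def to_json_dict(self) -> dict:
        # Leave untracked fields out entirely instead of writing a sentinel like -1.
        return {key: value for key, value in asdict(self).items() if value is not None}
```

With this shape, `BuzzPosition(word_index=42).to_json_dict()` contains only the `word_index` key, and a buzz with no tracked position serializes to an empty object.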

jonahgreenthal commented 7 years ago

How should time index be formatted / what should it be relative to?

Should we allow the specification of a clue number? Obviously this would have to be well-defined elsewhere, but it seems plausible.

We should allow (but not require) the actual word being buzzed upon to be specified.

Should we distinguish between "buzzed upon hearing the last word" and "buzzed after the question ended, but not immediately"? (If so, how?)

puls commented 5 years ago

For a time index, I'd put a start time on the MatchQuestion and a timestamp on the MatchQuestionBuzz.
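As a rough illustration of that pairing, the sketch below derives the elapsed time from the two values. It assumes both are ISO 8601 strings with an explicit UTC offset; the thread has not settled on a format, and the function name is hypothetical.

```python
from datetime import datetime


def seconds_into_question(question_start: str, buzz_timestamp: str) -> float:
    """Seconds between the MatchQuestion start time and the buzz timestamp."""
    # Assumption: both values are ISO 8601 strings such as
    # "2019-03-02T14:05:10+00:00" (the schema does not specify this).
    start = datetime.fromisoformat(question_start)
    buzz = datetime.fromisoformat(buzz_timestamp)
    return (buzz - start).total_seconds()
```

Storing absolute timestamps on both objects means the relative time index never has to be stored; it can be recomputed whenever it is needed.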

For a word index, I'd do just a simple word index, as a character index is too granular and a clue index is rightly defined elsewhere.

(I could see future fields for question content defining clues by word index and being able to cross reference that.)
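One way that cross-reference could work, sketched under the assumption that question content lists the word index at which each clue starts. The `clue_start_words` list and the function name are hypothetical; no such field exists in the schema today.

```python
from bisect import bisect_right
from typing import Sequence


def clue_for_word(clue_start_words: Sequence[int], word_index: int) -> int:
    """Return the 0-based clue number that contains word_index.

    clue_start_words is a hypothetical sorted list holding the word index
    at which each clue begins, e.g. [0, 18, 41, 67].
    """
    if not clue_start_words or word_index < clue_start_words[0]:
        raise ValueError("word_index falls before the first clue")
    # Count how many clues start at or before this word, then convert to 0-based.
    return bisect_right(clue_start_words, word_index) - 1
```

Because the buzz only stores a word index, the clue number stays derivable and does not need its own field on the buzz.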

I'm dubious on the utility of tracking how long after the end of a question a buzz happened.

hftf commented 5 years ago

Although the definition of clue boundaries is not in scope, I think it would help if a standard definition of word (or maybe a non-normative recommendation) were included in the schema.

Example implementation (after a few years of iteration): https://github.com/hftf/oligodendrocytes/blob/ada0a231cbff5ac121af2aef00fe4d6fa4aef837/transformers/f-html-to-w-html.js#L61

puls commented 5 years ago

I'm fine with a further definition of a "word" as a "run of non-whitespace characters".
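In Python terms, that definition is roughly what `str.split()` with no arguments already does. The sketch below is only an illustration; the function name is hypothetical.

```python
def split_words_whitespace(text: str) -> list:
    """Split text into runs of non-whitespace characters."""
    # str.split() with no arguments splits on any whitespace run
    # and drops empty strings at the ends.
    return text.split()
```

Under this rule a hyphenated name such as "Johnson-Corey-Chaykovsky" is a single word, and a free-standing power mark such as "(*)" counts as a word of its own.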

jonahgreenthal commented 5 years ago

How about "run of consecutive characters that are neither whitespace, nor any sort of dash, nor a power mark"? I realize this is getting pretty finicky, but I feel like we want to be able to distinguish, e.g., which name people buzz on in the Johnson-Corey-Chaykovsky reaction. (And a power mark just doesn't seem like a word to me.)
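A sketch of this finer-grained definition, for comparison. It assumes the power mark is written `(*)` and treats the hyphen-minus plus U+2010 through U+2015 as "any sort of dash"; both choices are assumptions, and the names are hypothetical.

```python
import re

POWER_MARK = "(*)"  # assumed notation for the power mark

# Whitespace, hyphen-minus, and the Unicode dashes U+2010..U+2015
# (hyphen, non-breaking hyphen, figure dash, en dash, em dash, horizontal bar).
_SEPARATORS = re.compile(r"[\s\-\u2010-\u2015]+")


def split_words_fine(text: str) -> list:
    """Split on whitespace and dashes, treating the power mark as a separator."""
    without_marks = text.replace(POWER_MARK, " ")
    return [word for word in _SEPARATORS.split(without_marks) if word]
```

Here "Johnson-Corey-Chaykovsky" becomes three words, so a buzz can be attributed to an individual name, and the power mark never receives a word index.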

hftf commented 5 years ago

I don’t have a strong opinion, but I would recommend the simpler implementation I linked above:

words are delimited by whitespace, but words that are just punctuation symbols are not words (including power marks, slashes between lines of poetry, etc.)

I think more granularity isn’t really necessary.
There will always be false positives and negatives. I can think of many uses of hyphens or dashes that I would consider to be single “words” (E-flat, X-ray, p–n junction, 5–4 decision, NO₂^–, 10^–9), and many other symbols that can also be used to separate “words” (Either/Or, Silence=Death, 2+2+2+3 pattern, 3:2:1 ratio).
Words delimited by non-whitespace (visible) characters can cause miscellaneous display issues: unwanted line breaking at span boundaries, blocked kerning at span boundaries, and uncertainty about what to do with dashes when words are split into pieces (e.g. in data visualization). They can also make the packet interface less intuitive, since moderators must learn which symbols are part of words that can be clicked on.
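The simpler rule can be sketched as follows. This is not a port of the linked implementation; it assumes that "just punctuation" means a token with no letter or digit in it, and the function name is hypothetical.

```python
def split_words_simple(text: str) -> list:
    """Whitespace-delimited tokens, minus tokens that contain no letter or digit."""
    # Assumption: a token is "just punctuation" when none of its
    # characters is alphanumeric, e.g. "(*)" or a lone "/".
    return [token for token in text.split() if any(ch.isalnum() for ch in token)]
```

With this rule "E-flat", "Either/Or", and "p–n" each stay a single word, while a free-standing power mark or a slash between lines of poetry is skipped and does not consume a word index.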

jonahgreenthal commented 5 years ago

Fair enough, though my reaction to those examples is "Okay, maybe it should be 'run of consecutive alphanumeric characters.'" (I know that still has part of the hyphen/dash problem you mention.)
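For reference, the alphanumeric-run variant is a one-line regular expression in Python. This is only a sketch with a hypothetical function name; `[^\W_]` matches word characters other than the underscore, which for `str` patterns means Unicode letters and digits.

```python
import re

# [^\W_] matches any word character except the underscore,
# i.e. a letter or digit (Unicode-aware for str patterns).
_ALNUM_RUN = re.compile(r"[^\W_]+")


def split_words_alnum(text: str) -> list:
    """Return every run of consecutive alphanumeric characters."""
    return _ALNUM_RUN.findall(text)
```

Running it on "E-flat X-ray 3:2:1 ratio" splits "E-flat" into "E" and "flat" and the ratio into three separate numbers, which is the hyphen/dash problem acknowledged above.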

I'm also very okay with not defining "word" here.

hftf commented 5 years ago

I think a definition is within the scope of this schema “about the necessary data to pass back and forth and get a full recounting of what happened.” If there isn’t a somewhat stable definition of “word,” then that could make interchange more difficult.

Clue boundaries aren’t systematic so it makes sense to encode such a field separately (such as an array of substrings or indexes). But word boundaries are mostly systematic, so it would be redundant to store an array of words or encode slight behavior differences in how applications split words.

A normative definition may be premature as buzz tracking is a rather new technology, but it could also encourage more applications (to use the schema, to use buzz tracking).

It may also discourage the rare practice of putting power marks in the middle of words:

aggregations of the protein alpha-(*)synuclein are a marker for this disease. (2018 EFT, P4)
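To see why a mid-word power mark is awkward for a word index, the sketch below locates the whitespace-delimited token that carries the mark and reports whether the mark stands alone. It assumes the mark is written `(*)`; the function name is hypothetical.

```python
POWER_MARK = "(*)"  # assumed notation for the power mark


def locate_power_mark(text: str):
    """Return (token_index, stands_alone) for the first power mark, or None."""
    for index, token in enumerate(text.split()):
        if POWER_MARK in token:
            # stands_alone is False when the mark is embedded in a longer token.
            return index, token == POWER_MARK
    return None
```

For the sentence quoted above, the mark is found inside the fifth token, "alpha-(*)synuclein", rather than as a token of its own, so a single word index cannot say whether a buzz on that word came before or after the end of power.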

quizbowl / schema

Optional index or position field in MatchQuestionBuzz #64