rossvideo / Catena

BSD 3-Clause "New" or "Revised" License

refactor polyglot text #42

Closed: mejohnnaylor closed this 11 months ago

mejohnnaylor commented 11 months ago

Simplifies the definition of the PolyglotText message to:

message PolyglotText {
  map<string, string> strings = 1; // In-line language definition
}

An example JSON serialization that validates:

{
    "strings": {
        "en": "Hello",
        "es": "Hola",
        "en-CA": "Eh?",
        "$key": "greeting"
    }
}
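To make the intent of the map concrete, here is a minimal Python sketch of how a client might pick a display string from a `strings` map like the one above. The fallback order (exact language code, then the base language such as `en` for `en-CA`, then a `$key` lookup in a language pack), the function name `resolve_display_text`, and the in-memory `pack` dictionary are all illustrative assumptions; the PR does not specify client-side resolution rules.

```python
import json
from typing import Mapping, Optional

# Hypothetical language pack shape: $key -> {language code -> text}.
# This layout is an assumption for illustration only.
LanguagePack = Mapping[str, Mapping[str, str]]


def resolve_display_text(
    strings: Mapping[str, str],
    client_lang: str,
    language_pack: Optional[LanguagePack] = None,
) -> Optional[str]:
    """Pick the string a client should display, using an assumed fallback order."""
    # 1. Exact match on the client's language code, e.g. "en-CA".
    if client_lang in strings:
        return strings[client_lang]
    # 2. Fall back to the base language, e.g. "en" for "en-CA".
    base = client_lang.split("-", 1)[0]
    if base in strings:
        return strings[base]
    # 3. Fall back to the language pack entry named by "$key", if any.
    key = strings.get("$key")
    if key is not None and language_pack is not None:
        entry = language_pack.get(key, {})
        return entry.get(client_lang) or entry.get(base)
    return None


doc = json.loads(
    '{"strings": {"en": "Hello", "es": "Hola", "en-CA": "Eh?", "$key": "greeting"}}'
)
pack = {"greeting": {"fr": "Bonjour"}}

print(resolve_display_text(doc["strings"], "en-CA"))     # Eh?
print(resolve_display_text(doc["strings"], "es-AR"))     # Hola
print(resolve_display_text(doc["strings"], "fr", pack))  # Bonjour
print(resolve_display_text(doc["strings"], "de", pack))  # None
```

Putting in-line languages ahead of the `$key` lookup is only one possible policy; a client could just as well prefer the language pack.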

The schemas enforce the use of language codes from the SMPTE Language Metadata Table with a divide-and-conquer approach: rather than one super-long, unreadable, unmaintainable regex, each language group (English, Spanish, and so on) gets its own shorter pattern.

        "polyglot_text": {
            "title": "Polyglot Text",
            "description": "Text that a client can display in one of multiple languages",
            "type": "object",
            "properties": {
                "strings": {
                    "type": "object",
                    "anyOf": [
                        {"$ref": "#/$defs/language_metadata_table"},
                        {"$ref": "#/$defs/language_pack_reference"}
                    ],
                    "minProperties": 1
                }
            },
            "additionalProperties": false
        },
        "language_metadata_table": {
            "title": "Language Metadata Table",
            "description": "A table of language codes",
            "$comment": "The language codes must be valid Language Metadata Table language codes. The validation could be done as one super-long regex, but, for readability, we've split them into language groups.",
            "type": "object",
            "patternProperties": {
                "^(en|en-AU|en-CA|en-HK|en-IE|en-MY)$": {
                    "title": "English Language Group",
                    "$comment": "@todo complete the regex",
                    "type": "string"
                },
                "^(es|es-ES|es-AR|es-BO|es-CL|es-CO)$" : {
                    "title": "Spanish Language Group",
                    "$comment": "@todo complete the regex",
                    "type": "string"
                }
            },
            "additionalProperties": false
        },
        "language_pack_reference": {
            "title": "Language Pack Reference",
            "description": "A key to look up text in a language pack",
            "type": "object",
            "properties": {
                "$key": {
                    "type": "string"
                }
            },
            "required": [
                "$key"
            ]
        },
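The same divide-and-conquer validation can be mirrored outside a JSON Schema validator. The Python sketch below (standard library only) applies the logic of the `strings` property above: the map must be a non-empty object, and either every key matches one of the per-group patterns with a string value (the `language_metadata_table` branch) or the map carries a string `$key` (the `language_pack_reference` branch). Only the incomplete English and Spanish groups from the schema are included, and the names `LANGUAGE_GROUPS` and `is_valid_strings` are hypothetical.

```python
import re
from typing import Any

# Per-group patterns copied from the schema's patternProperties.
# Both groups are still incomplete in the schema (@todo), so they are here too.
LANGUAGE_GROUPS = {
    "English Language Group": re.compile(r"^(en|en-AU|en-CA|en-HK|en-IE|en-MY)$"),
    "Spanish Language Group": re.compile(r"^(es|es-ES|es-AR|es-BO|es-CL|es-CO)$"),
}


def matches_language_metadata_table(strings: dict) -> bool:
    """Branch 1: every key is a known language code and every value is a string."""
    for code, text in strings.items():
        # fullmatch avoids Python's "$" also matching before a trailing newline.
        if not any(p.fullmatch(code) for p in LANGUAGE_GROUPS.values()):
            return False  # additionalProperties: false rejects unmatched keys
        if not isinstance(text, str):
            return False
    return True


def matches_language_pack_reference(strings: dict) -> bool:
    """Branch 2: a string "$key" is required; other keys are unconstrained."""
    return isinstance(strings.get("$key"), str)


def is_valid_strings(strings: Any) -> bool:
    """Mirror of the "strings" property: object, minProperties 1, anyOf."""
    if not isinstance(strings, dict) or len(strings) < 1:
        return False
    return matches_language_metadata_table(strings) or matches_language_pack_reference(
        strings
    )


example = {"en": "Hello", "es": "Hola", "en-CA": "Eh?", "$key": "greeting"}
print(is_valid_strings(example))                           # True (via the $key branch)
print(is_valid_strings({"en": "Hello", "es-AR": "Hola"}))  # True
print(is_valid_strings({"en": "Hello", "fr": "Bonjour"}))  # False
print(is_valid_strings({}))                                # False
```

As the schema is written, the `$key` branch places no constraint on the other keys, so the example serialization validates only through that branch, and a map such as `{"fr": "Bonjour", "$key": "greeting"}` would validate too.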
mejohnnaylor commented 11 months ago

Having slept on this, I think "langs" would work better than "strings" as the name of the wrapper around "en", "es", "$key", and so on.

We also need to resolve how to apply string constraints. I have half a proposal thought out:

If the constraint is strictly applied, it's quite easy: for STRING_CHOICE, the parameter type is no longer a string but an integer that indexes into the choices array. This raises the question of how STRING_CHOICE differs from INT_CHOICE.
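A minimal Python sketch of the strict-mode idea, assuming each choice is a PolyglotText-style `strings` map: the parameter's value is an integer index into the choices array, and each client localizes the selected choice itself. The `StringChoiceParam` class, its fields, the exact-then-base-language lookup, and the language codes used are hypothetical; Catena's actual STRING_CHOICE definition may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StringChoiceParam:
    """Hypothetical strict STRING_CHOICE: the value is an index into choices."""

    choices: List[Dict[str, str]]  # each entry is a PolyglotText-style "strings" map
    value: int = 0

    def set_value(self, index: int) -> None:
        # Strict mode: only indices that refer to an existing choice are accepted.
        if not 0 <= index < len(self.choices):
            raise ValueError(f"choice index {index} out of range")
        self.value = index

    def display(self, client_lang: str) -> str:
        # Each client renders the shared index in its own language
        # (exact code first, then base language: an assumed fallback rule).
        strings = self.choices[self.value]
        base = client_lang.split("-", 1)[0]
        return strings.get(client_lang) or strings.get(base) or ""


param = StringChoiceParam(
    choices=[
        {"en": "Red", "nl": "Rood", "fr": "Rouge"},
        {"en": "Green", "nl": "Groen", "fr": "Vert"},
    ]
)
param.set_value(1)
print(param.display("nl-BE"))  # Groen
print(param.display("fr-BE"))  # Vert
```

Because only the index is shared between clients, every UI shows the same selection, each in its own language.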

The problem comes with non-strict mode, where a client can set the parameter value (now a string) to anything, and that arbitrary value has to be echoed to the other connected clients. In a TV station in, say, Belgium, Flemish and Walloon co-workers could drive each other crazy: one sets a string value to something in Flemish, and it then appears, untranslated, on the UIs of their Walloon co-workers. Note that the strict flag isn't specified in our proto definition of STRING_CHOICE; is that an error?

For STRING_STRING_CHOICE, the problem is again easier in strict mode: the string value is just the value element of one of the choices. In non-strict mode, where the client can set the value to an arbitrary string, we could at least set the parameter value to whatever the client sent and echo it back to all connected clients.
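The strict and non-strict behaviour proposed for STRING_STRING_CHOICE can be sketched as follows. This is a minimal Python illustration, not Catena's implementation: the `Choice` and `StringStringChoiceParam` classes, the callback list `subscribers` standing in for connected clients, and the pairing of a `value` string with a PolyglotText-style display map are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Choice:
    value: str               # the string stored as the parameter value
    display: Dict[str, str]  # PolyglotText-style "strings" map for the UI


@dataclass
class StringStringChoiceParam:
    """Hypothetical STRING_STRING_CHOICE parameter."""

    choices: List[Choice]
    strict: bool
    value: str = ""
    # Stand-ins for connected clients: each is called with the accepted value.
    subscribers: List[Callable[[str], None]] = field(default_factory=list)

    def set_value(self, new_value: str) -> None:
        allowed = {c.value for c in self.choices}
        if self.strict and new_value not in allowed:
            # Strict mode: the value must be the value element of a choice.
            raise ValueError(f"{new_value!r} is not one of {sorted(allowed)}")
        # Non-strict mode: accept whatever the client sent, as-is.
        self.value = new_value
        # In both modes, echo the accepted value to every connected client.
        for notify in self.subscribers:
            notify(new_value)


received: List[str] = []
choices = [
    Choice("red", {"en": "Red", "fr": "Rouge"}),
    Choice("green", {"en": "Green", "fr": "Vert"}),
]

strict_param = StringStringChoiceParam(choices, strict=True, subscribers=[received.append])
strict_param.set_value("green")

loose_param = StringStringChoiceParam(choices, strict=False, subscribers=[received.append])
loose_param.set_value("groen")  # arbitrary free text, echoed untranslated

print(received)  # ['green', 'groen']
```

In strict mode each client can still localize the chosen entry through its `display` map; in non-strict mode nothing stops one client's free text (here the Dutch word `groen`) from reaching every other client as-is.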