unicode-org / inflection

code, data and documentation related to handling inflection problems

Additional context for plural selection #20

Open macchiati opened 4 months ago

macchiati commented 4 months ago

Unit measures in English are a bit odd in terms of plural behavior, and probably need some special categories of usage. We should look across other languages to see what behavior they exhibit.

Examples:

Normally we say "3 feet". But when units are used adjectivally they are singular, e.g. "a 3-foot board" or "a 3 foot board". Yet we also say "three days long", with the plural.
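To make the contrast concrete, here is a minimal sketch of what a usage-sensitive formatter might look like. The `usage` labels ("quantity", "attributive"), the `UNIT_FORMS` table, and both function names are hypothetical and are not taken from CLDR or from this repository. The plural rule is a simplified restatement of the English cardinal rule.

```python
# Hypothetical data and names, for illustration only.
UNIT_FORMS = {
    "foot": {"one": "foot", "other": "feet"},
    "day": {"one": "day", "other": "days"},
}


def english_category(n):
    """Simplified English cardinal rule: 'one' only for the integer 1."""
    return "one" if isinstance(n, int) and n == 1 else "other"


def format_measure(n, unit, usage="quantity"):
    """Format a measure, choosing the unit form by how the phrase is used."""
    forms = UNIT_FORMS[unit]
    if usage == "attributive":
        # Adjectival use before a noun: singular unit, conventionally hyphenated.
        return f"{n}-{forms['one']}"
    # Ordinary quantity: the unit form follows the plural category of n.
    return f"{n} {forms[english_category(n)]}"


print(format_measure(3, "foot"))                                    # 3 feet
print("a " + format_measure(3, "foot", "attributive") + " board")   # a 3-foot board
print(format_measure(3, "day") + " long")                           # 3 days long
```

The point of the sketch is only that the plural category of the number is not enough on its own: the caller has to say how the phrase is being used.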

For verb/pronoun agreement the plurals appear to be mixed. For integral cases it appears to be ok to have verb agreement be either singular or plural, but plural pronoun references sound very odd to me.

Three days is a long time. It is too long to wait.
Three days are a long time. They are too long to wait.

Three feet is not enough. It needs to be longer.
Three feet are not enough. It needs to be longer.

Non-integral verb agreement also sounds odd to me.

3.5 feet are not enough.

I think there might be some hidden elision going on.

3.5 feet [long] is not enough.

grhoten commented 4 months ago

I think that there are 2 parts to this issue.

I've also had this discussion in the past about adjective quantities. The adjective form is supposed to be hyphenated in English, though I can see that not everyone will follow that rule. The "three days long" example is not an adjective with a noun, though "a three-day long stay" does have the noun "stay". As far as quantities as adjectives go, yes, the CLDR plural rules break down for English.

The quantity span agreeing with the words is/are is an unusual case. There will be some other scenarios like that. In the framework that we have, anything not meant to be inflected doesn't have the span marked for inflection. Everything is invariant by default. So that is one way to solve this specific problem. Everything can just use "is" in this scenario.
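As a rough illustration of "invariant by default", the sketch below models a span that only takes part in agreement when it is explicitly marked. The `Span` class, its `number` field, and the helper functions are hypothetical and do not reflect the actual framework's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    text: str
    number: Optional[str] = None  # None means "not marked for inflection"


def copula(subject: Span) -> str:
    """Pick is/are from the subject span; unmarked spans fall back to 'is'."""
    return "are" if subject.number == "plural" else "is"


def sentence(subject: Span, rest: str) -> str:
    return f"{subject.text} {copula(subject)} {rest}"


# The quantity span is left unmarked, so the verb stays "is".
print(sentence(Span("Three feet"), "not enough."))
# A span explicitly marked as plural drives agreement.
print(sentence(Span("Three boards", number="plural"), "not enough."))
```

Under this assumption, a quantity such as "Three feet" never triggers "are" unless someone deliberately marks it.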

macchiati commented 4 months ago

> The quantity span agreeing with the words is/are is an unusual case. There will be some other scenarios like that. In the framework that we have, anything not meant to be inflected doesn't have the span marked for inflection. Everything is invariant by default. So that is one way to solve this specific problem. Everything can just use "is" in this scenario.

There are different use cases for the grammatical data.

In message format, there are some important constraints. The original message has to have all the placeholders and selection in place, because the translators will not be able to add placeholders and will have very limited ability to change the options inside placeholders. (Currently none, but in MF2 the options that they can change will be called out for the translation software to support.)

The upshot is that, currently, if plural categories might be needed for message variant selection in some language, they should be in the source message even if that source language (e.g. English) does not use them. (The same goes for gender, etc.)
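A minimal sketch of that constraint follows. It uses plain dictionaries rather than real MessageFormat syntax. The message tables and function names are hypothetical, and the two category functions are simplified restatements of the CLDR cardinal rules for integers only. The English source keeps a plural selector even though its variants are identical, so that the Russian translation has somewhere to put its "one", "few", and "many" forms.

```python
def english_category(n: int) -> str:
    """Simplified English cardinal rule for integers."""
    return "one" if n == 1 else "other"


def russian_category(n: int) -> str:
    """Simplified Russian cardinal rule for non-negative integers."""
    m10, m100 = n % 10, n % 100
    if m10 == 1 and m100 != 11:
        return "one"
    if 2 <= m10 <= 4 and not 12 <= m100 <= 14:
        return "few"
    return "many"


# English attributive use is invariant, yet the selector stays in the source.
SOURCE_EN = {"one": "{n}-day trial", "other": "{n}-day trial"}

# The translation fills in the variants its own plural categories require.
TARGET_RU = {
    "one": "Пробный период: {n} день",
    "few": "Пробный период: {n} дня",
    "many": "Пробный период: {n} дней",
    "other": "Пробный период: {n} дня",
}


def format_message(variants, category_fn, n):
    """Select a variant by plural category, falling back to 'other'."""
    pattern = variants.get(category_fn(n), variants["other"])
    return pattern.format(n=n)


for n in (1, 3, 5, 21):
    print(format_message(SOURCE_EN, english_category, n), "|",
          format_message(TARGET_RU, russian_category, n))
```

If the source message had been written as a single string with no selector, the translator would have had no way to introduce the three Russian variants.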

richgillam commented 3 months ago

It isn't the same issue, but proper inflection of phrases would also touch on things like "attorneys general" or "mothers-in-law".