unicode-org / inflection

code, data and documentation related to handling inflection problems

Draft proposal for short-term inflection enhancements for MF2 #10

Open macchiati opened 8 months ago

macchiati commented 8 months ago

Here is a draft proposal for information that we could use in the very short term with MF2, to improve the grammar of messages. While we would target MF2, the information could be used more broadly.

A. Enhance grammatical feature information

We could use more gender / noun-class information in order to switch among appropriate variant messages. To do this well, we need to know what the grammatical categories are. The localization tooling can then expand or contract the variant messages to be appropriate for the locale, much as it does for plurals right now in MF1.

Sample message

.match {$person-nc}
animate-masculine {{{$person} needs it: give it to him.}}
feminine {{{$person} needs it: give it to her.}}
…

$user-gender

  1. For each locale, provide data for what the user-gender categories are (eg, for “you” or imperatives).
  2. We are only concerned with categories that grammatically affect the rest of a message.
  3. The fallback category is “other”, and we only need categories that would be distinct from the fallback.
  4. Some, like English, will be just {other}; others, like French, will be {feminine, other}; and others might be {masculine, feminine, other}.

$person-noun-class

  1. For each locale, provide data for what the person-noun-class categories are (eg, for “Pat Smith”).
  2. We are only concerned with categories that grammatically affect the rest of a message.
  3. Some, like Japanese, will be just {other}; others, like French, will be {feminine, other}; and others, like English, will be {masculine, feminine, other}.
  4. This can be more than gender, eg for Polish {animate-masculine, feminine, neuter}.

$object-noun-class

  1. For each locale, provide data for what the object-noun-class categories are (eg, for “Paris”, or “basketball”).
  2. This can have different ‘scopes’ for types of objects (we currently have a scope for units), but the scopes should be locale-independent.
  3. We are only concerned with categories that grammatically affect the rest of a message.
  4. This can be more than gender, eg for Polish {inanimate-masculine, feminine, neuter}.
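
As a rough illustration of section A, the per-locale categories could sit in a simple table that localization tooling consults to expand or contract a message's variants. This is a minimal Python sketch, not a proposed data format: the names `CATEGORIES` and `fit_variants` are hypothetical, and the table holds only the examples mentioned above.

```python
# Hypothetical table: feature -> locale -> categories that are distinct there.
# The entries are only the examples given in this proposal, not real CLDR data.
CATEGORIES = {
    "user-gender": {
        "en": ["other"],
        "fr": ["feminine", "other"],
    },
    "person-noun-class": {
        "ja": ["other"],
        "fr": ["feminine", "other"],
        "en": ["masculine", "feminine", "other"],
        "pl": ["animate-masculine", "feminine", "neuter"],
    },
}


def fit_variants(feature, locale, variants):
    """Fit `variants` (category -> message text) to what `locale` needs.

    Returns (kept, missing): the variants the locale actually uses, and the
    categories for which a translator would still have to supply a message.
    """
    needed = CATEGORIES[feature][locale]
    kept = {cat: variants[cat] for cat in needed if cat in variants}
    missing = [cat for cat in needed if cat not in variants]
    return kept, missing
```

With an English source that has masculine, feminine and other variants, a French target would keep only feminine and other, while a Polish target would keep feminine and report animate-masculine and neuter as missing.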

B. Test data for gender detection

We could prepare test data for at least one locale where we can derive the gender of people or objects, and use it in messages with a new function, :noun-class (written :u:noun-class in the sample message).

Sample message

.input {$person}
.local $person-nc = {$person :u:noun-class}
.match {$person-nc}
animate-masculine {{{$person} needs it: give it to him.}}
feminine {{{$person} needs it: give it to her.}}
…
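
To make the selection step concrete, here is a minimal Python sketch of how test data could drive the choice of variant. Everything in it is an assumption for illustration: `noun_class` is only a stand-in for the proposed function, the name-to-class table is invented test data, and the "other" variant is a made-up fallback that does not appear in the sample message.

```python
# Invented test data for illustration: person name -> noun class.
TEST_NOUN_CLASS = {
    "Jan": "animate-masculine",
    "Anna": "feminine",
}

# Variants keyed by noun class. The first two follow the sample message;
# the "other" entry is a hypothetical fallback added for this sketch.
VARIANTS = {
    "animate-masculine": "{person} needs it: give it to him.",
    "feminine": "{person} needs it: give it to her.",
    "other": "{person} needs it: give it to them.",
}


def noun_class(person, test_data=TEST_NOUN_CLASS):
    """Stand-in for the proposed function; unknown names map to 'other'."""
    return test_data.get(person, "other")


def select_variant(person, variants=VARIANTS):
    """Pick the variant for the person's noun class, falling back to 'other'."""
    pattern = variants.get(noun_class(person), variants["other"])
    return pattern.format(person=person)
```

The fallback mirrors the rule in section A that "other" is the default category, so a name with no test data still produces a usable message.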

C. Test data for case inflections

We could prepare test data for at least one locale where we can support case as an option:

Sample message

.input {$person}
{{Give it to {$person u:case=dative}.}}

(We do have case data for units in CLDR, but it would be better if we had some more general examples.)
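
A case option could be backed by test data of roughly the following shape. This is a hedged Python sketch only: `CASE_FORMS` and `inflect` are hypothetical names, and the two Polish entries are hand-written illustrations, not CLDR data.

```python
# Hand-written illustrative test data (Polish): name -> case -> inflected form.
CASE_FORMS = {
    "Anna": {"nominative": "Anna", "dative": "Annie"},
    "Piotr": {"nominative": "Piotr", "dative": "Piotrowi"},
}


def inflect(name, case, forms=CASE_FORMS):
    """Return `name` in the requested case.

    Falls back to the unchanged name when there is no data for the name
    or for the case, so a message can still be formatted.
    """
    return forms.get(name, {}).get(case, name)
```

Returning the unchanged name when data is missing is a design choice for this sketch; an implementation might prefer to report the gap instead.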

BrunoCartoni commented 8 months ago

Principal categories that can be affected by Gender:

  • nouns
  • pronouns
  • adjectives
  • verbs

I'd also suggest limiting our scope to human gender (so only "you", "he/she", "I" will be taken into account, not "it").

macchiati commented 8 months ago

S meitli isch ine cho, es het de öpfel gseh. [Swiss German: "The girl came in; it saw the apple."]

BrunoCartoni commented 8 months ago

This example is debatable (both "si" and "es" are possible).

As always, we should prioritize according to use case (e.g., do we need subject pronoun reference, or is it more object pronoun?).

macchiati commented 8 months ago

I jotted that down quickly on my phone. I agree that it isn't required to say 'es', though some people do; it was just an example.

I'm in general agreement with what you are saying. For the pronouns, I don't think we need 1st person at all; for 2nd person and 3rd person (and explicit people), we just need to know which particular categories are relevant for which language, for the localization tooling. The particular categories will just depend on the language, and we can have (eg) he/she/other if needed for a particular language.


grhoten commented 8 months ago

This topic around the choice of words and associated human gender seems related to https://github.com/unicode-org/inflection/issues/7.