unicode-org / inflection

code, data and documentation related to handling inflection problems

Grammatical agreement for quantities #5

Open grhoten opened 4 months ago

grhoten commented 4 months ago

Getting quantities grammatically correct should be within the scope of this working group. The ability to take a unit and add a numerical value, like 1 or 2, to make a quantity is important. The scope should cover the adjectives and nouns that are part of the quantity. Anything involving number pronunciation should remain a part of RBNF in CLDR.

Here's an example for the word foot:

| Number | Grammeme | Resolved surface form |
|--------|----------|-----------------------|
| 1      | singular | foot                  |
| 2      | plural   | feet                  |

Here's an example with the word карандаш (pencil) in Russian:

| Number | Grammemes             | Resolved surface form |
|--------|-----------------------|-----------------------|
| 1      | singular & nominative | карандаш              |
| 2      | singular & genitive   | карандаша             |
| 5      | plural & genitive     | карандашей            |

While CLDR plural rules can be used to decide which form to use, the hope is that this project can define how to turn one surface form into another, like from карандаш to карандаша or карандашей.
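As a minimal sketch of that division of labor: the function below picks a plural category for a whole number, and a small lookup table supplies the surface form. The category rules are the CLDR cardinal rules for Russian integers as I understand them (fractions, which use the "other" category, are left out). The names `russian_plural_category`, `PENCIL_FORMS`, and `quantity` are hypothetical, and the hard-coded table stands in for whatever the inflection engine would actually generate.

```python
def russian_plural_category(n: int) -> str:
    """Return the plural category for a non-negative integer in Russian.

    Integers only; fractional values are out of scope for this sketch.
    """
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


# Hypothetical stand-in for the inflection engine: the three surface forms
# of карандаш from the table above, keyed by plural category.
PENCIL_FORMS = {
    "one": "карандаш",     # singular & nominative
    "few": "карандаша",    # singular & genitive
    "many": "карандашей",  # plural & genitive
}


def quantity(n: int, forms: dict) -> str:
    """Combine a number with the surface form its plural category selects."""
    return f"{n} {forms[russian_plural_category(n)]}"


for n in (1, 2, 5, 11, 21, 22):
    print(quantity(n, PENCIL_FORMS))
```

The plural rules only answer "which category?"; everything inside `PENCIL_FORMS` is the part this project would need to produce from карандаш.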

For a language like English, the rules to change a word from singular to plural are pretty regular. The rules can fit onto a single page, but you need the data to be able to handle the edge cases, like for goose, moose, foot, child, new and so forth.
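To illustrate the "regular rules plus exception data" split, here is a deliberately simplified sketch. The three suffix rules are a rough approximation of English spelling conventions, not a complete rule set, and `IRREGULAR_PLURALS`, `pluralize`, and `quantity` are hypothetical names. The exception table holds only the unambiguous edge cases named above.

```python
# Exception data: words whose plural the regular rules would get wrong.
IRREGULAR_PLURALS = {
    "goose": "geese",
    "moose": "moose",
    "foot": "feet",
    "child": "children",
}


def pluralize(word: str) -> str:
    """Return the plural of a lowercase English noun (simplified rules)."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    # Sibilant endings take -es: box -> boxes, dish -> dishes.
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    # Consonant + y becomes -ies: city -> cities (but day -> days).
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def quantity(n: int, word: str) -> str:
    """Combine a whole number with the matching singular or plural form."""
    return f"{n} {word if n == 1 else pluralize(word)}"


print(quantity(1, "foot"))
print(quantity(2, "foot"))
print(quantity(3, "pencil"))
```

Without the exception table, the same rules would produce "foots" and "gooses", which is why the data matters even when the rules are short.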

If it's done well, you should be able to reinflect a word while maintaining the grammatical case of the word. For a language like German or Russian, this is an important topic.

nciric commented 4 months ago

To summarize, you are ok with CLDR plural rules telling us - you need a form 0, 1, few or many here, but I don't know how to change the word to match. Use the inflection library to do so.

I agree with that - it would use previously collected plural rules for languages, and it falls perfectly into our domain to inflect the word to match the plurality.

richgillam commented 3 months ago

> Anything involving number pronunciation should remain a part of RBNF in CLDR.

At some point we should explore changing this. It seems like we might be able to simplify the RBNF rules for at least some languages if RBNF could take advantage of the inflection engine to inflect individual words in a spelled-out number.

richgillam commented 3 months ago

> To summarize, you are ok with CLDR plural rules telling us - you need a form 0, 1, few or many here, but I don't know how to change the word to match. Use the inflection library to do so.

Can we replace the CLDR plural rules with something that lives in the inflection engine and knows which quantities are interesting, or does that basically amount to the same stuff we're doing now, but in a new location?

macchiati commented 3 months ago

We need the plural rules anyway, for backwards compatibility and message formats. But what we can do is have a simplified version in some cases. For example, in some Slavic languages some of the plural categories are identical to just putting an item into a particular case (e.g., genitive).
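One way to picture that simplification, using the Russian example from earlier in this thread: each plural category is translated into a (number, case) pair, and the word's ordinary paradigm is consulted with that pair. This is only a sketch under assumptions. The mapping covers a quantity phrase in a nominative context, and `CATEGORY_TO_GRAMMEMES`, `PENCIL_PARADIGM`, and `form_for_category` are hypothetical names, not part of any existing API.

```python
# Assumed mapping from plural category to grammemes, taken from the
# карандаш table in the issue description.
CATEGORY_TO_GRAMMEMES = {
    "one": ("singular", "nominative"),
    "few": ("singular", "genitive"),
    "many": ("plural", "genitive"),
}

# Partial paradigm for карандаш, keyed by (number, case). A real lexicon
# would hold the full set of cases; only the needed cells are listed here.
PENCIL_PARADIGM = {
    ("singular", "nominative"): "карандаш",
    ("singular", "genitive"): "карандаша",
    ("plural", "genitive"): "карандашей",
}


def form_for_category(category: str, paradigm: dict) -> str:
    """Look up the surface form by translating the category into grammemes."""
    return paradigm[CATEGORY_TO_GRAMMEMES[category]]


for category in ("one", "few", "many"):
    print(category, form_for_category(category, PENCIL_PARADIGM))
```

With this shape, no per-category forms need to be stored for each noun: the existing case paradigm already contains them.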
