home-assistant / intents

Intents to be used with Home Assistant
https://developers.home-assistant.io/docs/voice/overview/
Creative Commons Attribution 4.0 International

Plural word support. #1079

Open ernst77 opened 1 year ago

ernst77 commented 1 year ago

As we are moving on with responses, it would be nice to have plural word support.

It's easy for English, as it has only two forms (e.g. device and devices). Some languages have 3 forms for different amounts (e.g. Lithuanian: 1 - įrenginys, 2 - įrenginiai, 3 - įrenginių), and some even have 5 (or 6?).

Would it be possible, for example, to specify a variable name and a count, and have the correct form returned?

I've had experience with https://www.i18next.com/translation-function/plurals and their system seems to work fine for plurals in different languages. Maybe we could take some inspiration from them? :D

AalianKhan commented 1 year ago

~~Hey I am a bit confused about why. if we implement this, what will it enable us to do?~~ Can't we already do this by using an if statement? {% if match | length == 1 %} įrenginys {% endif %}

ernst77 commented 1 year ago

For responses where we give names of device states, like {% if no_match | length > 4 %} Ne, {{ no_match[:3] | join(", ") }} ir {{ (no_match | length - 3) }} kitų nėra {%- else -%}

which gives a response something like this: "Ne, Darbo stalo lempa, Holo sienos, Svetainės lempa ir 2 kiti nėra"

Because of the number 2, the word kiti should change form:

if count is 1 - kitas
if count is 2-9 - kiti
if count is 10 or more - kitų

It's easy in English to do a simple if, as you only have two forms. Other languages have 5 forms, and writing 5 if statements everywhere a number appears does not make sense.

ernst77 commented 1 year ago

> ~Hey I am a bit confused about why. if we implement this, what will it enable us to do?~ Can't we already do this by using an if statement? {% if match | length == 1 %} įrenginys {% endif %}

I see your language (Urdu) has two forms as well, so I don't expect you to see the issue, as neither English nor Urdu has more than 2 forms.

There is a simple tool to check which languages have other forms and on which numbers those forms apply.

ernst77 commented 1 year ago

There is an article with an example for those whose language does not have many plural forms.

I will tag the language leaders whose languages have many plural forms: @Ahmed-farag36 @LubosKadasi @skynetua @spuljko @makstech

tetele commented 1 year ago

I would suggest an option similar to requires_context, maybe called requires_cardinality, which only recognizes the intent if the matched set of entities has the exact required cardinality. The reasoning for my suggestion is that it can be applied in multiple circumstances, but the main one I can think of is for when asking about the state of "all" devices in an area, where "all"=1.

Specifically: "is the kitchen light on?" is basically the same as "are the kitchen lights on?" when the kitchen has exactly one light. However, when there are 2 lights, "are the kitchen lights on?" has a deterministic answer, whereas "is the kitchen light on?" begs the question "which light?" instead of "sorry, I didn't understand".

Here's a demo snippet of my proposal:

responses/HassGetState.yaml

responses:
  intents:
    HassGetState:

      all: |
        {% if not query.unmatched: %}
          Yes
        {% else %}
          {% set no_match = query.unmatched | map(attribute="name") | sort | list %}
          {% if no_match | length > 4 %}
            No, {{ no_match[:3] | join(", ") }} and {{ (no_match | length - 3) }} more not
          {%- else -%}
            No,
            {% for name in no_match -%}
              {% if not loop.first and not loop.last %}, {% elif loop.last and not loop.first %} and {% endif -%}
              {{ name }}
            {%- endfor %} not
          {% endif %}
        {% endif %}

      all_one: |
        There's just one {{ slots.domain }} and it's {{ state.state_with_unit }}

sentences/HassGetState.yaml

intents:
  HassGetState:
    data:
      - sentences:
          - are all [the] {on_off_domains:domain} {on_off_states:state} [in <area>]
        response: all
        requires_cardinality:
          above: 1
      - sentences:
          - are all [the] {on_off_domains:domain} {on_off_states:state} [in <area>]
        response: all_one
        requires_cardinality:
          exactly: 1
      - sentences:
          - is [the] {on_off_domains:domain} {on_off_states:state} [in <area>]
        response: all
        requires_cardinality:
          exactly: 1
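
The matching behaviour proposed above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Home Assistant matches intents: `requires_cardinality`, its `above` and `exactly` keys, and the helper names `cardinality_matches` and `pick_response` are all hypothetical and mirror the demo snippet rather than any existing API.

```python
from typing import Optional

def cardinality_matches(requirement: Optional[dict], count: int) -> bool:
    """Check a matched-entity count against a hypothetical requires_cardinality block."""
    if not requirement:
        return True  # no constraint means any count is accepted
    if "exactly" in requirement and count != requirement["exactly"]:
        return False
    if "above" in requirement and not count > requirement["above"]:
        return False
    return True

# The three sentence templates from the demo, reduced to (response, requirement).
CANDIDATES = {
    "are all": [("all", {"above": 1}), ("all_one", {"exactly": 1})],
    "is": [("all", {"exactly": 1})],
}

def pick_response(phrasing: str, matched_count: int) -> Optional[str]:
    """Return the first response whose cardinality requirement is satisfied."""
    for response, requirement in CANDIDATES[phrasing]:
        if cardinality_matches(requirement, matched_count):
            return response
    return None  # nothing applies, e.g. "is the light on?" with 2 lights
```

With this sketch, "are all the lights on" picks `all` for three lights and `all_one` for a single light, while "is the light on" with two matched lights returns `None`, which is the point where a "which light?" follow-up could be asked.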

TheFes commented 1 year ago

> For responses where we give names of device states, like {% if no_match | length > 4 %} Ne, {{ no_match[:3] | join(", ") }} ir {{ (no_match | length - 3) }} kitų nėra {%- else -%}
>
> which gives a response something like this: "Ne, Darbo stalo lempa, Holo sienos, Svetainės lempa ir 2 kiti nėra"
>
> Because of the number 2, the word kiti should change form:
>
> if count is 1 - kitas
> if count is 2-9 - kiti
> if count is 10 or more - kitų
>
> It's easy in English to do a simple if, as you only have two forms. Other languages have 5 forms, and writing 5 if statements everywhere a number appears does not make sense.

It will never be 1: if there is only one other device, it will list the name. So if the result is A, B, C, D it will return those four; for A, B, C, D, E it will return A, B, C and 2 more.
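
The listing behaviour described here can be written out as a small Python sketch. It is a rough port of the English response template's branching, not the template itself; the function name `summarize` and the `max_listed` parameter are made up for illustration.

```python
def summarize(names, max_listed=4):
    """List up to `max_listed` names in full, otherwise three names plus a count."""
    names = sorted(names)
    if len(names) > max_listed:
        shown = names[: max_listed - 1]
        # The remainder is always at least 2 here, so "1 more" cannot occur.
        return f"{', '.join(shown)} and {len(names) - len(shown)} more"
    if len(names) > 1:
        return f"{', '.join(names[:-1])} and {names[-1]}"
    return "".join(names)
```

Because the count branch only triggers above four names, the smallest number that can appear before "more" is 2, which is why the singular form is never needed in this particular response.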

ernst77 commented 1 year ago

The specific case does not matter: when we have numbers, the word form changes depending on the number. It's difficult to explain to people whose language has only 2 forms.

For NL:

If any number other than 1 is used, you use the plural form.

In LT:

For 21 we use one form, for 22 another... it depends on the number. The logic is not as simple as "oh, bigger than 1, let's use plural".

tetele commented 1 year ago

> It will never be 1, in case there is only one other device, it will list the name

Maybe not, but it can be either 2-9 or 10+, which begs 2 plural forms, as I understand from @ErnestStaug

ernst77 commented 1 year ago

@tetele Yes, I see your language has different forms as well.

Although a simple if with >=20 would work for you.

tetele commented 1 year ago

I don't know where you got that from, but Romanian only has singular and plural forms for nouns and adjectives. That doesn't mean I don't understand the issue you're having.

ernst77 commented 1 year ago

I use this tool to check at which numbers other languages have different forms. Maybe it's not 100% correct. For you, in Romanian, would 15 of something and 25 of something be the same?

tetele commented 1 year ago

Yes. And considering what @TheFes said, that you could never have "and 1 more", you could do what you suggested I could, i.e. check whether there are more than 10 more results, in which case change the plural form.

However, there is also the solution I proposed, which (maybe) covers more cases, but needs additional implementation.

ernst77 commented 1 year ago

I guess for most languages this is not an issue, but it is in our case. I think Polish and Ukrainian have the same issue as well, so hopefully I am not the only one.

Just to reiterate:

In Lithuanian we have 3 forms. e.g.

A simple >9 check does not work.

Other languages have different cases depending on different numbers.

tetele commented 1 year ago

Ah, ok, now I get it. Complicated indeed, but still fixable with some ifs in the response.

ernst77 commented 1 year ago

Yes, it's possible, but if we end up with more responses with different cases, I am not keen on writing ifs everywhere. I would rather have a separate file for plural words, and the correct form would be chosen depending on the number. Less clutter and better code readability. E.g. plural($light_key, 3) would give me šviesos.

So the response of All 3 lights have been turned on for lt would become Visos 3 {plural($light_key, 3)} buvo {plural($turn_on_key, 3)} which would choose the correct form and give me Visos 3 šviesos buvo įjungtos.

The same goes for a different number: Visos 21 {plural($light_key, 21)} buvo {plural($turn_on_key, 21)} would make Visa 21 šviesa buvo įjungta.

To be honest, even the word Visos (en: all) would need a different form depending on the count.

P.S. disregard syntax and response example.
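
A minimal Python sketch of the `plural(key, n)` lookup described above, assuming the forms live in a separate table. The table `FORMS`, its keys and the function names are hypothetical; the index rule is the Lithuanian gettext `Plural-Forms` expression quoted elsewhere in this thread, and the word forms are the ones given in the earlier comments.

```python
def lt_plural_index(n: int) -> int:
    """Lithuanian plural index, following the gettext Plural-Forms rule."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... but not 11
    if n % 10 >= 2 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # 2-9, 22-29, ...
    return 2      # 0, 10-20, 30, ...

# Hypothetical per-language table: key -> one word per plural index.
FORMS = {
    "device": ("įrenginys", "įrenginiai", "įrenginių"),
    "other": ("kitas", "kiti", "kitų"),
}

def plural(key: str, n: int) -> str:
    """Pick the word form for `key` that agrees with the count `n`."""
    return FORMS[key][lt_plural_index(n)]
```

A response template would then only need one call per word, for example `plural("other", 2)` for "kiti" and `plural("other", 21)` for "kitas", instead of a chain of if statements.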

yawor commented 1 year ago

This is exactly what gettext does. There's an ngettext function in the gettext library, which takes a key (it can be a singular word form, for example) and a number, and retrieves the proper word (or sentence) form from a gettext-compatible file. Of course, using gettext here would be overkill, as it is first and foremost an application localisation library, but it is a good example and a great library to learn from when it comes to supporting different languages. To support plural forms, gettext has a way to define the number of plural forms and how to apply them.

Here's a rule for Polish:

Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

And here for Lithuanian:

Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;

Also regarding Romanian, it seems it actually has 3 plural forms as well:

Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;

You can find rules for other languages in their documentation: https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html
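
To see what these rules produce, the expressions can be evaluated from Python. The sketch below relies on `gettext.c2py`, a helper in CPython's standard `gettext` module that turns a C plural expression into a Python function; it is not part of the module's documented public API, so treat its use here as an assumption. The expressions are the ones quoted above, written on one line without the trailing semicolon; `RULES`, `PLURAL_INDEX` and `form_indices` are illustrative names.

```python
import gettext

# Plural expressions from the rules above (one line each, no trailing ';').
RULES = {
    "pl": "n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2",
    "lt": "n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2",
    "ro": "n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2",
}

# gettext.c2py is undocumented; it compiles each expression to a function of n.
PLURAL_INDEX = {lang: gettext.c2py(expr) for lang, expr in RULES.items()}

def form_indices(lang, counts):
    """Return the plural form index chosen for each count in `counts`."""
    return [PLURAL_INDEX[lang](n) for n in counts]
```

For example, `form_indices("lt", [1, 2, 10, 21, 22])` shows that 21 falls back to the same form as 1, and `form_indices("ro", [15, 25])` shows that 15 and 25 select different forms under this rule.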

witold-gren commented 8 months ago

You can find a few examples in the Polish translations: https://github.com/home-assistant/intents/blob/main/responses/pl/HassGetState.yaml#L47 or https://github.com/home-assistant/intents/blob/main/responses/pl/HassClimateGetTemperature.yaml#L6. This is how I solved it for now, but I don't know what it will look like in the future 😀