Assist Configuration Language missing Hebrew

leranp commented 1 year ago

The problem

When choosing the language from the list, Hebrew is missing.

What version of Home Assistant Core has the issue?

core-2023.5.3

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

assist_pipeline

Link to integration documentation on our website

https://www.home-assistant.io/integrations/assist_pipeline/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 1 year ago

Hey there @balloob, @synesthesiam, mind taking a look at this issue as it has been labeled with an integration (assist_pipeline) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `assist_pipeline` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign assist_pipeline` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)

assist_pipeline documentation, assist_pipeline source (message by IssueLinks)

shmuelzon commented 1 year ago

This happens because the available languages are those supported by the conversational model (I think that's the intents repository), the supported languages of the speech-to-text service (if one exists), and the supported languages of the text-to-speech service (the default configuration uses Google Translate). So it takes the languages from all services, intersects them with each other, and whatever is left is shown in the drop-down list.

The problem is that the conversational model uses `he` for Hebrew while Google TTS uses `iw`, so neither code survives the intersection.
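
A minimal sketch of that intersection, to make the failure concrete. The language sets are made-up samples, not the real lists reported by each service, and the variable names are only illustrative:

```python
# Hypothetical sample data; the real lists come from each configured service.
conversation_languages = {"en", "de", "he"}  # intents use "he" for Hebrew
stt_languages = None                         # assume no speech-to-text service
tts_languages = {"en", "de", "iw"}           # Google Translate TTS uses "iw"

pipeline_languages = None
for languages in (conversation_languages, stt_languages, tts_languages):
    if languages is None:
        continue  # a missing service does not restrict the list
    if pipeline_languages is None:
        pipeline_languages = set(languages)
    else:
        pipeline_languages &= languages  # plain set intersection on raw codes

print(sorted(pipeline_languages))  # ['de', 'en']
```

Because `he` and `iw` are compared as plain strings, Hebrew drops out even though both services support it.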

I have two suggestions for fixing this, not sure what the best approach would be. The first is to "normalize" the Hebrew language code:

diff --git a/homeassistant/util/language.py b/homeassistant/util/language.py
index 4ec8c74ffa..9882210dc9 100644
--- a/homeassistant/util/language.py
+++ b/homeassistant/util/language.py
@@ -87,6 +87,12 @@ class Dialect:
         # Languages are lower-cased
         self.language = self.language.casefold()

+        # Normalize language name
+        for language_names in SAME_LANGUAGES:
+            if self.language in language_names:
+                self.language = language_names[0]
+                break
+
         if self.region is not None:
             # Regions are upper-cased
             self.region = self.region.upper()
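
Outside of Home Assistant, the effect of this normalization can be sketched roughly as follows. The value of `SAME_LANGUAGES` here is an assumption for illustration, and `normalize_language` is a hypothetical helper, not an existing function:

```python
# Assumed value for illustration; the diff above only references the constant.
SAME_LANGUAGES = (("he", "iw"),)


def normalize_language(language: str) -> str:
    """Map any alias listed in SAME_LANGUAGES to the first code of its group."""
    language = language.casefold()
    for language_names in SAME_LANGUAGES:
        if language in language_names:
            return language_names[0]
    return language


# Hypothetical sample lists from two services.
conversation_languages = {normalize_language(code) for code in ("en", "he")}
tts_languages = {normalize_language(code) for code in ("en", "iw")}

print(sorted(conversation_languages & tts_languages))  # ['en', 'he']
```

Once both sides are normalized, the plain set intersection keeps Hebrew, always under the first code of the group.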

And the second is a bit more generic by using existing methods to compare if two languages are the same and using that for the intersection:

diff --git a/homeassistant/components/assist_pipeline/websocket_api.py b/homeassistant/components/assist_pipeline/websocket_api.py
index bd2ec53db4..7d68e2910e 100644
--- a/homeassistant/components/assist_pipeline/websocket_api.py
+++ b/homeassistant/components/assist_pipeline/websocket_api.py
@@ -314,7 +314,7 @@ async def websocket_list_languages(
             dialect = language_util.Dialect.parse(language_tag)
             languages.add(dialect.language)
         if pipeline_languages is not None:
-            pipeline_languages &= languages
+            pipeline_languages = language_util.intersect(pipeline_languages, languages)
         else:
             pipeline_languages = languages

@@ -324,7 +324,7 @@ async def websocket_list_languages(
             dialect = language_util.Dialect.parse(language_tag)
             languages.add(dialect.language)
         if pipeline_languages is not None:
-            pipeline_languages &= languages
+            pipeline_languages = language_util.intersect(pipeline_languages, languages)
         else:
             pipeline_languages = languages

diff --git a/homeassistant/util/language.py b/homeassistant/util/language.py
index 4ec8c74ffa..9882210dc9 100644
--- a/homeassistant/util/language.py
+++ b/homeassistant/util/language.py
@@ -199,3 +205,14 @@ def matches(

     # Score < 0 is not a match
     return [tag for _dialect, score, tag in scored if score[0] >= 0]
+
+def intersect(
+    set1: Iterable[str], set2: Iterable[str]
+) -> set[str]:
+    """Return the intersection of two language sets taking into consideration name variations."""
+    languages = set()
+    for language in set1:
+        matching_languages = matches(language, set2)
+        if len(matching_languages) > 0:
+            languages.add(matching_languages[0])
+    return languages

The latter, though more generic, might return either `he` or `iw` for Hebrew, depending on the order in which the sets are intersected, and I don't know how that might affect things later on in the pipeline.
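
A rough sketch of that order dependence. Here `matches` and `same_language` are simplified stand-ins written for this example (no region scoring), not the real functions in `homeassistant/util/language.py`, and the alias table is assumed:

```python
from collections.abc import Iterable

# Assumed alias table for illustration only.
SAME_LANGUAGES = (("he", "iw"),)


def same_language(lang1: str, lang2: str) -> bool:
    """Hypothetical helper: equal codes, or both in one alias group."""
    if lang1 == lang2:
        return True
    return any(lang1 in names and lang2 in names for names in SAME_LANGUAGES)


def matches(target: str, supported: Iterable[str]) -> list[str]:
    """Simplified stand-in for the real matcher."""
    return [tag for tag in supported if same_language(target, tag)]


def intersect(set1: Iterable[str], set2: Iterable[str]) -> set[str]:
    """Same shape as the proposed helper: keep the matching tag from set2."""
    languages = set()
    for language in set1:
        matching_languages = matches(language, set2)
        if matching_languages:
            languages.add(matching_languages[0])
    return languages


conversation = {"en", "he"}
tts = {"en", "iw"}
print(sorted(intersect(conversation, tts)))  # ['en', 'iw']
print(sorted(intersect(tts, conversation)))  # ['en', 'he']
```

The result always carries the code used by the second argument, so the Hebrew code that reaches the rest of the pipeline depends on which service is intersected last.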

@synesthesiam or @emontnemery, as most of the related code here is yours, I'd be happy to get your feedback before opening a pull request.

Thanks!

leranp commented 1 year ago

This PR is going to handle the two alternative language codes: https://github.com/home-assistant/core/pull/93681, but I am not sure whether it will fix the Hebrew selection.

shmuelzon commented 1 year ago

@leranp, unfortunately that doesn't help with this specific issue, but you'll notice that I am relying on it to correlate between the two versions.

tidharmor commented 1 year ago

Is there any workaround for this issue until it's properly fixed?

shmuelzon commented 1 year ago

@tidharmor You can try manually applying one of the above suggestions. I haven't opened a pull request for it since I don't know which is the preferred method. I suggest starting with the first one since it's less likely to cause other issues.

tidharmor commented 1 year ago

@shmuelzon I'm a developer, but I haven't played around with Home Assistant development yet. I'm running HAOS. Is it possible to apply this fix in that environment, or do I have to set up a development environment?

Thanks

shmuelzon commented 1 year ago

@tidharmor I've never set up a development environment either :) I don't use HAOS. I use the Docker installation, and I just run /bin/bash in the HA Docker instance, modify the files, and restart HA.

issue-triage-workflows[bot] commented 9 months ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
