nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows

Add a generic way to expand symbols regardless of translation #16739

Open LeonarddeR opened 1 week ago

LeonarddeR commented 1 week ago

Is your feature request related to a problem? Please describe.

There are several requests to expand the symbols file, such as #6904, #5194, #16720. They all seem to stall on the concern that adding additional symbols will break speech for some languages.

Describe the solution you'd like

Based on the way CLDR was added to NVDA, I propose making that logic more generic, i.e. by adding more symbol dictionaries next to CLDR that can be enabled or disabled from a checkable list in the speech settings panel. So instead of having the "Include Unicode Consortium data (including emoji) when processing characters and symbols" checkbox, add a checkable list called "Additional symbol processing dictionaries" that contains the "Unicode Consortium data (including emoji)" item, among others.
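A minimal sketch of the data behind such a checkable list, assuming each dictionary is an opt-in entry with an internal name and a display label. The class names, the `ancientLanguages` entry and its label are hypothetical and not part of NVDA's existing code.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolDictionaryOption:
    """One item in the proposed checkable list (hypothetical type)."""
    name: str              # internal identifier, e.g. "cldr"
    display_name: str      # label shown in the speech settings panel
    enabled: bool = False  # additional dictionaries are opt-in


@dataclass
class SpeechSymbolSettings:
    """Hypothetical container for the list of additional dictionaries."""
    options: list[SymbolDictionaryOption] = field(default_factory=list)

    def set_enabled(self, name: str, enabled: bool) -> None:
        """Check or uncheck the item with the given internal name."""
        for option in self.options:
            if option.name == name:
                option.enabled = enabled
                return
        raise KeyError(name)

    def enabled_names(self) -> list[str]:
        """Return the internal names of all checked items, in list order."""
        return [option.name for option in self.options if option.enabled]


settings = SpeechSymbolSettings([
    SymbolDictionaryOption("cldr", "Unicode Consortium data (including emoji)", enabled=True),
    SymbolDictionaryOption("ancientLanguages", "Ancient languages"),  # hypothetical entry
])
settings.set_enabled("ancientLanguages", True)
print(settings.enabled_names())
```

The existing CLDR checkbox then becomes just one item in the list, so adding a new dictionary does not require a new setting.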

Additional context

In #16732, @yeatersink proposed adding some symbol dictionaries for several ancient languages. I think that if we follow the approach outlined above, we could create an ancient languages symbols dictionary that contains all these symbols.

CyrilleB79 commented 6 days ago

@LeonarddeR, thanks for having opened the issue.

I would rather have described the use case, before describing a solution. The use case seems to be: find the best way to read languages for which no synth exists such as ancient languages. A solution may be to expand symbol files; maybe that's not the only one?

yeatersink commented 6 days ago

Greetings, Thank you to @LeonarddeR for re-raising this issue. I sincerely appreciate it.

@CyrilleB79 I will present the problem, then the proposed solution:

1 The problem. We need access to several ancient languages for required education. These languages are often grouped together in dictionaries, which demands access to a multitude of languages on the same page, at the same time. This is also true for grammars where books are written in one language to teach another language.

NVDA already has an automatic language switching feature, which allows this to happen with the languages NVDA already recognizes. Adding new or additional languages should not be a problem. We need a solution to make this happen, and my team and I are willing to do what needs to be done to make it work.

2 The Solution: So far, we have written several tables for several languages for Lib Louis, and Duxbury. Thus, we have braille. However, we need speech. NVDA seems to be the most logical solution.

What I would like to see is to be able to add a new language to NVDA for each of the languages that we need.

NVDA has an automatic language switching feature which will accommodate several languages on the same page. It seems that if we add a new language for the languages that we have written braille for, then this will present the most natural and helpful solution. The lib louis tables are already in NVDA, we just need speech.

This is why we opened this issue and created a pull request.

3 What we have done so far:

We have tested the characters and symbols for all the languages by adding them to our English locale on our own personal computer. We have created locale folders for each individual language already.

We used the NVDA Speech synthesizer, and it speaks the characters, when they are part of the English locale.

What would be optimal is if we can use NVDA, Microsoft voices or the eloquence tts to simply speak what the characters are. But also recognize them as their own language. This way we could check a box to tell NVDA what language it is or what language to speak.

For example: Akkadian has a US system and a German system. They use the same Unicode characters, but have different names and different braille for them. So, we would like to be able to tell NVDA that we want it to speak the German names, just like we can choose the German Lib Louis braille table, as opposed to NVDA being confused and not knowing whether it is the US system or the German system.

I know we can work this out. Please help me, I desperately need this for school, and Leonard is going to be needing it in the fall for his school as well.

paulGeoghegan commented 6 days ago

@LeonarddeR This would actually be a great solution. I'm actually learning Spanish at the minute and it would be nice to be able to enable something that could tell me when I encounter a Spanish n which NVDA currently calls a regular n.

I love the idea of being able to have a list of options in terms of character sets that are currently enabled. It would be great if we could have a sub-menu inside settings or something that could have a keyboard shortcut for easy access.

LeonarddeR commented 6 days ago

Some thoughts.

  1. @CyrilleB79 I tried to come up with a generic approach to cover several use cases as outlined in the mentioned issues, therefore I see this particular issue as a solution-focused one rather than as a discussion about the underlying issues. Again, the problem regarding ancient languages is not an isolated issue.
  2. @yeatersink I think we should treat these ancient languages separately from the concept of supported locales in NVDA, and rather as a bunch of foreign characters NVDA should know how to pronounce. That's exactly what symbol dictionaries are meant for. Your point about the German and English system for Akkadian emphasizes this. That case can be solved by providing the English symbol names in a dictionary for English, and the German names in a dictionary specifically for German. Automatic language switching is mainly meant for speech synthesizers to know when to switch languages, but these ancient languages don't have a proper speech synthesizer, apart from Hebrew.
  3. @paulGeoghegan I'm not sure about the Spanish n, do you mean Ñ? That's properly announced with eSpeak here, though OneCore stays silent. I think that should be treated as a different issue.

paulGeoghegan commented 6 days ago

@LeonarddeR

> @paulGeoghegan I'm not sure about the spanish n, do you mean Ñ? That's properly announced with ESpeak here, OneCore stays silent though. I think that should be treated as a different issue.

My example was just an example of how others could use it because I just love the idea of having support for different character sets that you could enable and disable when you need them.

Yes, I am using OneCore, so I might try out eSpeak.

CyrilleB79 commented 6 days ago

The solution proposed by @LeonarddeR seems promising because it takes into account the fact that characters may have different names in different languages (e.g. the difference between the English and German names). By contrast, the solution provided in the PR was not adaptable for users speaking a language other than English.

paulGeoghegan commented 6 days ago

@CyrilleB79 to be honest it might not be either way. For example, the Akkadian character names are defined in English, so if you are a non-English speaker then you may not understand them. This does not apply to every language, but to a lot of them. This solution would, however, at least allow users of other languages to enable other character sets if they wish.

LeonarddeR commented 6 days ago

It is common sense that if a symbol is not defined in a locale, English is used as a fallback. Also note that enabling these additional character sets would be opt-in, they'd still be disabled by default. That said, ideally these new symbols would be translated to other languages as well, but it is up to every locale maintainer to do this or not.

paulGeoghegan commented 6 days ago

@LeonarddeR exactly. I just wanted to make it clear to @CyrilleB79 that they would technically be language-specific.

CyrilleB79 commented 6 days ago

> @LeonarddeR exactly. I just wanted to make it clear to @CyrilleB79 that they would technically be language-specific.

It was clear to me and that was the sense of my comment https://github.com/nvaccess/nvda/issues/16739#issuecomment-2191193154.

To be extra clear, this new subset of characters should be provided in English, and translators should be offered the opportunity to translate them. If they don't, the character's name will fall back to English.
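The fallback rule described here can be sketched as a two-step lookup, assuming each locale's dictionary is a plain mapping from character to spoken name. The function name and the sample entries are illustrative only, not NVDA's actual symbol-processing code.

```python
def describe_symbol(char, locale, dictionaries, fallback_locale="en"):
    """Return the spoken name for char, preferring locale, then fallback_locale.

    dictionaries maps a locale code to a {character: name} mapping.
    Returns None when neither locale defines the character.
    """
    for candidate in (locale, fallback_locale):
        name = dictionaries.get(candidate, {}).get(char)
        if name is not None:
            return name
    return None


# Illustrative data: the German dictionary translates only one of two symbols.
dictionaries = {
    "en": {"\u2200": "for all", "\u2203": "there exists"},
    "de": {"\u2200": "für alle"},
}

print(describe_symbol("\u2200", "de", dictionaries))  # translated entry is used
print(describe_symbol("\u2203", "de", dictionaries))  # untranslated entry falls back to English
```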

LeonarddeR commented 1 day ago

@seanbudd Curious to know what you think about this approach. Would NV Access accept a PR for this?

Adriani90 commented 20 hours ago

I agree this would be a huge improvement in screen reading different kinds of content. It would be nice if the checkable list would group symbols in something like:

But note that in case of mathematical symbols at least, I added many of them already to the symbols.dic some years ago, so they are translated in many languages now. We should be careful not to cause conflicts between the current symbols.dic file and the optional symbol dictionaries.
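One way to catch such conflicts early is to compare an optional dictionary against the base one before enabling it. This is a minimal sketch, assuming both are plain character-to-name mappings; `find_conflicts` and the sample entries are hypothetical.

```python
def find_conflicts(base, optional):
    """Return {symbol: (base_name, optional_name)} for symbols that both
    dictionaries define with different names."""
    return {
        symbol: (base[symbol], name)
        for symbol, name in optional.items()
        if symbol in base and base[symbol] != name
    }


# Illustrative entries only; not copied from NVDA's symbols.dic.
base_symbols = {"\u2200": "for all", "\u2211": "sum"}
optional_math = {"\u2211": "summation", "\u222b": "integral"}

conflicts = find_conflicts(base_symbols, optional_math)
print(conflicts)
```

A check like this could run in a test suite, so that a new optional dictionary cannot silently redefine a symbol that is already translated.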

seanbudd commented 17 hours ago

@LeonarddeR - feel free to open a PR

yeatersink commented 9 hours ago

@Adriani90 This is exactly what I had been hoping for. A method of adding various symbols in this exact manner. @LeonarddeR How do we move forward? I am ready to get rocking and rolling. @paulGeoghegan get your boots on brotha.

@seanbudd thanks for the help and @CyrilleB79 you have been a blessing.

LeonarddeR commented 9 hours ago

> @LeonarddeR How do we move forward? I am ready to get rocking and rolling.

I think the best way would be for you to provide just one symbols dictionary that contains all the symbols you'd like to add. I'll then plug it into a pull request that I will create later this week.

paulGeoghegan commented 9 hours ago

@LeonarddeR is it going to be formatted like the regular symbols.dic file?

LeonarddeR commented 9 hours ago

Yes, there won't be a difference in format. I'd probably call it ancientLanguages.dic or something like that.
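For anyone preparing such a file, here is a simplified reader, assuming the layout is a `symbols:` section header followed by tab-separated `identifier`, `replacement` and optional `level` fields, with `#` starting a comment line. It ignores everything else a real symbols file may contain, and the two sample entries are illustrative rather than taken from an actual dictionary.

```python
def parse_symbols(text):
    """Parse the 'symbols:' section of symbols.dic-style text (simplified).

    Returns {identifier: (replacement, level_or_None)}.
    """
    symbols = {}
    in_symbols = False
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(":") and "\t" not in line:
            # A section header such as "symbols:" or "complexSymbols:".
            in_symbols = line == "symbols:"
            continue
        if not in_symbols:
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # malformed entry: no replacement text
        level = fields[2] if len(fields) > 2 else None
        symbols[fields[0]] = (fields[1], level)
    return symbols


# Illustrative entries for two cuneiform signs (U+12000 and U+12001).
sample = (
    "symbols:\n"
    "# identifier<TAB>replacement<TAB>level\n"
    "\U00012000\tA\tnone\n"
    "\U00012001\tA times a\tnone\n"
)
parsed = parse_symbols(sample)
print(parsed["\U00012001"])
```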

paulGeoghegan commented 8 hours ago

@LeonarddeR we have encountered a problem. We have multiple files that contain the same characters, but with different names because they follow different systems. For example, Akkadian has 2 systems for naming the characters, but we can't put them both in an ancientLanguages.dic file.

paulGeoghegan commented 8 hours ago

@LeonarddeR we could just have an ancient languages category and then have a checkbox for each language. That would totally mitigate this problem and would reduce the chance of symbols conflicting.

LeonarddeR commented 8 hours ago

Wasn't the problem with Akkadian that there is an English and a German system? In that case, the English file would belong in a file for the English locale, but the German system in a file in the German locale folder.

yeatersink commented 7 hours ago

@LeonarddeR I really appreciate your help brother. For the sake of clarity, the Akkadian difference is not tied to the speaker's language, like English versus German. It is more a US system vs. a German system. In other words, they both use the same Unicode characters, but the US system describes the character, whereas the German system has a name for the character.

For example, the US system will call a character "A times a," and the German system will call that character "Eduru."

Both systems have different braille as well.

English speakers will want access to both the German system and the US system, and German speakers will want access to both systems as well, especially since the German system relies heavily on the US system and ORACC refers to both systems.

So having the ability to choose one, and then have the same access to the other is vital.

It would be way better to have the ability to select either the US system or the German system in this "Ancient languages" category.

The same problem will arise for other languages as well, such as Greek and Coptic. It would be better to have the ability to choose one or the other, especially since they use the same alphabet.

In fact, there are several languages that have this same type of issue, and having the ability to select a different variation of that language would be huge!

This is breaking into a whole new level of academia my friend, and you are in the beginning of the break through. You are a hero bro.

Your help is very much appreciated.

LeonarddeR commented 7 hours ago

This is all getting pretty complex. I'm trying to translate this into a user experience that allows one to enable/disable dictionaries, but what if one enables both Akkadian systems? I'm starting to wonder whether this wouldn't be better suited to an add-on for ancient studies, for example. Of course one then needs to install the add-on, but it opens up a whole lot of new possibilities, and the add-on author could easily add additional dictionaries for new languages.

So maybe we should go even more generic and aim for a method where add-ons could provide custom symbol dictionaries.

yeatersink commented 7 hours ago

@LeonarddeR We actually thought about an ancient language add-on, but to be honest, I think this idea of yours might be better, especially if we have other categories for music, math, and other sciences.

To answer your question about someone enabling both systems of Akkadian, I do not think that would ever happen. It is usually one or the other. However, we would want the ability to easily enable one or the other. This is how we have it in Lib Louis right now. You have the ability to choose either system. It is easily accessible, straightforward and clear.

I am convinced that the same would be true for Greek and Coptic, Modern Hebrew and Biblical Hebrew etc.

Even if we have an ancient language add-on, we would still need the ability to choose one system or the other. And we would still need the ability to read any material in our own mother tongue and be able to engage with ancient languages, especially in a learning environment, such as taking a Hebrew class even though we mainly speak English.

I think that your idea here is best. But where there might be a conflict between Unicode characters, i.e. two systems of Akkadian that share Unicode characters, the user should only have the option to choose one system or the other, not both.

This would work with Lib Louis tables as well.
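The "one or the other, but not both" behaviour can be sketched as mutually exclusive groups, assuming dictionaries are identified by name and that conflicting systems are declared together in a group. All names here, including `akkadian-us` and `akkadian-de`, are hypothetical.

```python
def enable_dictionary(enabled, name, exclusive_groups):
    """Return a new set of enabled dictionary names with name switched on.

    Any other member of an exclusive group that contains name is
    switched off, so only one system per group is active at a time.
    """
    result = set(enabled)
    for group in exclusive_groups:
        if name in group:
            result -= group  # drop rival systems from the same group
    result.add(name)
    return result


exclusive_groups = [frozenset({"akkadian-us", "akkadian-de"})]
enabled = {"cldr", "akkadian-us"}

enabled = enable_dictionary(enabled, "akkadian-de", exclusive_groups)
print(sorted(enabled))
```

In a settings dialog this would behave like a radio-button group inside the checkable list: picking the German system unchecks the US system, while unrelated dictionaries stay as they were.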

What do you think?

LeonarddeR commented 6 hours ago

But what if we don't want to put the burden of maintaining all those dictionaries on the shoulders of NV Access? If there were an ecosystem similar to the custom braille tables system, where symbol dictionaries could be defined in an add-on's manifest and added to the GUI that way, I think it would offer the best of both worlds.
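A rough sketch of how dictionaries declared by add-ons might be collected for the GUI, assuming each add-on manifest has already been parsed into a nested mapping. The `symbolDictionaries` key, the `displayName` field and the add-on names are assumptions made for illustration, not an existing manifest specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolDictionaryDefinition:
    """A dictionary declared by an add-on (hypothetical type)."""
    name: str
    display_name: str
    source: str  # name of the add-on that declared it


def collect_symbol_dictionaries(manifests):
    """Build {dictionary_name: definition} from parsed add-on manifests.

    manifests maps an add-on name to its parsed manifest mapping.
    When two add-ons declare the same name, the first one wins.
    """
    registry = {}
    for addon_name, manifest in manifests.items():
        declared = manifest.get("symbolDictionaries", {})  # hypothetical key
        for dict_name, spec in declared.items():
            if dict_name in registry:
                continue  # keep the first definition
            registry[dict_name] = SymbolDictionaryDefinition(
                name=dict_name,
                display_name=spec.get("displayName", dict_name),
                source=addon_name,
            )
    return registry


manifests = {
    "ancientStudies": {
        "symbolDictionaries": {
            "akkadian-us": {"displayName": "Akkadian (US system)"},
            "akkadian-de": {"displayName": "Akkadian (German system)"},
        }
    },
    "someOtherAddon": {},  # declares no dictionaries
}
registry = collect_symbol_dictionaries(manifests)
print(sorted(registry))
```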

paulGeoghegan commented 6 hours ago

@LeonarddeR let me just make sure I'm understanding how this would work. You would be adding a new mechanism that would allow people to add new character sets like you would add a plugin, so we could enable or disable them as we want, just like plugins? If so, this sounds like a great solution, especially if we could make a new category on the add-ons marketplace for character sets. I think this would be even better since you wouldn't have to go through NV Access every time you wanted to add a new character set, and it would then be up to the original creator to keep maintaining the character set. It would also allow for far more character sets to be added.

LeonarddeR commented 3 hours ago

The categorization is up to NV Access/the add-on store, but otherwise, yes. I will try to come up with a prototype later this week.