samtupy / nvgt

The Nonvisual Gaming Toolkit
https://nvgt.gg

Audio form text encoding issue #71

Closed: harrymkt closed this 3 months ago

harrymkt commented 3 months ago

Hello

The audio form is not displaying UTF-8 text correctly, for example:

မင်္ဂလာပါ

samtupy commented 3 months ago

Hi, just checking on this before I rename the issue. I'm pretty sure this has nothing to do with text encoding and everything to do with screen reader pronunciation; feel free to prove me wrong with a test script. What's actually happening is that the audio form doesn't contain spoken names for every Unicode character, so if you are using IBMTTS with the NVDA punctuation level set to "some", various symbols won't be read as you arrow over them. A related issue is that the string "blank", spoken in empty text fields, is spoken in English even on a German user's computer, because we don't have translations of the word "blank" for every language. In this case, I pasted the string you sent into a multiline text field, set my punctuation level to "all", and switched to eSpeak, and the text came out as it should apart from these pronunciation issues. Those are much harder to fix because, as I said, it would require including a huge table of English names for every Unicode character in the audio form, and then it would only work in English. I'm honestly not sure how we can easily fix this, because NVDA does not provide a way through its controller client to tell it to speak in character mode. If we could, this problem wouldn't exist.

harrymkt commented 3 months ago

Hello @samtupy

No, I'm not using IBMTTS.

This is Burmese, and I have a TTS voice that reads Burmese text.

I'm sure that if I copy the input and paste it somewhere else, it reads as expected. In the form input, however, it does not read at all, not even the full text.

I also noticed that this does not happen in Survive The Wild's form inputs.

Thanks!

samtupy commented 3 months ago

Hi, just confirming: are you using NVGT 0.87.2? I only ask because I believe one release about a week ago very temporarily broke UTF-8 screen reader output, but a patch was pushed shortly thereafter. Sorry if you had already thought of that; I just want to make sure we don't spend ages investigating something that a reinstall would fix! :) Beyond that, I'll keep playing with it. One question: does the Windows emoji panel work for you? For example, I can add the emoji "😀🤣😹😻😼🐱🐁🐛🦟🦗" to the formtest.nvgt script in test/quick in the repository, and that seems to work for me. Can you please confirm that this works for you as well, and if it does, send over some additional broken strings I can test with? I'm trying yours with eSpeak and it seems to mostly work. Except that even in the browser I'm copying the text from, triple-tapping numpad 2 on some of those characters doesn't make NVDA report a character code as it normally would, and NVDA speaks two letters for one such character; in the browser it looks like 4 characters but reads like 8 letters. The audio form presents them as individual letters, so maybe that's me seeing a version of the problem? I'm not sure how well my localization settings are equipped to test this, but I'll see what I can find, especially once you confirm that this is not an NVGT version issue. Thanks!
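To see why a few visible Burmese characters can turn into twice as many spoken "letters", it helps to list the Unicode code points in the sample string. This is a minimal Python sketch using only the standard `unicodedata` module; it is not part of NVGT or the audio form, just a way to inspect the test string.

```python
import unicodedata

# Sample string from the report ("mingalaba", a Burmese greeting).
sample = "မင်္ဂလာပါ"

# Print every code point with its Unicode name and general category.
# Categories starting with "M" are marks (e.g. asat, virama, vowel signs)
# that attach to a neighbouring consonant instead of standing alone.
for ch in sample:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch, '<unnamed>')}")

# Myanmar-block characters (U+1000 to U+109F) take 3 bytes each in UTF-8.
print("code points:", len(sample))
print("UTF-8 bytes:", len(sample.encode("utf-8")))
```

Each printed line is one code point; a control or screen reader that steps through code points rather than whole grapheme clusters may announce each of them separately, which could explain the "4 characters but 8 letters" effect described above.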

harrymkt commented 3 months ago

Hello @samtupy.

I believed I had installed version 0.87.2, but I had actually installed version 0.87.1.

After redownloading and installing the latest version, UTF-8 text now works properly.

Thanks!