liblouis / liblouis

Open-source braille translator and back-translator.
http://liblouis.io
GNU Lesser General Public License v2.1

[Feature Request] Allow optionally enabled rules #664

Closed school510587 closed 1 year ago

school510587 commented 5 years ago

Based on discussion among users in Taiwan, I expect that frequently used symbols such as emoticons will soon be defined in zh-tw.ctb (and in every other ctb table). Especially once NVDA updates its liblouis.dll to a UCS4-compatible version, many users will want these definitions. However, consider this proposal for defining emoticon braille patterns: https://www.cbtbc.org/braille/emoticons/ It suggests showing the entire name of an emoticon on the braille display, which may be very long. This makes for a bad experience on braille displays with 20 cells or fewer. A user with a 40-cell braille display, however, still wants to be able to tell what a symbol is. I think this issue applies to every braille table. To satisfy both needs with one ctb table, could liblouis support optionally enabled rules?

egli commented 5 years ago

Optionally enabled based on what?

Wouldn't a workable and simple solution be to just include the default emoticon definition in the table and have an additional table that defines "short emoticons" (whatever that may be) and overrides the standard ones? Then, when invoking lou_translate, you can just pass the normal table or, for short emoticons, combine the short table with the normal table.

bertfrees commented 5 years ago

Wouldn't a workable and simple solution be to just include the default emoticon definition in the table and have an additional table that defines "short emoticons"

This is exactly what I am thinking. The minimum width of the braille display should be a metadata field, and NVDA should use this to select the right table. It's a bit like how CSS media queries work: https://www.w3.org/TR/css3-mediaqueries/#device-width.

school510587 commented 5 years ago

Hi @egli and @bertfrees,

Thanks for the reply. First of all, to be clear, I'm not asking for a new typeform. Are you suggesting a universal (language-independent) table for short braille representations of emoticons? I don't know whether there is a simpler form than the default pattern like '\y12345'. Assuming I understand you correctly, though, I think it would be a good approach to pass "zh-tw.ctb,short-emoticons.ctb" to override the long definitions (written in bopomofo braille) of emoticons defined in zh-tw.ctb. For wide enough braille displays, lou_translate then receives only "zh-tw.ctb" and applies the normal definitions.
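The selection described in this comment could be sketched on the application side roughly as follows. This is only an illustration: `short-emoticons.ctb` is the hypothetical override table named above, the 40-cell threshold is an assumption, and the order of the list simply follows the comment.

```python
# Hypothetical application-side helper; not part of liblouis or NVDA.
BASE_TABLE = "zh-tw.ctb"
SHORT_EMOTICONS = "short-emoticons.ctb"  # hypothetical override table


def table_list_for(display_cells, wide_threshold=40):
    """Return the comma-separated table list to hand to the translator."""
    if display_cells >= wide_threshold:
        # Wide display: the long definitions in the base table are acceptable.
        return BASE_TABLE
    # Narrow display: add the table carrying the short definitions.
    return ",".join([BASE_TABLE, SHORT_EMOTICONS])


print(table_list_for(40))
print(table_list_for(20))
```

The resulting string would then be passed as the table list argument of the translation call.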

bertfrees commented 5 years ago

I was not thinking about a universal table. It could indeed be implemented as an "include" table with common emoticon definitions, but every individual table would decide whether it provides a variant that includes the emoticons table or not.

school510587 commented 5 years ago

Hi @bertfrees,

Please let me guess again. Assume the long definitions of emoticons are written in another table named zh-tw-emoji.ctb, whose locale is set to cmn-tw. In addition, zh-tw-emoji.ctb has a metadata field that specifies the minimum number of cells of the braille display. The application, say NVDA, then performs a search with locale=cmn-tw and ncells=20 (or 40 if a 40-cell braille display is used). liblouis will definitely return zh-tw.ctb, but the number of braille display cells will determine whether zh-tw-emoji.ctb is included in the result. Does this match your previous statements? Thanks!

bertfrees commented 5 years ago

Well kind of. The table discovery mechanism selects only a single, top-level table, and the top-level table decides which sub-tables to include.

So there would be a "zh-tw" table with locale:cmn-TW, and a "zh-tw-incl-emoji" table which would be identical but would include the emoji definitions and would have an additional metadata field min-ncells:40.
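To make the two variants concrete, here is a minimal sketch that reads metadata out of two table headers. It assumes the `#+key:value` header-comment form used for Liblouis table metadata; the header contents are illustrative, and `min-ncells` is the proposed field from this comment, not an existing one.

```python
def parse_metadata(table_text):
    """Collect `#+key:value` header lines from a table into a dict."""
    fields = {}
    for line in table_text.splitlines():
        if line.startswith("#+"):
            key, _, value = line[2:].partition(":")
            fields[key.strip()] = value.strip()
    return fields


# Illustrative headers only; the real tables contain many more lines.
ZH_TW = """#+locale:cmn-TW
# (rules omitted)
"""

ZH_TW_INCL_EMOJI = """#+locale:cmn-TW
#+min-ncells:40
# (same rules plus emoji definitions, omitted)
"""

print(parse_metadata(ZH_TW))
print(parse_metadata(ZH_TW_INCL_EMOJI))
```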

This table discovery mechanism is of course quite rudimentary at the moment. There needs to be one top-level table for each variant of the braille code. The number of tables grows exponentially with the number of optional features. There are ways to improve it, but it is important that the table itself is in control of which sub-tables are included and in which order.
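The growth mentioned here is easy to see by enumerating the combinations. The sketch below assumes each optional feature is an independent on/off choice and that every combination needs its own top-level table; the feature names are made up.

```python
from itertools import product


def variants(optional_features):
    """Enumerate every on/off combination of the optional features."""
    combos = []
    for flags in product([False, True], repeat=len(optional_features)):
        enabled = [name for name, on in zip(optional_features, flags) if on]
        combos.append(enabled)
    return combos


# Feature names are placeholders for illustration.
for n in range(1, 5):
    features = [f"feature-{i}" for i in range(n)]
    print(n, "optional features ->", len(variants(features)), "top-level tables")
```

With n independent features this yields 2**n top-level tables, which is why a single extra option such as emoji support already doubles the number of files.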

school510587 commented 5 years ago

Hi @bertfrees,

OK, I think I've got it, thanks. I can write an additional table with min-ncells:40 that includes zh-tw.ctb first, followed by the emoticon definitions. Is min-ncells:40 a valid field now? I didn't find it in liblouis.texi.

bertfrees commented 5 years ago

No, it's the first time we're doing something like this. But let's call it min-device-width. That is less cryptic.

school510587 commented 5 years ago

Hi @bertfrees,

I think min-device-width is fine. Will it be a milestone for the next release? Thanks.

bertfrees commented 5 years ago

There is nothing that needs to be done in the core of Liblouis. You just need to create the tables, and NVDA will need to select the right table based on the properties of the braille display used.

bertfrees commented 5 years ago

There is nothing that needs to be done in the core of Liblouis.

Actually, something does need to be done in Liblouis to make a query such as device-width:50 match a table with a field min-device-width: 40. This will not magically work at the moment. @school510587 If you are planning to actually create and commit a table with this kind of metadata, e.g. the emoticons table, then I'm willing to implement it.
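The matching rule described here, where a query value is compared against a minimum rather than tested for equality, might look like the sketch below. It is not the Liblouis implementation: the function name is hypothetical, and all other fields are compared by plain equality as a simplification.

```python
def matches(query, table_fields):
    """Return True if a table's metadata satisfies the query (simplified)."""
    for key, wanted in query.items():
        if key == "device-width":
            # Compare against the table's minimum instead of testing equality.
            minimum = table_fields.get("min-device-width")
            if minimum is not None and int(wanted) < int(minimum):
                return False
        elif table_fields.get(key) != wanted:
            return False
    return True


emoji_table = {"locale": "cmn-TW", "min-device-width": "40"}
plain_table = {"locale": "cmn-TW"}

print(matches({"locale": "cmn-TW", "device-width": "50"}, emoji_table))
print(matches({"locale": "cmn-TW", "device-width": "20"}, emoji_table))
print(matches({"locale": "cmn-TW", "device-width": "20"}, plain_table))
```

A table without the field places no constraint on the display, so it matches any width.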

school510587 commented 5 years ago

Hi @bertfrees,

Sorry for the long silence. In fact, NVDA hasn't updated its liblouis to a UCS4-compatible version yet, so I can't decide when to implement the emoticon table as described above. Do you think the following is good practice? always \xd83d\xdffb dots... This works with the UCS2 version of liblouis, but liblouis requires \xd83d and \xdffb to be defined beforehand, which seems odd because the two code points are not valid characters.

bertfrees commented 5 years ago

I would like to change Liblouis so that it does not require these components to be previously defined.

However it still seems wrong to decompose something that is a single character into two "\x..." characters.

school510587 commented 5 years ago

Hi @bertfrees,

There is another problem. According to the previous discussion, I must create a new file, say zh-tw-emoticon.ctb, containing emoticon patterns and "include zh-tw.ctb". But, NVDA does not know zh-tw-emoticon.ctb, so we cannot test this new table during usual computer usage with a screen reader and a browser. Any comment? Thanks! Actually, I think the idea of conditional "include" does not have this problem.

bertfrees commented 5 years ago

I don't know how to best test Liblouis tables in NVDA. Is there no way to add a new table? Maybe you can just replace an existing table?

school510587 commented 5 years ago

Hi @bertfrees,

Maybe you can just replace an existing table?

Thanks for the reply. That works when testing ordinary modifications: I ask every reviewer to replace zh-tw.ctb with the latest version, which will soon be merged into liblouis. However, it doesn't work in this case, because zh-tw-emoticon.ctb contains "include zh-tw.ctb". I could put the emoticon definitions into zh-tw.ctb to simplify the test procedure, but that feels logically odd to me.

bertfrees commented 5 years ago

Yeah, sure, if it is not possible to add tables to NVDA, you need to put the definitions in the table itself for now. It would be nice if NVDA could make it easier for users to use custom tables though.

school510587 commented 5 years ago

Hi @bertfrees,

OK, thanks. I will ask the other reviewers for more suggestions soon. If there are no problems, I will add emoticons to zh-tw.ctb (in the "always" form), but they won't be included in the next PR updating zh-tw.ctb.

school510587 commented 5 years ago

Hi @bertfrees,

I would like to change Liblouis so that it does not require these components to be previously defined.

Is it possible to include this change in the 3.10.0 release? If I put emoticons into zh-tw.ctb, I need this feature to avoid a ridiculous number of additional lines. Thanks.

bertfrees commented 5 years ago

I'm working on it but I don't know if it is going to be ready for 3.10. I'll do my best.

bertfrees commented 5 years ago

I'm working on it but I don't know if it is going to be ready for 3.10. I'll do my best.

See https://github.com/liblouis/liblouis/issues/332

bertfrees commented 5 years ago

Is it possible to include this change in the 3.10.0 release? If I put emoticons into zh-tw.ctb, I need this feature to avoid a ridiculous number of additional lines.

@school510587 The change was done in Liblouis 3.10. Let me know if it works out for you.

school510587 commented 5 years ago

Hi @bertfrees,

OK, I've confirmed that the characters in "correct" and "context" rules no longer need to be pre-defined. However, I only tried the noback case, because I am not familiar with back-translation.

But another problem arises. To define 😊 (\y1F60A), I write:

noback context "\xD83D\xDE0A" @234-134-24-123-15

With the UCS-4 version of liblouis this doesn't work; it shows '\y1F60A' to me. Because NVDA still uses the UCS-2 version, \y and \z are rejected, which forces me to use the UCS-2 representation. I don't know how to bridge the gap between the two versions, other than writing two different versions of zh-tw.ctb for UCS-2 and UCS-4 respectively. Is there a better way to deal with the \xD83D\xDE0A/\y1F60A conflict? Thanks!
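For reference, the two spellings in this comment are the same character seen through different encodings. The sketch below uses Python's UTF-16 codec to derive the `\x` pair and formats the code point as a five-digit `\y` escape; the helper names are hypothetical.

```python
def utf16_escapes(ch):
    """Spell a character as \\xHHHH escapes, one per UTF-16 code unit."""
    data = ch.encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"\\x{unit:04X}" for unit in units)


def y_escape(ch):
    """Spell a single character as a five-digit \\yHHHHH escape."""
    return f"\\y{ord(ch):05X}"


smiley = "\U0001F60A"
print(y_escape(smiley))       # \y1F60A
print(utf16_escapes(smiley))  # \xD83D\xDE0A
```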

bertfrees commented 5 years ago

Two different versions is maybe not such a bad option if you can generate one from the other.

The alternative would be that Liblouis, when compiling a table in UCS2 mode, would convert characters in the U+010000 to U+10FFFF range to surrogate pairs. However a limitation is that this will only work for translation rules ("always", "context", etc.), not for character definition rules ("letter", "sign", etc.). And a second issue is that it might be tricky to guarantee that tables behave exactly the same in UCS2 and UCS4 mode. But I think it is doable.

EDIT: It might be possible to support character definition rules after all. But it makes it slightly more complex.
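Whether done inside Liblouis at compile time or by a script that generates one table from the other, the conversion itself is mechanical. The following is a sketch of the script variant only: it rewrites five-digit `\y` escapes in a table line and does not handle `\z` escapes, literal non-BMP characters, or escaped backslashes. The function names are hypothetical.

```python
import re

# Matches a \y escape followed by exactly five hexadecimal digits.
Y_ESCAPE = re.compile(r"\\y([0-9A-Fa-f]{5})")


def to_surrogates(match):
    """Replace one \\yHHHHH escape with its UCS-2 equivalent."""
    code_point = int(match.group(1), 16)
    if code_point < 0x10000:
        # Already fits in one 16-bit unit.
        return f"\\x{code_point:04X}"
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # leading surrogate
    low = 0xDC00 + (offset & 0x3FF)   # trailing surrogate
    return f"\\x{high:04X}\\x{low:04X}"


def ucs4_line_to_ucs2(line):
    """Rewrite every \\y escape in a table line as \\x escapes."""
    return Y_ESCAPE.sub(to_surrogates, line)


rule = 'noback context "\\y1F60A" @234-134-24-123-15'
print(ucs4_line_to_ucs2(rule))
```

Applied to the rule from earlier in the thread, this produces the `\xD83D\xDE0A` form.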

school510587 commented 5 years ago

Hi @bertfrees,

If there are going to be two versions of zh-tw.ctb, the current rules in zh-tw.ctb must be moved into a common file for inclusion, resulting in three files. What would be good names for them, i.e. the common include table, the table containing surrogate pairs, and the table containing \y/\z definitions? There is also zh_TW.tbl; I don't know what changes it will require. However, I feel it would be biased if only one version of zh-tw.ctb were included. Thanks.

bertfrees commented 5 years ago

I don't really care about the file names, as long as the metadata is complete. One table should indicate in its metadata that it is intended to be used with a UCS4 version of Liblouis.

For example, a good name for the metadata field could be "bitness", and the possible values "16" and "32" (the former being the default).
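A sketch of how an application might use such a field, with "16" as the default when the field is absent. The table names are hypothetical, and treating a 16-bit table as also usable by a 32-bit build is an assumption made here, not something decided in this thread.

```python
DEFAULT_BITNESS = "16"  # default proposed above for tables without the field


def usable_tables(tables, library_bits):
    """List tables whose declared bitness does not exceed the library's."""
    result = []
    for name, fields in tables.items():
        table_bits = int(fields.get("bitness", DEFAULT_BITNESS))
        # Assumption: a table is usable when the build is at least as wide.
        if table_bits <= library_bits:
            result.append(name)
    return result


# Hypothetical file names, for illustration only.
tables = {
    "zh-tw-ucs4.ctb": {"locale": "cmn-TW", "bitness": "32"},
    "zh-tw-ucs2.ctb": {"locale": "cmn-TW"},
}

print(usable_tables(tables, 16))
print(usable_tables(tables, 32))
```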

Also related is https://github.com/liblouis/liblouis/issues/734, which proposes to put UCS4 tables in a separate folder.

school510587 commented 5 years ago

Hi @bertfrees,

Here are my plans for this issue. I have just heard that NVDA will update liblouis to the UCS-4 version in the 2019.3 release: https://github.com/nvaccess/nvda/pull/9544

After liblouis is updated to 3.10.0, I'll ask Taiwanese NVDA users/reviewers to try the new version of zh-tw.ctb with emoticons. At that point I'll use either UCS-2 or UCS-4 syntax to implement it, depending on how NVDA compiles liblouis.dll, and make the following changes in a future PR:

  1. Rename the current zh-tw.ctb to, for example, zh-tw-chardefs.cti.
  2. Submit a new zh-tw.ctb with emoticon patterns in UCS-4 syntax that includes zh-tw-chardefs.cti. Additionally, the new zh-tw.ctb will have the metadata "bitness: 32" and min-device-width (the final value depends on the distribution of pattern lengths).

Please note that I won't submit emoticon patterns in surrogate-pair syntax. I think the UCS-2 version of liblouis should be lightweight, with relatively few features. However, I may submit them if other users request it.

In theory, NVDA 2019.3 will be released after my next PR, so these modifications may be submitted in my November PR.

Thank you and @egli for all your efforts to provide a suitable environment so that zh-tw.ctb can support emoticons.

bertfrees commented 5 years ago

Great.

DrSooom commented 5 years ago

Please see: nvaccess/nvda#3304, nvaccess/nvda#8702, nvaccess/nvda#9213, nvaccess/nvda#9973, nvaccess/nvda#9982, #688, #689 and https://danielmayr.at/huc/

As Liblouis is also used by embosser software – and not only by screen readers with a connected braille display – I strongly recommend changing the meta tag "min-ncells" to "signnamegrade", which could have eight fixed values – 0 to 7. The end application should have the option to set the level of detail used when naming a Unicode character. I don't want Unicode characters to be displayed differently automatically because of the length of a braille display. Furthermore, I want to read as few cells as possible to recognize a Unicode character. That is why I invented the HUC Braille Tables at the beginning of 2019.

Here are some examples for the eight signnamegrade values:

Personally, I absolutely dislike the idea of automatically replacing single Unicode characters with a fixed name in braille because of a table definition. Such replacements MUST be done by the end application itself – before the translation into braille begins. It would be extremely horrible if "€" and "Euro" were both displayed as ⡑⠥⠗⠕ on a braille display or on paper. That would discriminate against the haptic reader compared to the visual reader, who would still be able to recognize that "€" and "Euro" are different characters with different meanings. Furthermore, the haptic reader would no longer be able to tell whether "€" or "Euro" is written in the document. So this newly created problem would end in a "€"-"Euro" mishmash within the same document.

TL;DR: The haptic reader MUST be able to read every single Unicode character exactly the same way as a visual reader does – even if this means that they have to memorize a few hexadecimal values.

school510587 commented 5 years ago

Hi @DrSooom,

In principle, I think that min-device-width is only a "suggestion" to applications. That is, an application may query tables without any device-width constraint, and this table will still satisfy the query. The "signnamegrade" from your comment can be implemented either by the liblouis API or by applications. Even if liblouis doesn't provide such an API, one can call forward translation with a table list (tableList) consisting of a main translation table followed by a HUC table. The dot patterns of characters not defined by the main table would then be determined by the HUC table. To sum up, the difference between the two new metadata fields bitness and min-device-width is that the former is mandatory while the latter is optional. Hopefully, applications will give their users the freedom to opt in to optional items during table discovery.

bertfrees commented 5 years ago

The idea of the min-device-width field was introduced as a way for the table itself to control in which situations it is applicable. In addition, we could possibly have some other metadata in the tables to give more control to the application (or user). But I don't want this other metadata to replace the min-device-width field. As Sponge says, applications are free to ignore the min-device-width field.

While in principle your idea of the "signnamegrade" field could work, the 8 values sound like overkill to me. I can't imagine that there will be tables that have 8 different versions.

Maybe we should just have a single field to indicate whether special characters like the euro symbol are expanded (e.g. to the word "Euro") or not. All the rest, i.e. rendering undefined characters in the various ways, could be done with pre- and/or post-processing, possibly in combination with some new modes. Note that tables could also just not do this kind of expansion, but the problem is that some braille codes include such rules, so I think it should be the default behavior of those tables.
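As an illustration of the pre-processing idea, an application could rewrite characters it considers undefined before handing the text to the translator. The bracketed `U+` notation, the function name, and the ASCII-only test for "defined" are all assumptions of this sketch; how a real application decides which characters a table defines is left open.

```python
def spell_out_undefined(text, is_defined):
    """Replace characters the table does not define with a [U+XXXX] token."""
    pieces = []
    for ch in text:
        if is_defined(ch):
            pieces.append(ch)
        else:
            # Hypothetical notation; brackets keep adjacent tokens apart.
            pieces.append(f"[U+{ord(ch):04X}]")
    return "".join(pieces)


def ascii_only(ch):
    """Stand-in for a real 'is this character defined?' check."""
    return ord(ch) < 128


print(spell_out_undefined("ok \U0001F60A", ascii_only))
```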

school510587 commented 4 years ago

Hi @bertfrees and @egli,

I have asked for a review of how emoticons are represented in zh-tw.ctb, but a problem has now arisen:

Because most users are not familiar with English, it's better to translate emoticons based on Chinese. Here, CLDR is a good choice. Basically, CLDR provides a set of official Chinese names for all emoticons. However, some characters are pronounced the same in Chinese, so some emoticons end up with the same braille pattern in the bopomofo-based zh-tw.ctb. Although we could discuss and agree on different Chinese names for these emoticons, some users may dislike the resulting inconsistency between braille and speech.

One reviewer suggests that there should be a switch giving the user complete freedom to choose the braille representation. For example, if this switch is OFF, all characters are displayed using their Unicode value. The HUC tables previously developed by @DrSooom may solve this problem, but I don't know when this feature will be added to NVDA.

Therefore, we have decided to delay submitting a zh-tw.ctb that contains braille patterns for emoticons, because submitting too early would force all users to accept ambiguous braille patterns for emoticons.

However, any opinion is welcome. Thanks!

egli commented 4 years ago

Hi @school510587, I agree that we should not hastily add such a far-reaching new feature to liblouis. I would hesitate to add anything before we have a good understanding of the problem space.

You can always add a feature but it is very hard to remove something once it is established.

bertfrees commented 1 year ago

Closing this issue as I think this can and should be solved with table metadata (as explained above).