Closed AAClause closed 5 years ago
Hi, have you contacted Liblouis folks about this possibility? Thanks.
From: André-Abush Clause [mailto:notifications@github.com] Sent: Wednesday, January 4, 2017 8:29 AM To: nvaccess/nvda nvda@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [nvaccess/nvda] Compiling Liblouis with 32-bit Unicode support (#6695)
In my opinion, the Liblouis DLL included in NVDA should be compiled with 32-bit Unicode. Currently, we improve the French table and we'd like to add any 32-bit Unicode such as \y1D4D0, \y1D49C, \y1D4D2, \y1D49E, etc. What do you think?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/nvaccess/nvda/issues/6695 , or mute the thread https://github.com/notifications/unsubscribe-auth/AHgLkACqe_CV2gtBzv8GH_1rJZgTcW5Wks5rO8jZgaJpZM4Lazbp .
@josephsl yes, it is possible. See https://github.com/liblouis/liblouis/blob/master/README.windows
Edit the file configure.mk If you want 32-bit unicode change the 2 in the line UCS=2 to a 4.
CC @JcSTeh, @DKager, @LeonardDer
From: André-Abush Clause [mailto:notifications@github.com] Sent: Wednesday, January 4, 2017 8:43 AM To: nvaccess/nvda nvda@noreply.github.com Cc: Joseph Lee joseph.lee22590@gmail.com; Mention mention@noreply.github.com Subject: Re: [nvaccess/nvda] Compiling Liblouis with 32-bit Unicode support (#6695)
@josephsl https://github.com/josephsl yes, it is possible. See https://github.com/liblouis/liblouis/blob/master/README.windows
Edit the file configure.mk If you want 32-bit unicode change the 2 in the line UCS=2 to a 4.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nvaccess/nvda/issues/6695#issuecomment-270419286 , or mute the thread https://github.com/notifications/unsubscribe-auth/AHgLkN28Hz7C6gLCwbHTqmhFR70RvS61ks5rO8wQgaJpZM4Lazbp .
AFAIK that needs an update to the Python wrapper similar to what is done in Java. Another use case for this feature is support for emoji, e.g. U+1F170.
@Andre9642, how important is this for French? Does the table fail to work with NVDA if you include these characters or do they just get ignored? Are these characters frequently used in French?
@jcsteh:
Yes, the table fail to work with NVDA if I include these characters. Moreover Lou_checktable indicate that the table is invalid:
\<table>:\<line>: error: liblouis has not been compiled for 32-bit Unicode
No, these characters aren't frequently used in French. They are used in mathematics. In fact, our additions relate in part to Unicode Math Symbols. Extract:
uplow \x0391\x03b1 46-45-1,45-1 Α,α alpha uplow \x0392\x03b2 46-45-12,45-12 Β,β beta uplow \x0393\x03b3 46-45-1245,45-1245 Γ,γ gamma ... #sign \y1D4D2 46-5-14 # MATHEMATICAL BOLD SCRIPT CAPITAL C ... #sign \y1D49F 46-5-145 # MATHEMATICAL SCRIPT CAPITAL D ...
P2 because this prevents tables from even working in NVDA if they contain 32 bit characters.
Just a gentle ping to bring the attention of NVDA code contributors to this P2 issue in case anyone desires to work on this.
As noted in https://github.com/nvaccess/nvda/issues/6695#issuecomment-270511629 this needs some upstream work first. (In general, I'm not too fond of how liblouis deals with Unicode.)
Hi, we need this for Python 3 (#7105(. CC @JCSTeh
From: bhavyashah [mailto:notifications@github.com] Sent: Sunday, August 6, 2017 4:52 AM To: nvaccess/nvda nvda@noreply.github.com Cc: Joseph Lee joseph.lee22590@gmail.com; Mention mention@noreply.github.com Subject: Re: [nvaccess/nvda] Compiling Liblouis with 32-bit Unicode support (#6695)
Just a gentle ping to bring the attention of NVDA code contributors to this P2 issue in case anyone desires to work on this.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nvaccess/nvda/issues/6695#issuecomment-320502162 , or mute the thread https://github.com/notifications/unsubscribe-auth/AHgLkDQJgjP_feXsQSQw-ySnBWT7NP21ks5sVajNgaJpZM4Lazbp .
@josephsl commented on 7 Aug 2017, 03:31 GMT+10:
Hi, we need this for Python 3 (#7105(.
No, we don't. Python 3 Unicode does handle 32 bit characters (actually, internally, i believe it's variable width now). However, Windows is still UTF-16 and thus ctypes still treats Unicode in C functions as UTF-16, since we're running on Windows. So, this is not required for Python 3.
For anyone that does want to try to implement this into the liblouis Python bindings, here are some rough thoughts:
ctypes.sizeof(ctypes.c_wchar)
is used, instead use the number of bits from 1) divided by 8 for bytes.inbuf.encode("UTF-32")
outbuf.value.decode("UTF-32")
Note that this proposed implementation eliminates the need to consider the ctypes unicode width altogether. However, it may be slightly less optimal when the ctypes unicode width does match the liblouis unicode width because it always uses a Python byte string as an intermediary, where perhaps ctypes is more efficient internally; I'm not sure. You could get around this by checking the ctypes unicode width with ctypes.sizeof(ctypes.c_wchar)
and only using the proposed new behaviour if it differs from the liblouis width. However, this will make the code somewhat more complicated (especially with regard to testing on different configurations) and is probably premature optimisation unless a significant speed difference can be proved.
There is also a function to get the widechar size, though that would require a library call from the wrapper:
/**
* Return the size of widechar
*/
LIBLOUIS_API
int EXPORT_CALL lou_charSize(void);
A much more ambitious solution would be to make liblouis work with UTF-16 as opposed to UCS-2, i.e. add support for surrogate characters.
I'm going to give this a try.
@jcsteh commented on 7 aug. 2017 01:04 CEST:
- The Python bindings need to know the number of bits used for widechar (16 or 32) by the compiled liblouis. That'll need to be an autotools macro expansion, just as is done for ###LIBLOUIS_SONAME###. (NVDA's build system will also need to handle this expansion as it does for ###LIBLOUIS_SONAME###.)
Shouldn't we get rid of autotools macro expansions in python bindings if possible?
Ideally, I guess so, but this piece of information is inherently tied to the specific build of liblouis. One way you could work around that is to have some liblouis function which returns this information, but I'm not sure whether that'd be acceptable to the maintainers. Given that we already have LIBLOUIS_SONAME and I really don't see how we could get rid of that, I think it'd be less friction to just go the ugly path and use another macro expansion, but I'm not the one implementing this. :)
@jcsteh commented on 30 aug. 2018 18:58 CEST:
One way you could work around that is to have some liblouis function which returns this information, but I'm not sure whether that'd be acceptable to the maintainers.
There is such a function as @dkager pointed out in https://github.com/nvaccess/nvda/issues/6695#issuecomment-320570322
@jcsteh commented on 7 aug. 2017 01:04 CEST:
- This determines the encoding to use when passing strings to and from liblouis. If it's 16, we must use UTF-16. If it's 32, we must use UTF-32.
For reference, we should explicitly use the little endian versions of utf-16 and utf-32.
- After obtaining unicode strings from liblouis, decode them using the encoding from 2) before returning them; e.g.
outbuf.value.decode("UTF-32")
Now, that fails, since for c_char_p, the value is truncated as soon as a null character is encountered. Even when we would use c_wchar_p on Windows, the value will be truncated for cases like 'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'
Just filed https://github.com/liblouis/liblouis/pull/633. I did not yet bother to create a try build due to several issues, including liblouis 3.7 refusing to build when building with NVDA's build system. Feedback is welcome though.
Does their own NMake-based build still work? That seems to break every other release too.
@jcsteh commented on 7 aug. 2017 00:25 CEST:
@josephsl commented on 7 Aug 2017, 03:31 GMT+10:
Hi, we need this for Python 3 (#7105(.
No, we don't.
Actually, I'm afraid @josephsl has a point.
>>> import louis
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "d:\Development\NVDA\source repo\source\louis\__init__.py", line 358, in <module>
raise Exception("mismatch in Unicode width: liblouis is 2 and python isn't")
Exception: mismatch in Unicode width: liblouis is 2 and python isn't
@leonardder commented on Sep 4, 2018, 6:44 PM GMT+10:
File "d:\Development\NVDA\source repo\source\louis\__init__.py", line 358, in <module> raise Exception("mismatch in Unicode width: liblouis is 2 and python isn't")
Sure, but that's because the bindings don't yet support both bit widths (regardless of Python bit width). Once that's fixed, this check can be removed from the bindings. That should be done as part of liblouis/liblouis#633.
That's already covered in https://github.com/liblouis/liblouis/pull/633 indeed.
Ugh, there's one major thing I didn't think about yet. When compiling liblouis with UCS-4, cursor position breaks in a major way. This is because NVDA provides the cursor position based on the internal python 2 unicode representation, which has two offsets for characters involving surrogates in UTF-16, such as 😊. A switch to Python 3 would automatically fix this I believe, since Python 3 unicode strings behave similar to 4 byte widechar strings (i.e. one 4 byte liblouis widechar corresponds with one offset in a python 3 string). We could work around this by working with an internal UTF-32 representation of the text, but that feels a bit hacky,
Hi,
As i know, this can break braille extender.
mailto:reply@reply.github.com @Andre9642/BrailleExtender
From: Leonard de Ruijter notifications@github.com Sent: Tuesday, December 4, 2018 9:10 AM To: nvaccess/nvda nvda@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [nvaccess/nvda] Compiling Liblouis with 32-bit Unicode support (#6695)
Ugh, there's one major thing I didn't think about yet. When compiling liblouis with UCS-4, cursor position breaks in a major way. This is because NVDA provides the cursor position based on the internal python 2 unicode representation, which has two offsets for characters involving surrogates in UTF-16, such as 😊. A switch to Python 3 would automatically fix this I believe, since Python 3 unicode strings behave similar to 4 byte widechars (i.e. one 4 byte liblouis widechar corresponds with one offset in a python 3 string). We could work around this by calculating the cursor position based on the UTF-32 representation of the text.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/nvaccess/nvda/issues/6695#issuecomment-444008438 , or mute the thread https://github.com/notifications/unsubscribe-auth/AKohkyY67JlukHZAO2lJ6taFiKj04jGCks5u1i2_gaJpZM4Lazbp . https://github.com/notifications/beacon/AKohk0Ib7jj4AInJzLHYlFqrSpdU3TBpks5u1i2_gaJpZM4Lazbp.gif
I've found a way to do this anyway. A pull request will be filed shortly.
@leonardder Should this issue be closed now that #9544 has been merged?
Fixed in #9544
In my opinion, the Liblouis DLL included in NVDA should be compiled with 32-bit Unicode. Currently, we improve the French table and we'd like to add any 32-bit Unicode such as \y1D4D0, \y1D49C, \y1D4D2, \y1D49E, etc. What do you think?