jquast / wcwidth

Python library that measures the width of unicode strings rendered to a terminal

wcwidth should have a "C Extension" #103

Open jquast opened 9 months ago

jquast commented 9 months ago

Previously discussed,

I'm open to any specific solution. Plenty were discussed in the past.

Why compile?

My only suggested requirement is that wcwidth install without error on minority operating systems and environments that can't build the extension or fetch a matching pre-built package: installation should still succeed there, and those systems should continue to use the pure-python implementation.

I think plain C is a fine choice. Our use of the language and build tooling would be the most basic and most widely supportable across all kinds of systems, and I know C well enough that I don't mind it at all.

Python-like languages such as Cython are also very "inclusive" for outside developers to dissect and contribute to, since those developers are very likely to be Python developers, whereas using Rust or something similar to create a foreign function interface might be very alienating.

jquast commented 9 months ago

Related: I created issue #104 because the code for parsing UNICODE_VERSION to a matching table is a bit complicated; it aims to be very lenient but best-matching. I am confident in writing a "safe" C _bisearch, but it is difficult to write wcwidth() and wcswidth() in C because they call out to _wcmatch_version(), which does a lot of work (and caches its results). That would have to be re-implemented in C, and it's really just a big chore for very little gain!
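For readers unfamiliar with it, the kind of lookup a `_bisearch` performs can be sketched as a binary search over a sorted table of inclusive (start, end) code point ranges. The `bisearch` function and the three-range `WIDE_SAMPLE` table below are illustrative assumptions (a small excerpt of wide ranges, not the library's generated tables), and the early bounds check is the sort of guard a "safe" C port would also want.

```python
# Illustrative excerpt only: a few East Asian Wide ranges, sorted by start.
WIDE_SAMPLE = (
    (0x1100, 0x115F),  # Hangul Jamo initial consonants
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
)


def bisearch(ucs, table):
    """Return 1 if code point ``ucs`` falls in a range of ``table``, else 0."""
    if not table:
        return 0
    lbound, ubound = 0, len(table) - 1
    # Reject values outside the table's overall span before searching.
    if ucs < table[0][0] or ucs > table[ubound][1]:
        return 0
    while lbound <= ubound:
        mid = (lbound + ubound) // 2
        if ucs > table[mid][1]:
            lbound = mid + 1
        elif ucs < table[mid][0]:
            ubound = mid - 1
        else:
            return 1
    return 0


print(bisearch(ord("中"), WIDE_SAMPLE))
print(bisearch(ord("A"), WIDE_SAMPLE))
```

The search itself is stateless and takes the table as an argument, which is why it ports to C easily; the version-matching and caching done before a table is chosen is the part that does not.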

I think if we make a C extension, we should also drop UNICODE_VERSION support.

SlySven commented 8 months ago

I was recently made aware of this wcwidth project and wanted to draw your attention to another related one: widecharwidth, which provides a widechar_width.h header file that has been included in the Mudlet MUD Client that I contribute code to.

jquast commented 8 months ago

Hello @SlySven that's really great work!! It looks like we have very similar interests :)

I will take more time to look at widecharwidth soon, and file issues or conversations there, my brief findings just by browsing code,

Also, it's great that you contribute to Mudlet! I have been meaning to work on extending https://github.com/jquast/telnetlib3 to support the MUD protocol extensions, and I have been using Mudlet as a reference client. I've had a hard time keeping up with maintenance of telnetlib3, but I hope to resume soon.

SlySven commented 8 months ago

Ah, I should point out that widecharwidth is not my project - I am one of the main coders of Mudlet though, and what we do is use Qt's QTextBoundaryFinder in QTextBoundaryFinder::Grapheme mode to identify the first code-point in each grapheme and then feed just that to the (int) widechar_wcwidth(uint32_t) function defined in the widechar_width.h file that we import from @ridiculousfish's project.

jquast commented 8 months ago

widecharwidth provides only "wcwidth" and does not provide a "wcswidth" function, so it cannot support VS-16 or ZWJ. Probably not much of a problem for MUDs. Anyway, I hope to continue my work on telnetlib3 and use Mudlet again soon. Thanks again for your work there.
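To illustrate why a per-code-point function alone cannot handle these cases, here is a rough sketch that compares summing per-code-point widths with a string-level pass that looks at neighbouring code points. Everything in it is a simplifying assumption: `char_width` is a stand-in built on the standard library's `unicodedata`, not either project's real tables, and the VS-16 rule widens any narrow predecessor, whereas a real implementation would consult a specific table of eligible code points.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (requests emoji presentation)


def char_width(ch):
    """Simplified per-code-point width; a stand-in for a real wcwidth()."""
    if ch in (ZWJ, VS16) or unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def naive_width(text):
    """Sum per-code-point widths, ignoring any context between code points."""
    return sum(char_width(ch) for ch in text)


def string_width(text):
    """Context-aware width: handles ZWJ sequences and VS-16 widening."""
    total = 0
    prev_width = 0  # width of the last measured, non-zero-width code point
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ZWJ:
            # The joiner and the code point it joins add no extra cells.
            i += 2
            continue
        if ch == VS16:
            if prev_width == 1:
                total += 1      # narrow symbol is displayed as wide emoji
            prev_width = 0      # a selector applies at most once
            i += 1
            continue
        width = char_width(ch)
        total += width
        if width:
            prev_width = width
        i += 1
    return total


heart = "\u2764\ufe0f"                  # heavy black heart + VS-16
family = "\U0001F469\u200d\U0001F467"   # woman + ZWJ + girl
print(naive_width(heart), string_width(heart))
print(naive_width(family), string_width(family))
```

Both adjustments depend on what came before or after a code point, which is information a function taking a single code point never sees.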