Open jquast opened 11 months ago
Just out of curiosity, are there any real-world examples where this enhancement would be beneficial?
I have struck through the suggestion about terminal sequences; I will save it for another issue.
As for the need for a "width" function, just about every downstream library has some issue with the POSIX wcwidth and wcswidth functions.
This is mainly because both functions may return -1, and the return value must be checked, but it often is not.
And I think all downstream users wish for us to have a single function that makes a "best effort". If a zero-width-joined emoji sequence also contains a newline or other control character, it is best to return our best estimate of the measurement rather than -1, as wcswidth() does.
Although using wcswidth() on a string is the most popular use case, it can return -1 by POSIX definition, and Markus Kuhn's 2007 implementation returns -1 for control characters, chr(1) through chr(31).
As a workaround, I have suggested using wcwidth() directly on each individual character and clipping the possible -1 return value to 0, for example: https://github.com/jquast/blessed/blob/a34c6b1869b4dd467c6d1ab6895872bb72db7e0f/blessed/sequences.py#L364
This does the same job as wcswidth() but gives a "best guess". However, this method cannot handle the coming changes to wcswidth() for zero width joiner (ZWJ) sequences.
Although I am open to changing wcswidth() to never return -1 and make a "best effort", that would deviate from the original 2007 implementation and the POSIX specification. This is why I suggest an entirely new function name, and that the docstrings of wcswidth and wcwidth strongly recommend it as the best alternative.
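A rough sketch of what such a separately named, best-effort function could look like. Everything here is an assumption for illustration: the name width, the control_codes keyword and its 'ignore'/'strict' values are hypothetical, and cell_width is a simplified stand-in built on the standard unicodedata module, not this library's width tables.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Simplified stand-in for wcwidth(); not the library's real tables."""
    cp = ord(ch)
    if cp == 0:
        return 0
    if cp < 32 or 0x7F <= cp < 0xA0:
        return -1  # C0/C1 control characters
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def width(text: str, control_codes: str = "ignore") -> int:
    """Hypothetical best-effort width that never returns -1."""
    if control_codes not in ("ignore", "strict"):
        raise ValueError(f"unknown control_codes value: {control_codes!r}")
    total = 0
    for ch in text:
        w = cell_width(ch)
        if w < 0:
            if control_codes == "strict":
                # Report the problem loudly instead of returning -1.
                raise ValueError(f"unexpected control character {ch!r}")
            continue  # 'ignore': control characters contribute nothing
        total += w
    return total
```

With the default, width("hello\n") gives 5 where a POSIX-style wcswidth() would give -1; callers that want to catch stray control codes opt in with control_codes="strict" and handle the exception.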
Thank you for the clarification!
I have created it in the development branch, but I will make a bugfix release first and then open a PR for this: https://github.com/jquast/wcwidth/blob/1f1443b7af38b9e1b36a895b5d998f511021d377/wcwidth/wcwidth.py#L262-L277
I have revised this description and the related issue #92.
And I do think they are closely related. Control characters like \b are just as much terminal sequences as \x1b[0m. Ignoring the '\x1b' is not enough; I think we should measure the full sequence \x1b[0m as 0 instead of 3 (character widths 0, 1, 1, 1), and provide a choice for ambiguous characters like '\b' and/or '\x1b[D' as either -1 (moving backwards, 'parse') or 0 (ignored).
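To make the idea concrete, here is a minimal sketch that measures a whole CSI sequence as 0 and offers the two treatments of '\b' and '\x1b[D'. The function name measure and the movement keyword are hypothetical, the regular expression only covers CSI sequences, and every printable character is assumed to be one cell wide to keep the example short.

```python
import re

# CSI sequence: ESC, "[", parameter bytes, intermediate bytes, one final byte.
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def measure(text: str, movement: str = "ignore") -> int:
    """Hypothetical sketch: whole sequences count as 0, never their length."""
    if movement not in ("ignore", "parse"):
        raise ValueError(f"unknown movement value: {movement!r}")
    width = 0
    pos = 0
    while pos < len(text):
        match = CSI.match(text, pos)
        if match:
            # Only the bare cursor-left sequence is treated as movement here.
            if movement == "parse" and match.group() == "\x1b[D":
                width = max(0, width - 1)
            pos = match.end()
            continue
        ch = text[pos]
        if ch == "\b":
            if movement == "parse":
                width = max(0, width - 1)  # backspace moves one cell back
        elif ch.isprintable():
            width += 1  # assumption: narrow characters only
        pos += 1
    return width
```

Under these assumptions measure("\x1b[1mhi\x1b[0m") is 2 rather than 8, and measure("abc\b") is 3 when ignoring movement but 2 with movement="parse".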
Problem
As for the need for a "width" function, just about every downstream library has some issue with the POSIX wcwidth() and wcswidth() functions, either in C or in this Python library.
This is mainly because both functions may return -1, and the return value must be checked, but it often is not.
Although using wcswidth() on a string is the most popular use case, it can return -1 by POSIX definition, and Markus Kuhn's 2007 implementation returns -1 for control characters.
The return value is often unchecked where it is used with sum(), slice() or screen positioning functions, with surprising results.
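The kind of surprise described above can be reproduced with a small stand-in. toy_wcswidth below is an assumption for illustration only: it mimics the -1 behaviour for control characters and otherwise counts every character as one cell, so it is not the library's implementation.

```python
def toy_wcswidth(text: str) -> int:
    """Stand-in for wcswidth() on narrow text: -1 if any control character."""
    for ch in text:
        cp = ord(ch)
        if 0 < cp < 32 or 0x7F <= cp < 0xA0:
            return -1
    return sum(1 for ch in text if ch != "\0")


cells = ["name", "value\n"]

# Unchecked sum(): 4 + (-1) silently yields 3 instead of signalling a problem.
total = sum(toy_wcswidth(cell) for cell in cells)

# Unchecked slice: text[:-1] quietly drops the last character.
text = "val\tue"
clipped = text[:toy_wcswidth(text)]

# Unchecked padding: 10 - (-1) pads with 11 spaces instead of the intended amount.
padding = " " * (10 - toy_wcswidth(text))
```

None of these lines raises an error, which is why the unchecked -1 tends to surface only later as misaligned output.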
Solution
Provide a new function, width, that always returns a "best effort" measurement of distance. It may ignore or measure control codes instead of returning -1. If "catching unexpected control codes" is desired, we can continue to provide it through an optional keyword argument and, rather than return -1, raise an exception.
Maybe a new keyword argument control_codes with default value 'ignore', in a similar spirit to 'errors' for https://docs.python.org/3/library/stdtypes.html#bytes.decode.
Workaround
As a workaround, I have suggested using wcwidth() directly on each individual character and clipping the possible -1 return value to 0, for example: https://github.com/jquast/blessed/blob/a34c6b1869b4dd467c6d1ab6895872bb72db7e0f/blessed/sequences.py#L364
This does the same job as wcswidth() but gives a "best guess". However, this method cannot handle the coming changes to wcswidth() for zero width joiner (ZWJ) sequences.
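The workaround, and its ZWJ limitation, can be sketched as follows. toy_wcwidth is a simplified stand-in built on the standard unicodedata module so the example runs without dependencies; it is an assumption and does not reproduce this library's width tables, and clipped_width is a hypothetical name.

```python
import unicodedata


def toy_wcwidth(ch: str) -> int:
    """Simplified stand-in for wcwidth(); not the library's real tables."""
    cp = ord(ch)
    if cp == 0:
        return 0
    if cp < 32 or 0x7F <= cp < 0xA0:
        return -1  # control characters
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def clipped_width(text: str) -> int:
    """Sum per-character widths, clipping -1 (control characters) to 0."""
    return sum(max(0, toy_wcwidth(ch)) for ch in text)


print(clipped_width("abc\n"))  # 3
# Woman + ZWJ + laptop: each emoji is counted separately, giving 2 + 0 + 2.
print(clipped_width("\U0001F469\u200D\U0001F4BB"))  # 4
```

A terminal that renders the joined sequence as a single emoji would typically use 2 cells, so the per-character sum overestimates it; a function that sees the whole string can account for the join, while a character-by-character loop cannot.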