Open typesanitizer opened 6 years ago
What is the reasoning behind the current behaviour?
I may be able to answer this for you. From https://github.com/jquast/wcwidth/issues/54#issuecomment-1858569488
I just want to also add that this cannot be fixed in the wcwidth() and wcswidth() functions, as they intend to exactly match function signature and behavior of the POSIX functions.
The reason that C0 and C1 control characters return -1 is that the intended application, a terminal emulator especially, should handle these characters in a stream and remove them from the string before passing it on to wcswidth. This applies especially to items like \n, \b, and \t. They become complicated because their effect depends on the current position of the cursor and on terminal settings: for example, \b can wrap to the previous row if the cursor is at column 0, and the number of spaces produced by '\t' depends on the tab stop setting and the current cursor position. Control characters like '\x1b' (ESC, which is a C0 character) may begin a terminal escape sequence, and that too should be processed before sending the string to wcswidth.
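To make the point about cursor-dependent control characters concrete, here is a minimal sketch in Rust of the kind of pre-processing a caller might do. Everything in it is an assumption for illustration: `char_width` is a toy stand-in (not the API of any real library) that ignores wide and zero-width characters, `advance_column` is a hypothetical helper, and escape sequences and row wrapping are not modelled at all.

```rust
/// Toy stand-in for a per-character width function (hypothetical, not a real
/// library API): `None` for C0/C1 controls and DEL, `Some(1)` otherwise.
/// It deliberately ignores wide and zero-width characters.
fn char_width(c: char) -> Option<usize> {
    let cp = c as u32;
    if cp < 0x20 || (0x7f..0xa0).contains(&cp) {
        None
    } else {
        Some(1)
    }
}

/// Advance a cursor column over `text`, interpreting a few control characters
/// roughly the way a terminal might and measuring everything else with
/// `char_width`. `tab_stop` is the assumed distance between tab stops.
fn advance_column(start_col: usize, tab_stop: usize, text: &str) -> usize {
    assert!(tab_stop > 0, "tab_stop must be positive");
    let mut col = start_col;
    for c in text.chars() {
        match c {
            // Tab: jump to the next multiple of `tab_stop`, so the number of
            // cells consumed depends on where the cursor currently is.
            '\t' => col = (col / tab_stop + 1) * tab_stop,
            // Backspace: move left one cell. Wrapping to the previous row
            // at column 0 is not modelled here.
            '\x08' => col = col.saturating_sub(1),
            // Simplification: CR and LF both reset the column.
            '\r' | '\n' => col = 0,
            // Printable characters advance by their width. Any other control
            // character (including ESC) is skipped; a real terminal would
            // parse the escape sequence that follows ESC instead.
            _ => col += char_width(c).unwrap_or(0),
        }
    }
    col
}
```

With a tab stop of 8, the same `'\t'` moves the cursor 8 cells from column 0 but only 2 cells from column 6, which is why no context-free width can be assigned to it.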
From the docs:
(Ignore '\0' for the points below as it has special treatment.)
This seems inconsistent with the behaviour for individual `char`s, where `None` is returned in case you have a control character. For consistency, I would expect (A) for a string, if any character has a width of `None`, the result should have width `None` XOR (B) control characters always have width `Some(0)`.
IIUC, the second option hasn't been taken for consistency with `wcwidth`, which returns `-1` for control characters. However, not taking the first option can lead to non-intuitive behaviour that can go by unnoticed. E.g. if the code has LF/TAB/DEL in it, then you can get an answer that doesn't make much sense.
Moreover, this violates an embedding law that one might expect to hold: `width(format!("{}", c)) == width(c)` (because it doesn't even type-check).
What is the reasoning behind the current behaviour?
P.S. I'm not asking for the library's behaviour to be changed. I'm writing a Haskell implementation and ran into this while looking at the test cases. My library follows (A) because it seemed like the right choice, so I wanted to know why you didn't pick (A).
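For readers comparing options (A) and (B), here is a small Rust sketch of both, together with the embedding law from the post. All names (`char_width_a`, `str_width_a`, `char_width_b`, `str_width_b`) are hypothetical and the per-character width is a toy that treats every non-control character as one column; this is not the API or behaviour of any existing library.

```rust
/// Toy per-character width under option (A) (hypothetical stand-in):
/// `None` for C0/C1 controls and DEL, `Some(1)` for everything else.
fn char_width_a(c: char) -> Option<usize> {
    let cp = c as u32;
    if cp < 0x20 || (0x7f..0xa0).contains(&cp) {
        None
    } else {
        Some(1)
    }
}

/// Option (A): the string width is `None` as soon as any character has
/// width `None`. Summing `Option`s short-circuits on the first `None`.
fn str_width_a(s: &str) -> Option<usize> {
    s.chars().map(char_width_a).sum()
}

/// Option (B): control characters always have width `Some(0)`.
fn char_width_b(c: char) -> Option<usize> {
    Some(char_width_a(c).unwrap_or(0))
}

/// Option (B) for strings: every character has a width, so the sum is
/// always `Some(_)`.
fn str_width_b(s: &str) -> Option<usize> {
    s.chars().map(char_width_b).sum()
}

/// The embedding law from the post, stated for a given pair of functions:
/// the width of the one-character string equals the width of the character.
fn embedding_law_holds(
    str_width: fn(&str) -> Option<usize>,
    char_width: fn(char) -> Option<usize>,
    c: char,
) -> bool {
    str_width(&format!("{}", c)) == char_width(c)
}
```

Under either option the string and character functions share a return type, so the law type-checks and holds for each character individually. The two options differ on mixed input such as `"a\tb"`, which is `None` under (A) and `Some(2)` under (B).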