jart / cosmopolitan

build-once run-anywhere c library
ISC License

The printf family of functions doesn't pad conversion specifications using wide characters properly #776

Closed GabrielRavier closed 4 days ago

GabrielRavier commented 1 year ago

The following program:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main()
{
    setlocale(LC_ALL, "");
    printf("<%3lc>\n", (wint_t)0x3C0);
}

outputs < π> (with a single space) on the 3 libcs I tested it on (when using an appropriate UTF-8 locale to ensure the conversion from wint_t would work properly), whereas cosmopolitan instead outputs <  π> (with two spaces).

As the C standard indicates that field width is counted in bytes and not in multibyte characters, cosmopolitan's behavior is incorrect.

The standard makes this clear by including the following example:

EXAMPLE 2 In this example, multibyte characters do not have a state-dependent encoding, and the members of the extended character set that consist of more than one byte each consist of exactly two bytes, the first of which is denoted here by a □ and the second by an uppercase letter. Given the following wide string with length seven,

static wchar_t wstr[] = L"□X□Yabc□Z□W";

the seven calls

fprintf(stdout, "|1234567890123|\n");
fprintf(stdout, "|%13ls|\n", wstr);
fprintf(stdout, "|%-13.9ls|\n", wstr);
fprintf(stdout, "|%13.10ls|\n", wstr);
fprintf(stdout, "|%13.11ls|\n", wstr);
fprintf(stdout, "|%13.15ls|\n", &wstr[2]);
fprintf(stdout, "|%13lc|\n", (wint_t) wstr[5]);

will print the following seven lines:

|1234567890123|
|  □X□Yabc□Z□W|
|□X□Yabc□Z    |
|    □X□Yabc□Z|
|  □X□Yabc□Z□W|
|      abc□Z□W|
|           □Z|

If the field width were counted in multibyte characters rather than in bytes, the output of the example given in the standard would be quite different.

GabrielRavier commented 1 year ago

PS: Running my example on an ASAN build of cosmopolitan also results in the following being output:

cosmopolitan: WARNING: ASAN bad 4 byte load at 7000007ffd14 bt 473a8b 46a1c7 467fe0

so there's probably also a problem there.

GabrielRavier commented 4 days ago

Looks like this is an intended divergence from the standard (well except for the ASAN error but that's not a thing in cosmopolitan anymore), so I'll close this.