Closed (mulle-nat closed this issue 1 year ago).
Thanks for reporting the issue. Could you please provide more details about your environment (locale, operating system, libc)? Please also make sure that your locale's charmap is actually UTF-8 (on Linux, you can usually check that with the locale charmap command). Kefir uses the system locale when preprocessing and lexing input files. For instance:
$ cat 1.c
//
// mulle_c11.h
//
// Copyright © 2016 Mulle kybernetiK. All rights reserved.
// Copyright © 2016 Codeon GmbH. All rights reserved.
//
int a;
$ LC_ALL=C locale charmap
ANSI_X3.4-1968
$ LC_ALL=C kefir --target x86_64-host-none -E 1.c
$ LC_ALL=C.UTF-8 locale charmap
UTF-8
$ LC_ALL=C.UTF-8 kefir --target x86_64-host-none -E 1.c
int a;
Sure. Here's my OS environment:
$ locale charmap
UTF-8
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy
I set Kefir's environment variables as they are defined in ubuntu-misc.yml, and everything worked fine until I hit this header. I don't think the libc comes into play, since I am only preprocessing and not including anything.
It seems I was able to reproduce the issue; it might be caused by missing locale definitions for the user-preferred locale. The underlying problem can be reproduced with this code snippet:
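#include <uchar.h>
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
/* UTF-8 encoding of U+00A9 COPYRIGHT SIGN */
const char string[] = "\xc2\xa9";
int main(void) {
/* Select the locale named by the environment (LC_ALL, LC_CTYPE, LANG) */
setlocale(LC_ALL, "");
printf("%s\n", string);
mbstate_t mbstate = {0};
char32_t chr = U'\0';
/* Decode one character; returns (size_t) -1 if the current locale cannot decode it */
size_t rc = mbrtoc32(&chr, string, sizeof(string), &mbstate);
printf("%d %u\n", (int) rc, (unsigned) chr);
return EXIT_SUCCESS;
}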
This outputs the following:
root@29eba25fedbc:/# locale -a
C
C.utf8
POSIX
root@29eba25fedbc:/# gcc -o test test.c && LC_ALL=en_US.UTF-8 ./test
©
-1 0
root@29eba25fedbc:/# locale-gen en_US.UTF-8
Generating locales (this might take a while)...
en_US.UTF-8... done
Generation complete.
root@29eba25fedbc:/# locale -a
C
C.utf8
POSIX
en_US.utf8
root@29eba25fedbc:/# gcc -o test test.c && LC_ALL=en_US.UTF-8 ./test
©
2 169
Kefir relies on the mbrtoc32 function for decoding, and glibc seems to use the system locale definitions to implement that function. Can you check your current locale (with the locale command) and make sure that it's actually available on the system (with locale -a)?
Interesting. So today after a fresh reboot I did this:
$ cat > x.c
#include <uchar.h>
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
const char string[] = "\xc2\xa9";
int main(int argc, const char **argv) {
setlocale(LC_ALL, "");
printf("%s\n", string);
mbstate_t mbstate = {0};
char32_t chr = U'\0';
size_t rc = mbrtoc32(&chr, string, sizeof(string), &mbstate);
printf("%d %u\n", (int) rc, chr);
return EXIT_SUCCESS;
}
$ cc -o x x.c
$ ./x
©
2 169
$ locale -a
C
C.utf8
de_DE.utf8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
POSIX
$ cat /home/src/srcO/mulle-cc/kefir/env
export KEFIR_RTLIB="/home/src/srcO/mulle-cc/kefir/bin/libs/libkefirrt.a"
export KEFIR_RTINC="/home/src/srcO/mulle-cc/kefir/headers/kefir/runtime"
export KEFIR_GNU_INCLUDE="/usr/lib/gcc/x86_64-linux-gnu/11/include;/usr/include/x86_64-linux-gnu;/usr/include;/usr/local/include"
export KEFIR_GNU_LIB="/usr/lib/x86_64-linux-gnu;/usr/lib/gcc/x86_64-linux-gnu/11/;/usr/lib;/usr/local/lib"
export KEFIR_GNU_DYNAMIC_LINKER="/lib64/ld-linux-x86-64.so.2"
export KEFIRCC=/home/src/srcO/mulle-cc/kefir/bin/kefir
export CC="${KEFIRCC}"
$ . /home/src/srcO/mulle-cc/kefir/env
$ cat > y.c
//
// mulle_c11.h
//
// Copyright © 2016 Mulle kybernetiK. All rights reserved.
// Copyright © 2016 Codeon GmbH. All rights reserved.
//
int a;
$ ${CC} -E y.c
int a;
This indicates to me that, for some reason, the locale of my last session must have been corrupted (even across multiple terminals), though I didn't do that intentionally and don't know what might have caused it. In other words, I can't reproduce the issue anymore.
Nevertheless, the compiler's silent failure was unfortunate, and the problematic part was really hard to track down.
Agreed, silent failure is unhelpful. I've pushed some fixes to produce an error when input decoding fails.
I wanted to try Kefir on my projects, but I hit a really strange error early on. It turns out that UTF-8 in comments silently breaks things:
kefir vs gcc: