rochus-keller / C2OBX

C header file to Oberon+ definition module transpiler
GNU General Public License v2.0
6 stars 0 forks source link

What is the purpose of -1 within an static quint32 array context for the function bool is_ident1(quint32 c). #1

Open bigopensky opened 1 year ago

bigopensky commented 1 year ago

Hi, I try to compile your code but get errors for the functionis_ident1(...) and is_ident2(...). I can see that -1 is some kind of end of marker in the range array used by in_rage(...) but -1 is not an quint32. I cannot oversee side effect but is 0x00000 an option?

Tokenizer.cpp: In function ‘bool is_ident1(quint32)’:
Tokenizer.cpp:196:5: error: narrowing conversion of ‘-1’ from ‘int’ to ‘quint32 {aka unsigned int}’ inside { } [-Wnarrowing]
     };

and

Tokenizer.cpp: In function ‘bool is_ident2(quint32)’:
Tokenizer.cpp:207:5: error: narrowing conversion of ‘-1’ from ‘int’ to ‘quint32 {aka unsigned int}’ inside { } [-Wnarrowing]
     };
rochus-keller commented 1 year ago

Looks like an "abbreviation" to say 0xffffffff. The code originally came from C where the compiler doesn't object such conversions. Neither my GCC 4.8 complains. What compiler/version do you use?

bigopensky commented 1 year ago

I use Thread model: posix; gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

bigopensky commented 1 year ago

Thank you for the fast answer. I made some changes (diff below). Now the tool compiles but core dumps when I try to translate the png.h header. Do I miss something?

./C2OBX /usr/include/png.h
C2OBX version: 2022-09-03 author: me@rochus-keller.ch  license: GPL
Segmentation fault (core dumped)

Installation steps are

# ENVIRONMENT
$ uname -a 
Linux zulu 5.4.0-144-generic #161~18.04.1-Ubuntu SMP Fri Feb 10 15:55:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ qmake -query QT_VERSION
5.9.5

$  g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 
# INSTALLATION IS:
$ SPATH=~/Source/IfGDV-Oberon && mkdir $SPATH && cd $SPATH
$ git clone https://github.com/rochus-keller/C2OBX.git
$ cd C2BOX
$ qmake C2OBX2.pro
$ make clean
$ make
g++ -c -pipe -O2 -Wall -W -D_REENTRANT -fPIC -DQT_NO_DEBUG -DQT_CORE_LIB -I. -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o C2OBX.o C2OBX.cpp
# ... getting a lot of warnings 

I replaced -1 by a constant const quint32 RANGE_EOF = 0xffffffff; and added #include <QByteArrayList>

~/Source/IfGDV-Oberon/C2OBX$ git diff | cat 
diff --git a/Tokenizer.cpp b/Tokenizer.cpp
index d28b629..8d731ac 100644
--- a/Tokenizer.cpp
+++ b/Tokenizer.cpp
@@ -161,7 +161,8 @@ static bool startswith(const char *p, const char *q) {
 }

 static bool in_range(quint32 *range, quint32 c) {
-    for (int i = 0; range[i] != -1; i += 2)
+
+   for (int i = 0; range[i] != RANGE_EOF; i += 2)
         if (range[i] <= c && c <= range[i + 1])
             return true;
     return false;
@@ -192,7 +193,7 @@ bool is_ident1(quint32 c) {
         0x10000, 0x1FFFD, 0x20000, 0x2FFFD, 0x30000, 0x3FFFD, 0x40000, 0x4FFFD,
         0x50000, 0x5FFFD, 0x60000, 0x6FFFD, 0x70000, 0x7FFFD, 0x80000, 0x8FFFD,
         0x90000, 0x9FFFD, 0xA0000, 0xAFFFD, 0xB0000, 0xBFFFD, 0xC0000, 0xCFFFD,
-        0xD0000, 0xDFFFD, 0xE0000, 0xEFFFD, -1,
+        0xD0000, 0xDFFFD, 0xE0000, 0xEFFFD, RANGE_EOF
     };

     return in_range(range, c);
@@ -203,7 +204,7 @@ bool is_ident1(quint32 c) {
 bool is_ident2(quint32 c) {
     static quint32 range[] = {
         '0', '9', '$', '$', 0x0300, 0x036F, 0x1DC0, 0x1DFF, 0x20D0, 0x20FF,
-        0xFE20, 0xFE2F, -1,
+        0xFE20, 0xFE2F, RANGE_EOF
     };

     return is_ident1(c) || in_range(range, c);
diff --git a/Tokenizer.h b/Tokenizer.h
index 87c00df..3119c46 100644
--- a/Tokenizer.h
+++ b/Tokenizer.h
@@ -21,6 +21,7 @@
 * http://www.gnu.org/copyleft/gpl.html.
 */

+#include <QByteArrayList>
 #include <QString>
 #include <QList>
 #include <QHash>
@@ -32,6 +33,8 @@ struct Token;
 struct Hideset;
 struct Type;

+const quint32 RANGE_EOF = 0xffffffff;
+  
 struct File
 {
     QByteArray name;
rochus-keller commented 1 year ago

Does it stil seg fault with your modifications?

rochus-keller commented 1 year ago

I just run it in the debugger with /usr/include/png.h but didn't manage to crash it yet, and still continue adding include paths and defines because the parser reports errors.

I guess it's much easier to set up a libpng build with a small compiler like TCC and then run C2OBX with the TCC includes and defines.

Here is e.g. the batch file I used to translate Nappgui (-f option of C2OBX):

/home/me/nappgui_src-main/nappgui/src/gui/guiall.h
-m 
nappgui_src-main/nappgui 
-DCMAKE_RELEASE 
-Dnappgui_src-main/nappgui_LIBRARY 
-Dnappgui_src-main/nappgui_BUILD_DIR 
-Dnappgui_src-main/nappgui_SOURCE_DIR 
-Dnappgui_src-main/nappgui_BUILD
-D__linux__
-D__i386__
-D__GNUC__=4
-D__GNUC_MINOR__=2
-I/home/me/Entwicklung/Modules/C2OBX/chibicc/clib 
-I/home/me/nappgui_src-main/nappgui/src/sewer 
-I/home/me/nappgui_src-main/nappgui/src/osbs
-I/home/me/nappgui_src-main/nappgui/src/draw2d 
-I/home/me/nappgui_src-main/nappgui/src/geom2d 
-I/home/me/nappgui_src-main/nappgui/src/gui 
-I/home/me/nappgui_src-main/nappgui/src/osgui 
-I/home/me/nappgui_src-main/nappgui/src/core 
-I/home/me/nappgui_src-main/nappgui/src/osapp
-I/home/me/nappgui_src-main/nappgui/src
-o 
/home/me/nappgui_src-main/nappgui.obx

As you can see I used the chibicc clib headers which makes things much easier.

EDIT: and here are the options I used to translate SDL2:

/home/me/sdl2/SDL.h -m SDL -p SDL_ -I/home/me/sdl2 -I/home/me/Entwicklung/Modules/C2OBX/chibicc/clib

bigopensky commented 1 year ago

Thank you, yes it crashes also with the modifications. I will investigate this later based on your advises and run the tool in qt-creator (debugger). At the moment I try to run the ObxIde with the current Ubuntu Mono-3 setup.

rochus-keller commented 1 year ago

Ok, now I tried to run it in release mode on a 64 bit linux system (GCC 10.2) and can confirm that there is a segfault; this doesn't happen in debug mode on this system (even if I run the release mode executable manually in GDB), nor in release mode on my 32 bit linux (GCC 4.8). I have no idea so far what is causing this; no warnings are issued; I will change some options and try to instrument and trace the code.

bigopensky commented 1 year ago

During my compile process (compile.log attached) I see the following warnings:

// Example
// Parser.cpp:1492:9: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
static bool is_typename(Token *tok) {
    static QSet<QByteArray> map;

    if (map.isEmpty()) {
        static char *kw[] = {
            "void", "_Bool", "char", "short", "int", "long", "struct", "union",
            "typedef", "enum", "static", "extern", "_Alignas", "signed", "unsigned",
            "const", "volatile", "auto", "register", "restrict", "__restrict",
            "__restrict__", "_Noreturn", "float", "double", "typeof", "inline",
            "_Thread_local", "__thread", "_Atomic",
        };
...

It could cause problems because it is a "implicit wrong expressed" memory operation. I guess the parser code is generated the code by Coco/R via a ATG production rule.

-  and different unreachable function ends (-Wreturn)

Parser.cpp: In function ‘C::Member* struct_designator(C::Token*, C::Token, C::Type*)’: Parser.cpp:1006:1: warning: control reaches end of non-void function [-Wreturn-type]

Parser.cpp: In function ‘quint64 read_buf(char*, int)’: Parser.cpp:1385:1: warning: control reaches end of non-void function [-Wreturn-type]

rochus-keller commented 1 year ago

Thanks for the data. I made a lot of fixes based on your log file to get rid of the (from my point of view) potentially relevant warnings. But I still had the exact same behaviour on x64.

Then I incrementally instrumented the code with qDebug() and finally was able to narrow down the crash to the line if (tok->hideset->hideset_contains(tok->txt)) in the expand_macro function in Preprocessor.cpp. Apparently tok->hideset can be null and for some reason I don't understand yet this only seems to happen on x64.

Anyway I now added a check to the line and it doesn't crash anymore. But this is likely not the end, since tok->hideset is again used downwards. So I have to find out now why it is null and fix this.

Maybe you want to try this version anyway and tell me whether you still see the segfault.

Update:

Meanwhile I think I found the reason and fixed it; the problem was that I didn't fully understand the code when I migrated it to C++ and interpreted functions of two hidesets prematurely as member functions without considering the case that the left argument can be null. It's rather surprising why it didn't crash on x86. I will check the code for other potential misinterpretations of this kind.

I didn't fully regression test the updated version yet, but maybe you can try it anyways.

Update 2:

Just did a regression test on x86 with SDL2 headers which worked.

rochus-keller commented 1 year ago

Can we close this?

bigopensky commented 1 year ago

Sorry for the late answer, I'm not sure because the segfault remains

mv CBOX CBOX.V1
git clone https://github.com/rochus-keller/C2OBX.git
cd C2OBX
qmake C2OBX2.pro 
make clean
make 
mkdir Test-PNG
cd Test-PNG/
../C2OBX -p TEST_PNG -I /usr/include/png.h
C2OBX version: 2023-06-27 author: me@rochus-keller.ch  license: GPL
Segmentation fault (core dumped)

cd ..
mkdir TEST_ERROR
cd TEST_ERROR
../C2OBX -p TEST_PNG -I /usr/include/error.h 
C2OBX version: 2023-06-27 author: me@rochus-keller.ch  license: GPL
Segmentation fault (core dumped)
bigopensky commented 1 year ago

When I run it on a pure Debian system I got proper execution results. With my Ubuntu 18.04 setup not ...

uname -a
Linux cumulus.bigopensky.de 4.19.0-20-amd64 #1 SMP Debian 4.19.235-1 (2022-03-17) x86_64 GNU/Linux

./C2OBX -p TEST_ -I  /usr/include/png.h 
C2OBX version: 2023-06-27 author: me@rochus-keller.ch  license: GPL
cannot open file: bits/libc-header-start.h

I have to investigate this further ..

rochus-keller commented 1 year ago

Ok, thanks for the feedback and the data.

My x64 test system is also Debian, which might be the reason why I don't see the crash; let me do some more experiments.