ldthomas / apg-7.0

APG - ABNF Parser Generator. C version with many new features. Partially-Predictive Parsing Tables increases parsing speed. Phrase-Matching Engine with more features and more power than regex. Lots of cool utilities and many examples of use.

Use of uninitialized variable detected by clang, also on ARM64 char is unsigned #1

Closed mingodad closed 3 years ago

mingodad commented 3 years ago

When compiling with clang 11 as shown below:

clang -Wall -g -DAPG_AST -o apg *.c ../api/*.c ../library/*.c ../utilities/*.c
../api/api.c:365:61: warning: variable 'uipBeg' is uninitialized when used here [-Wuninitialized]
                for (uc = 0; uc < spOp->uiChildCount; uc++, uipBeg++) {
                                                            ^~~~~~
../api/api.c:349:17: note: initialize the variable 'uipBeg' to silence this warning
    aint *uipBeg;
                ^
                 = NULL
1 warning generated.

On arm64 with termux (https://termux.com/) there is a problem using a char to detect EOF in apg/config.c:

static void vExtractFileOptions(config_ctx* spCtx){
    void* vpVec = spCtx->vpVecArgs;
    char cZero = 0;
    char c; //!!!! on ARM and even other systems it's better to use an `int` here
    aint uiNewLine = 1;
...
    vpVecPushn(vpVec, (void*)cpFirst, uiSize);
    while((c = fgetc(spCtx->spConfigFile)) != EOF){ //!!!!! on ARM char is unsigned
        if(uiState == uiOption){
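
A minimal sketch of the usual remedy, independent of the rest of vExtractFileOptions: keep the return value of fgetc() in an int and only narrow it to char after the EOF test. The helper name count_bytes is hypothetical and not part of apg.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of apg: counts the bytes in a stream.
 * fgetc() returns an int so that EOF (a negative value) can be told
 * apart from every valid byte value. Storing the result in a plain
 * char loses that distinction: where char is unsigned the comparison
 * with EOF is never true, and where char is signed a 0xFF byte can be
 * mistaken for EOF. */
size_t count_bytes(FILE *fp) {
    assert(fp != NULL);
    size_t n = 0;
    int c; /* int, not char */
    while ((c = fgetc(fp)) != EOF) {
        n++;
    }
    return n;
}
```

With `char c` on a platform where char is unsigned, the same loop would not terminate at end of file, which matches the behaviour described for ARM above.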
mingodad commented 3 years ago

Also, testing with valgrind we get these problems:

valgrind --track-origins=yes ./apg -i abnf-for-sabnf.abnf -o dad 
==2005== Memcheck, a memory error detector
==2005== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2005== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==2005== Command: ./apg -i abnf-for-sabnf.abnf -o dad
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118657: uiLastChar (output.c:786)
==2005==    by 0x11887A: bGetFileName (output.c:850)
==2005==    by 0x114B09: vApiOutput (output.c:185)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118657: uiLastChar (output.c:786)
==2005==    by 0x118897: bGetFileName (output.c:853)
==2005==    by 0x114B09: vApiOutput (output.c:185)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118657: uiLastChar (output.c:786)
==2005==    by 0x1188C3: bGetFileName (output.c:860)
==2005==    by 0x114B09: vApiOutput (output.c:185)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x1188E8: bGetFileName (output.c:862)
==2005==    by 0x114B09: vApiOutput (output.c:185)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118657: uiLastChar (output.c:786)
==2005==    by 0x118707: bSetFileExtension (output.c:808)
==2005==    by 0x114B88: vApiOutput (output.c:191)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x11872C: bSetFileExtension (output.c:810)
==2005==    by 0x114B88: vApiOutput (output.c:191)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118A32: bNameToCaps (output.c:909)
==2005==    by 0x1151A9: vOutputHeader (output.c:270)
==2005==    by 0x114C85: vApiOutput (output.c:200)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118A22: bNameToCaps (output.c:908)
==2005==    by 0x1153A4: vOutputHeader (output.c:290)
==2005==    by 0x114C85: vApiOutput (output.c:200)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118A32: bNameToCaps (output.c:909)
==2005==    by 0x115433: vOutputHeader (output.c:293)
==2005==    by 0x114C85: vApiOutput (output.c:200)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118BBC: bNameToCamelCase (output.c:941)
==2005==    by 0x11568D: vOutputHeader (output.c:315)
==2005==    by 0x114C85: vApiOutput (output.c:200)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118BBC: bNameToCamelCase (output.c:941)
==2005==    by 0x115834: vOutputHeader (output.c:330)
==2005==    by 0x114C85: vApiOutput (output.c:200)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118A22: bNameToCaps (output.c:908)
==2005==    by 0x11592D: vOutputHeader (output.c:336)
==2005==    by 0x114C85: vApiOutput (output.c:200)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118657: uiLastChar (output.c:786)
==2005==    by 0x118707: bSetFileExtension (output.c:808)
==2005==    by 0x114CB6: vApiOutput (output.c:203)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x11872C: bSetFileExtension (output.c:810)
==2005==    by 0x114CB6: vApiOutput (output.c:203)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== Conditional jump or move depends on uninitialised value(s)
==2005==    at 0x4C35108: strlen (vg_replace_strmem.c:459)
==2005==    by 0x118BBC: bNameToCamelCase (output.c:941)
==2005==    by 0x1164AD: vOutputSource (output.c:458)
==2005==    by 0x114DB3: vApiOutput (output.c:212)
==2005==    by 0x10D6D6: main (main.c:195)
==2005==  Uninitialised value was created by a heap allocation
==2005==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==2005==    by 0x1242BC: vpMemAlloc (memory.c:203)
==2005==    by 0x129055: vpVecCtor (vector.c:137)
==2005==    by 0x109643: vpConfigCtor (config.c:125)
==2005==    by 0x10D187: main (main.c:95)
==2005== 
==2005== 
==2005== HEAP SUMMARY:
==2005==     in use at exit: 0 bytes in 0 blocks
==2005==   total heap usage: 1,636 allocs, 1,636 frees, 6,046,153 bytes allocated
==2005== 
==2005== All heap blocks were freed -- no leaks are possible
==2005== 
==2005== For lists of detected and suppressed errors, rerun with: -s
mingodad commented 3 years ago

Looking at api/output.c, shown below, is there any reason to call strlen(cpString) on every iteration of the loop?

static aint uiLastChar(char cCharToFind, const char* cpString) {
    aint uiLast = APG_UNDEFINED;
    aint ui = 0;
    for (; ui < (aint) strlen(cpString); ui++) {
        if (cpString[ui] == cCharToFind) {
            uiLast = ui;
        }
    }
    return uiLast;
}
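
One possible rewrite, as a sketch only: measure the string once before the loop. The typedef for aint and the UNDEFINED_INDEX macro are stand-ins assumed for illustration (apg has its own aint and APG_UNDEFINED definitions), and the name uiLastCharOnce is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Stand-ins for apg's own definitions, assumed here for illustration only. */
typedef unsigned int aint;
#define UNDEFINED_INDEX UINT_MAX

/* Returns the index of the last occurrence of cCharToFind in cpString,
 * or UNDEFINED_INDEX if it does not occur. strlen() is evaluated once,
 * before the loop, instead of in the loop condition. */
aint uiLastCharOnce(char cCharToFind, const char *cpString) {
    assert(cpString != NULL);
    aint uiLast = UNDEFINED_INDEX;
    const aint uiLen = (aint) strlen(cpString);
    for (aint ui = 0; ui < uiLen; ui++) {
        if (cpString[ui] == cCharToFind) {
            uiLast = ui;
        }
    }
    return uiLast;
}
```

The standard library function strrchr() performs a similar search and returns a pointer (or NULL) rather than an index, so it may be an alternative where a pointer result is acceptable.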
ldthomas commented 3 years ago

Thank you. I will be looking into these issues soon.

mingodad commented 3 years ago

Not at all and thank you for your great work !

Also in api/input.c:


void vLineError(api* spCtx, aint uiCharIndex, const char* cpSrc, const char* cpMsg) {
...
        // generate the line text
        int n = 0;
        for(; n < strlen(cpSrc); n++){  ///!!! strlen() on every iteration 
            caBuf[n] = ' ';
        }
        caBuf[n++] = ':';
        caBuf[n++] = ' ';
        caBuf[n] = 0;
mingodad commented 3 years ago

The problems reported by valgrind seem to be due to forgetting to update uiStrLen in apg/config.c; see the patch/diff below:

@@ -613,6 +613,7 @@ static void vExtractArgOptions(config_ctx* spCtx, char* cpParams) {
             if (*cpParams == 0) {
                 XTHROW(spCtx->spException, "options error: -o has no following output file name");
             }
+            uiStrLen = (aint) (strlen(cpParams) + 1);
             vpVecPushn(spCtx->vpVecOutput, cpParams, uiStrLen);
             uiStrLen = (aint) (strlen(cpParams) + 1);
             cpParams += uiStrLen;
mingodad commented 3 years ago

In the previous message I also made a mistake: it seems that you only swapped the order of the uiStrLen update. Here is a better fix:

@@ -613,8 +613,8 @@ static void vExtractArgOptions(config_ctx* spCtx, char* cpParams) {
             if (*cpParams == 0) {
                 XTHROW(spCtx->spException, "options error: -o has no following output file name");
             }
-            vpVecPushn(spCtx->vpVecOutput, cpParams, uiStrLen);
             uiStrLen = (aint) (strlen(cpParams) + 1);
+            vpVecPushn(spCtx->vpVecOutput, cpParams, uiStrLen);
             cpParams += uiStrLen;
         } else if (strncmp(cpParams, "--output=", 9) == 0) {
             uiStrLen = (aint) (strlen(&cpParams[9]) + 1);
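
The ordering in the diff above can be illustrated with a small stand-alone sketch. It assumes, as the surrounding code suggests, that the arguments are packed as consecutive null-terminated strings ending with an empty string. The function copy_output_name and its buffer interface are hypothetical and only stand in for the vpVecPushn call. If the length still held the size of the "-o" entry (3 bytes) when the name was copied, a longer name would be stored without its terminator, which would be consistent with the strlen reports from valgrind.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the string that follows a "-o" entry out of
 * a list packed as "s1\0s2\0...\0\0". Returns the number of bytes copied,
 * including the terminator, or 0 if "-o" is absent, has no value, or the
 * value does not fit in the output buffer. */
size_t copy_output_name(const char *cpParams, char *cpOut, size_t uiOutSize) {
    assert(cpParams != NULL && cpOut != NULL);
    while (*cpParams != 0) {
        size_t uiStrLen = strlen(cpParams) + 1; /* size of the current entry */
        if (strcmp(cpParams, "-o") == 0) {
            cpParams += uiStrLen;               /* step over "-o" */
            if (*cpParams == 0) {
                return 0;                       /* no file name follows */
            }
            uiStrLen = strlen(cpParams) + 1;    /* measure the name first... */
            if (uiStrLen > uiOutSize) {
                return 0;
            }
            memcpy(cpOut, cpParams, uiStrLen);  /* ...then copy that many bytes */
            return uiStrLen;
        }
        cpParams += uiStrLen;                   /* advance to the next entry */
    }
    return 0;
}
```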
mingodad commented 3 years ago

Do you think that apg is a good fit for transpilers? I have a project (https://github.com/mingodad/ljs/tree/master/lua2ljs) where I used lemon (https://en.wikipedia.org/wiki/Lemon_Parser_Generator) to transpile from Lua (http://www.lua.org/manual/5.3/manual.html#8) to LJS (https://github.com/mingodad/ljs), and I'm thinking of trying to remake it with apg to test/learn.

Do you have any programming language grammar in SABNF that you use to test apg?

Cheers !

mingodad commented 3 years ago

I've been evaluating rdp1_6 (http://www.cs.rhul.ac.uk/research/languages/projects/rdp.html here is my repo with some changes https://github.com/mingodad/rdp1_6) and it has nice features, but it does not handle left recursion and has the other limitations of LL(1) parsers (also it hasn't been updated for some years).

mingodad commented 3 years ago

Reading the API documentation again, I can see that apg also doesn't handle left recursion; somehow I got confused at first and thought that it did.

ldthomas commented 3 years ago

I've made and committed the fixes you've suggested. I guess I always assumed that the compiler optimizer would move the expressions out of the loop, but I'm not much of an expert on compiler optimizations so you are probably right. In any case, I changed them to use constant values. I couldn't reproduce your valgrind problem but I can't recall if I did the valgrind test before or after I fixed the problem with the output file name. Here is what valgrind does for me. I notice that you are using a slightly higher version number but I doubt that accounts for the difference.

valgrind -s --track-origins=yes Debug/apg -i abnf-for-sabnf.abnf -o /tmp/valgrind-test
==13513== Memcheck, a memory error detector
==13513== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13513== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==13513== Command: Debug/apg -i abnf-for-sabnf.abnf -o /tmp/valgrind-test
==13513== 
==13513== 
==13513== HEAP SUMMARY:
==13513==     in use at exit: 0 bytes in 0 blocks
==13513==   total heap usage: 1,636 allocs, 1,636 frees, 6,045,903 bytes allocated
==13513== 
==13513== All heap blocks were freed -- no leaks are possible
==13513== 
==13513== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Thanks for your comments and suggestions. Great to have another pair of eyes on it and especially a different development environment. Let me think about your other questions and I'll get back to you. Looks like you are working on some interesting stuff.

mingodad commented 3 years ago

Hello Lowell! Thanks again! I noticed that you used the fixes I suggested here https://github.com/ldthomas/apg-7.0/issues/1#issuecomment-802794596 but I made a mistake there and fixed it here https://github.com/ldthomas/apg-7.0/issues/1#issuecomment-802806661 so I think that you'll need to review this commit https://github.com/ldthomas/apg-7.0/commit/cd00e34f8e5ad08b46140f4d166e275326372d8f because now you are doing this twice:

        } else if (strcmp(cpParams, "-o") == 0) {
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
            if (*cpParams == 0) {
                XTHROW(spCtx->spException, "options error: -o has no following output file name");
            }
            uiStrLen = (aint) (strlen(cpParams) + 1); //// !!!! this one fixes the problem
            vpVecPushn(spCtx->vpVecOutput, cpParams, uiStrLen);
            uiStrLen = (aint) (strlen(cpParams) + 1); //// !!!!! but this second one is probably not wanted
            cpParams += uiStrLen;
ldthomas commented 3 years ago

Actually the second one is needed to correctly update cpParams. uiStrLen then gets reset at the top of the loop.

mingodad commented 3 years ago

OK! Sorry for my misinterpretation!

mingodad commented 3 years ago

When I was going to update my code I realized that I'm probably still right about the proposed fix, and that you are doing extra unneeded work in other places too:

        if (strcmp(cpParams, "-i") == 0) {
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
            if (*cpParams == 0) {
                XTHROW(spCtx->spException, "options error: -i has no following input file name");
            }
            uiStrLen = (aint) (strlen(cpParams) + 1); //// !!! OK, needed to push the correct string below
            vpVecPushn(spCtx->vpVecInput, cpParams, uiStrLen);
            uiStrLen = (aint) (strlen(cpParams) + 1); ///!!!!! nothing has changed since, so no need to do it again
            cpParams += uiStrLen;
...
        } else if (strcmp(cpParams, "-o") == 0) {
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
            if (*cpParams == 0) {
                XTHROW(spCtx->spException, "options error: -o has no following output file name");
            }
            uiStrLen = (aint) (strlen(cpParams) + 1);
            vpVecPushn(spCtx->vpVecOutput, cpParams, uiStrLen);
            uiStrLen = (aint) (strlen(cpParams) + 1); ///!!!!! nothing has changed since, so no need to do it again
            cpParams += uiStrLen;

You use a mix of approaches to update/push parameters, with hardcoded sizes in several places; it's a bit tricky (error-prone).

mingodad commented 3 years ago

I did an experiment, adding some macros to try to simplify the option-checking code:

static void vExtractArgOptions(config_ctx* spCtx, char* cpParams) {
    aint uiStrLen;
    char cZero = 0;
    int iOption = 1;

    // skip over the first argument
    cpParams = (char*)vpVecFirst(spCtx->vpVecArgs);
    uiStrLen = (aint) (strlen(cpParams) + 1);
    cpParams += uiStrLen;
    if (*cpParams == 0) {
        // no parameters, set help flag
        spCtx->bHelp = APG_TRUE;
        return;
    }
#define CHECK_BOOL_OPT(opt, optStr) else if (strncmp(cpParams, optStr, sizeof(optStr)-1) == 0) { \
            spCtx->opt = APG_TRUE; \
            cpParams += sizeof(optStr); \
        }
#define CHECK_STR_OPT(opt, optStr) else if (strncmp(cpParams, optStr, sizeof(optStr)-1) == 0) { \
            cpParams += sizeof(optStr); \
            uiStrLen = (aint) (strlen(cpParams) + 1); \
            vpVecPushn(spCtx->opt, cpParams, uiStrLen); \
            cpParams += uiStrLen; \
        }

    while (*cpParams != 0) {
        uiStrLen = (aint) (strlen(cpParams) + 1);
        if (strcmp(cpParams, "-i") == 0) {
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
            if (*cpParams == 0) {
                XTHROW(spCtx->spException, "options error: -i has no following input file name");
            }
            uiStrLen = (aint) (strlen(cpParams) + 1);
            vpVecPushn(spCtx->vpVecInput, cpParams, uiStrLen);
            cpParams += uiStrLen;
            spCtx->uiInputFiles++;
        }
        else if (strncmp(cpParams, "--input=", 8) == 0) {
            uiStrLen = (aint) (strlen(&cpParams[8]) + 1);
            vpVecPushn(spCtx->vpVecInput, &cpParams[8], uiStrLen);
            spCtx->uiInputFiles++;
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
        } else if (strncmp(cpParams, "--p-rules=", 10) == 0) {
            char cZero = 0;
            char* cpName = cpGetFirstName(&cpParams[10], &uiStrLen);
            vpVecPushn(spCtx->vpVecPRules, cpName, uiStrLen);
            vpVecPush(spCtx->vpVecPRules, &cZero);
            spCtx->uiPRules++;
            while((cpName = cpGetNextName(&uiStrLen))){
                vpVecPushn(spCtx->vpVecPRules, cpName, uiStrLen);
                vpVecPush(spCtx->vpVecPRules, &cZero);
                spCtx->uiPRules++;
            }
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
        } else if (strcmp(cpParams, "-o") == 0) {
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
            if (*cpParams == 0) {
                XTHROW(spCtx->spException, "options error: -o has no following output file name");
            }
            uiStrLen = (aint) (strlen(cpParams) + 1);
            vpVecPushn(spCtx->vpVecOutput, cpParams, uiStrLen);
            cpParams += uiStrLen;
        }
        CHECK_STR_OPT(vpVecOutput, "--output=")
        CHECK_STR_OPT(vpVecHtmlOut, "--grammar-html=")
        CHECK_STR_OPT(vpVecRulesHtmlOut, "--rules-html=")
        CHECK_STR_OPT(vpVecLfOut, "--lf=")
        CHECK_STR_OPT(vpVecCrLfOut, "--crlf=")
        else if (strcmp(cpParams, "-c") == 0) {
            uiStrLen = (aint) (strlen(CONFIG_FILE) + 1);
            vpVecPushn(spCtx->vpVecConfigOut, CONFIG_FILE, uiStrLen);
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
        } else if (strncmp(cpParams, "--config-file=", 13) == 0) {
            if (cpParams[13] == '='){
                uiStrLen = (aint) (strlen(&cpParams[14]) + 1);
                vpVecPushn(spCtx->vpVecConfigOut, &cpParams[14], uiStrLen);
            } else {
                vpVecPushn(spCtx->vpVecConfigOut, CONFIG_FILE, uiStrLen);
            }
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
        } else if (cpParams[0] == '@') {
            if (cpParams[1] == 0) {
                uiStrLen = (aint) (strlen(CONFIG_FILE) + 1);
                vpVecPushn(spCtx->vpVecConfigIn, CONFIG_FILE, uiStrLen);
            } else {
                uiStrLen = (aint) (strlen(&cpParams[1]) + 1);
                vpVecPushn(spCtx->vpVecConfigIn, &cpParams[1], uiStrLen);
            }
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
        }
        CHECK_BOOL_OPT(bVersion, "-v")
        CHECK_BOOL_OPT(bVersion, "--version")
        CHECK_BOOL_OPT(bHelp, "?")
        CHECK_BOOL_OPT(bHelp, "-h")
        CHECK_BOOL_OPT(bHelp, "--help")
        CHECK_BOOL_OPT(bStrict, "-s")
        CHECK_BOOL_OPT(bStrict, "--strict")
        CHECK_BOOL_OPT(bIgnoreAttrs, "--ignore-attributes")
        CHECK_BOOL_OPT(bNoPppt, "--no-pppt")
        CHECK_BOOL_OPT(bDra, "-dra")
        CHECK_BOOL_OPT(bDr, "-dr")
        CHECK_BOOL_OPT(bDg, "-dg")
        CHECK_BOOL_OPT(bDa, "-da")
        CHECK_BOOL_OPT(bDc, "-dc")
        CHECK_BOOL_OPT(bDo, "-do")
        CHECK_BOOL_OPT(bDp, "-dp")
        CHECK_BOOL_OPT(bDv, "-dv")
        else {
            printf("unrecognized option[%d]: %s\n", iOption, cpParams);
            spCtx->bHelp = APG_TRUE;
            uiStrLen = (aint) (strlen(cpParams) + 1);
            cpParams += uiStrLen;
        }
        iOption++;
    }
#undef CHECK_STR_OPT
#undef CHECK_BOOL_OPT
    // push the final null-term on the string of input file names
    vpVecPush(spCtx->vpVecInput, &cZero);
ldthomas commented 3 years ago

Regarding the -i and -o options, the second uiStrLen = (aint) (strlen(cpParams) + 1); is superfluous but does no harm. I've removed them anyway. I should have used just cpParams += (aint)(strlen(cpParams) + 1); for the update at the end of each option, but no harm done. The macros are cool, but I'll just leave it alone for now. But thanks for your time on this. Much appreciated.

mingodad commented 3 years ago

Not at all! By the way, in my macros I replaced strcmp with strncmp, and that was a mistake for the BOOL check; here it is with strcmp:

#define CHECK_BOOL_OPT(opt, optStr) else if (strcmp(cpParams, optStr) == 0) { \
            spCtx->opt = APG_TRUE; \
            cpParams += sizeof(optStr); \
        }
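
To show why the prefix comparison is a problem for flag options, here is a small sketch comparing the two tests. The helper names matches_prefix and matches_exact are hypothetical and exist only for this comparison.

```c
#include <assert.h>
#include <string.h>

/* Prefix test, as in the first version of the macro: true whenever
 * cpArg merely starts with cpOpt. */
int matches_prefix(const char *cpArg, const char *cpOpt) {
    assert(cpArg != NULL && cpOpt != NULL);
    return strncmp(cpArg, cpOpt, strlen(cpOpt)) == 0;
}

/* Exact test, as in the corrected macro: true only when the whole
 * argument equals the option string. */
int matches_exact(const char *cpArg, const char *cpOpt) {
    assert(cpArg != NULL && cpOpt != NULL);
    return strcmp(cpArg, cpOpt) == 0;
}
```

With the prefix test, an argument such as "-help" would be accepted as "-h", and advancing by sizeof("-h") would then leave the pointer in the middle of the argument. The exact test rejects it, so it falls through to the unrecognized-option branch.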
ldthomas commented 3 years ago

The answer to your transpiler question is that if I were to tackle that problem, of course APG would be my parser of choice. I've never written a compiler but I believe that the parser requires a healthy combination of lexical analysis, syntax and semantic translation. APG works well on all of those. One of the benefits of recursive-descent parsing vs. bottom-up is that there is no need for a separate lexical analysis step. And with APG lots of semantic work can be done in the callback functions. Take a look at the XML parser. Lots of the rule callback functions perform semantic translations.
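
As a rough illustration of the point about not needing a separate lexical analysis step, here is a tiny hand-written descent parser that works directly on characters. It is only a sketch in the same spirit and does not use APG or its API. The grammar and the function names parse_number and parse_sum are made up for the example, and overflow is ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Grammar (ABNF):
 *   sum    = number *( "+" number )
 *   number = 1*DIGIT
 * Each rule is one function that reads characters directly; there is no
 * token stream between the input and the parser. */

/* number = 1*DIGIT ; returns 1 on a match and advances *cpp past it */
int parse_number(const char **cpp, long *lpValue) {
    const char *cp = *cpp;
    if (!isdigit((unsigned char) *cp)) {
        return 0;
    }
    long lValue = 0;
    while (isdigit((unsigned char) *cp)) {
        lValue = lValue * 10 + (*cp - '0');
        cp++;
    }
    *cpp = cp;
    *lpValue = lValue;
    return 1;
}

/* sum = number *( "+" number ) ; the semantic action (addition) runs as
 * each phrase is matched, loosely like doing work in a rule callback. */
int parse_sum(const char *cpInput, long *lpResult) {
    assert(cpInput != NULL && lpResult != NULL);
    const char *cp = cpInput;
    long lTotal = 0;
    long lTerm = 0;
    if (!parse_number(&cp, &lTotal)) {
        return 0;
    }
    while (*cp == '+') {
        cp++;
        if (!parse_number(&cp, &lTerm)) {
            return 0;
        }
        lTotal += lTerm;
    }
    if (*cp != '\0') {
        return 0; /* trailing input that the grammar does not allow */
    }
    *lpResult = lTotal;
    return 1;
}
```

A bottom-up parser would typically be fed tokens such as NUMBER and PLUS by a separate scanner; here the digit and operator recognition live inside the rule functions themselves.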

As far as language grammars already written in ABNF go, you are probably out of luck. ABNF is the syntax of Internet standards but, so far as I know, it is not commonly used for computer languages. Usually you have to get a grasp on the production syntax used by the language standard and translate it to ABNF on your own. See xml/xml.abnf. I wrote this as a direct translation from the XML standard (https://www.w3.org/TR/REC-xml/).

I did take a shot at C++ way back in version 4.0. I got as far as a crude preprocessor, but the amount of detail for a full parser was more work than I wanted to get into. I did have to translate the C++ standard into ABNF as I went along. I forget what I was using as the standard, but I did find this BNF version that should lend itself to ABNF (https://www.nongnu.org/hcb/).

Anyway, good luck with your work on that.