phorward / unicc

LALR parser generator targetting C, C++, Python, JavaScript, JSON and XML
MIT License
59 stars 9 forks source link

Memory Management Issues and Halting a Parser #7

Closed AuzFox closed 6 years ago

AuzFox commented 6 years ago

(Sorry for the long message! I wanted to explain everything thoroughly, so the text just kept growing!)

Hello! I've been using UniCC to write a compiler for a programming language after getting fed up with flex/bison's clunky interface. I must say, UniCC is much easier to work with! However, I believe there is a problem with the way UniCC manages its memory, specifically when memory allocation fails.

The default definition of UNICC_OUTOFMEM simply prints an error message and calls exit(), if I am correct, this leaks all memory allocated for the pcb struct. (potentially including the struct itself if @@prefix_parse() is passed NULL) Additionally, calling exit() immediately prevents me from cleaning up any memory allocated or files opened outside of the @@prefix_parse() call.

Since UNICC_OUTOFMEM can be redefined, I tried putting pointers to all the data I needed to free in the pcb struct and redefined UNICC_OUTOFMEM to pass pcb to a function to that frees everything before calling exit(). This caused problems because:

  1. UNICC_OUTOFMEM is used in lots of different places, and pcb usually expands correctly. But in a few places (like the ast-related functions) there is no pcb reference, and the macro expansion causes compile errors.
  2. typically, the #prologue code is inserted before thepcb struct definition. In order to try and set up everything "properly" I wrote something like this in my .par file:

    
    #prologue
    [*
        ...
    
        #define UNICC_OUTOFMEM \
        fputs("ERROR: out of memory\n", stderr); \
        cleanup(pcb); \
        exit(1)
    
    #include "compiler_name.h" // include UniCC generated .h file
    
    void cleanup(@@prefix_pcb* pcb);
    *]

epilogue

[* ...

    void cleanup(@@prefix_pcb* pcb)
{
    // free everything here
}

    ...

*]


which is awkward and doesn't even work because of the previous reason.
3. I have no way of knowing if the `pcb` passed to `cleanup()` was allocated using `malloc()` (`@@prefix_parse()` was passed `NULL`)

Is there something I'm missing? I currently can't figure out how to not leak memory without using global variables, ugly hacks or manually altering the files UniCC generates. (which is a bad idea for multiple reasons)

Also, as a sort of side-note, it would be nice to have the ability to manually stop the parser inside an action (besides just calling `exit()`), like the flex/bison `YYERROR` macro (I've read the manual, and I don't believe one exists). I don't mean to be rude asking so many things, but this feature would be nice. This would allow the user to stop the parser if an error occurred, like if `strdup()` failed or a duplicate symbol was defined or something. Of course, this feature would also have to avoid leaking memory.

Again, I hope I don't come off as rude or demanding! If there is a solution to these problems already, I apologize!
phorward commented 6 years ago

Hi, and thank you for submitting your issue.

I've tried to manage your problems with commit 3a338e5, can you please check it our for your use-case? UNICC_OUTOFMEM is now declared as a macro that gets pcb as a parameter in all cases. I corrected all occurences of UNICC_OUTOFMEM where pcb was not present. Furthermore, you can stop the parser by setting pcb->act = UNICC_ERROR; in case you want to stop parser, likewise YYERROR should do it. Please check out if this is working for you.

For your third point, would it be helpful to hold a variable in the pcb that gets TRUE in case the pcb was allocated by @@prefix_parse()?

Again, I hope I don't come off as rude or demanding! If there is a solution to these problems already, I apologize!

Its never rude to help an open source project become better, so thanks again for your request! Any help is always welcome.

AuzFox commented 6 years ago

Its never rude to help an open source project become better, so thanks again for your request! Any help is always welcome.

That's good! I guess I just get nervous talking to people, especially online, so I tend to apologize a lot, haha.

At any rate, I've downloaded the updated version and everything compiles and runs successfully. A simple test conditionally setting pcb -> act = UNICC_ERROR; correctly stopped the parser, so that seems to work great! One question I have is: will this work in lexer terminal actions as well? I don't need this in my project, I'm just wondering. I also did a simple test on UNICC_OUTOFMEM by manually changing the UniCC-generated .c file to always fail a memory allocation; the call to my cleanup() function worked as intended. This is what my cleanup() function looks like (I changed the name too):

void @@prefix_cleanup(@@prefix_pcb* pcb)
{
    // @@prefix_parse() calls UNICC_OUTOFMEM
    // if it fails to malloc() the pcb structure,
    // in this case, pcb will be NULL
    if (!pcb)
    {
        return;
    }

    // clean up pcb internals
    free(pcb -> buf);
    free(pcb -> stack);

    #if UNICC_UTF8
    free(pcb -> lexem);
    #endif

    #if UNICC_SYNTAXTREE
    @@prefix_syntree_free(pcb -> syntax_tree);
    #endif

    @@prefix_ast_free(pcb -> ast);

    // clean up custom pcb contents
    fclose(pcb -> infile);

    // clean up pcb struct itself
    // (if allocated with malloc(), @@prefix_parse() passed NULL)
    //if (pcb -> allocated) // insert whatever actual name is given to this variable
    //{
    //  free(pcb);
    //}

        printf("called cleanup()!\n");
}

The final if statement is obviously commented out because it won't work yet. Speaking of which, I think your idea of giving pcb a variable to check if allocated by @@prefix_parse() would be a good idea. It would stop pcb from being leaked if allocated by @@prefix_parse(). My project never calls @@prefix_parse(NULL);, but it makes sense to provide this for anyone who might need it.

Thank you for adding this and helping me so quickly!

In my project, I plan on adding an import statement to my grammar that would call @@prefix_parse() recursively on other files. I want to check for errors and avoid leaking any memory, so UNICC_OUTOFMEM must not call exit() in this case. I took a look at the template files and I believe I found some problems that might occur, as well as some simple memory management errors:

in @@prefix_syntree_append(): node is leaked if: token != NULL and strdup() fails. token != NULL and wcsdup() fails. if allocation of node fails and UNICC_OUTOFMEM does not call exit(): memset() on node will cause a SEGFAULT and probably crash the parser.

in @@prefix_alloc_stack(): pcb -> stack is leaked if: realloc() on the stack fails. (pcb -> stack is overwritten with NULL and the old pointer is lost without freeing.)

in @@prefix_get_input(): pcb -> buf is leaked if: realloc() on buf fails. (pcb -> buf is overwritten with NULL and the old pointer is lost without freeing.)

The leaks in @@prefix_alloc_stack() and @@prefix_get_input() can be fixed by using a temporary variable. Something like this:

// example variable declarations:
void* main_pointer = &some_memory;
void* temp_pointer;

...

// when re-allocating:
temp_pointer = realloc(main_pointer, new_size);
if (!temp_pointer)
{
    // main_pointer needs to be freed, it is not freed by realloc()
    free(main_pointer);

    // handle rest of error here (return error code, etc)
}

// realloc() was successful:
main_pointer = temp_pointer;

If you don't save the main pointer before the call to realloc(), you won't be able to free it if realloc() fails. You probably already knew this, but if not, it's good to know!

There might be other leaks and errors, but these are the ones I found. Additionally, I don't think all instances of memory allocation are checked for errors and handled appropriately in functions like @@prefix_parse(). This makes sense when UNICC_OUTOFMEM calls exit(), but might cause problems in the parser if exit() isn't called and memory allocation fails. I haven't completely studied the whole parser algorithm so I might be wrong about this though.

Thanks again!

phorward commented 6 years ago

Hey there, and thanks for your further suggestions. I've integrated them all into the latest commit 052df85, you may check it out. All improvements where integrated both into the C and C++ targets.

The UNICC_SYNTAXTREE feature has been completely removed now from the C target (it was already deprecated and can easily be replaced by the AST construction feature).

One question I have is: will this work in lexer terminal actions as well? I don't need this in my project, I'm just wondering.

It should do so, but I haven't tested it yet.

If you don't save the main pointer before the call to realloc(), you won't be able to free it if realloc() fails. You probably already knew this, but if not, it's good to know!

Yep that was known to me, but I developed this parser many years ago and didn't focus at this time on the case what happens when the heap runs out of memory. But its good to integrate your recommendations, better now then never.

So long, Jan

phorward commented 6 years ago

The final if statement is obviously commented out because it won't work yet. Speaking of which, I think your idea of giving pcb a variable to check if allocated by @@prefix_parse() would be a good idea.

You can check via pcb->is_internal now whether the pcb was allocated by @@prefix_parse() or from outside. I've recently fixed this with 67cad67 to always be in a valid state.

AuzFox commented 6 years ago

Yep that was known to me, but I developed this parser many years ago and didn't focus at this time on the case what happens when the heap runs out of memory. But its good to integrate your recommendations, better now then never.

I see. That makes sense. I know I've said it a few times already, but thanks for taking the time to implement these things. \:3

I hope I'm not being annoying with all these fix requests! I know you're probably busy with other things besides working on UniCC. I found 2 more problems worth pointing out. Besides these though, there doesn't seem to be any other memory-related issues besides these, so I'll leave you alone after this. \:P

@@prefix_ast_create() doesn't check if the token passed to it from @@prefix_lexem() is NULL because of an allocation failure.

@@prefix_parse() doesn't check calls to @@prefix_ast_create() for errors.

phorward commented 6 years ago

Hey @AuzFox,

@@prefix_ast_create() doesn't check if the token passed to it from @@prefix_lexem() is NULL because of an allocation failure.

That's not correct: @@prefix_ast_create() can also be called with NULL as token parameter and handles this correctly, so no token will be copied into the AST node.

@@prefix_parse() doesn't check calls to @@prefix_ast_create() for errors.

That's right, and now hopefully fixed with 453a853 :+1:

I hope I'm not being annoying with all these fix requests! I know you're probably busy with other things besides working on UniCC. I found 2 more problems worth pointing out. Besides these though, there doesn't seem to be any other memory-related issues besides these, so I'll leave you alone after this. :P

I'm very thankful for any help making UniCC better and more stable. If you like and use it, its an honor for me. Sure I'm also busy with other things, but UniCC is a project I totally do for fun, and it's a lot of fun when people like you are using and improving it. So please don't leave me alone, and help to make it better :+1:

So long, Jan

phorward commented 6 years ago

Hi @AuzFox, can I close this issue for now?

AuzFox commented 6 years ago

Sorry, I have been busy with things and forgot to check back here. Yes, this issue can be closed.