vnmakarov / yaep

Yet Another Earley Parser
Other
135 stars 13 forks source link

Reentrant allocator #16

Closed TheCount closed 5 years ago

TheCount commented 5 years ago

This is an early preparation for a fix to #12.

It basically does two things:

Note that the new allocate.h internal interface is not compatible to the old one, and the fact that the allocator is now reentrant makes it necessary to carry an alloc handle around, which had ramifications throughout the code.

Bonus feature: yaep now has a convenience function to free a returned parse tree.

sanssecours commented 5 years ago

Thank you for working on this.

The first thing I noticed is that the unit tests still fail after I apply this pull request. I really think the tests should work before and after someone adds code to the main branch of the repository.

I also noticed that I do not know how to use the new memory management code. If I do not change my code, then the compiler reports the following errors:

In file included from src/plugins/yawn/convert.cpp:24:
In file included from src/plugins/yawn/memory.hpp:13:
/usr/local/include/yaep/objstack.h:346:2: error: unknown type name 'YaepAllocator'
        YaepAllocator * os_alloc;
        ^
/usr/local/include/yaep/objstack.h:355:15: error: unknown type name 'YaepAllocator'
        explicit os (YaepAllocator * allocator, size_t initial_segment_length = OS_DEFAULT_SEGMENT_LENGTH);
                     ^
/usr/local/include/yaep/objstack.h:505:17: error: unknown type name 'YaepAllocator'
        friend os::os (YaepAllocator *, size_t);

. I fixed the problem by replacing the memory management function with malloc. Since you are certainly more familiar with the code than me, could you maybe tell me if that is a good approach? What allocation function would you use as argument for the method yaep::parse?

To end this on a positive note, AddressSanitizer does not report any memory leaks if the OS creates a yaep object any more. The method yaep::parse still seems to leak memory though.

TheCount commented 5 years ago

The first thing I noticed is that the unit tests still fail after I apply this pull request. I really think the tests should work before and after someone adds code to the main branch of the repository.

On my machine, all the tests succeed. To debug what's going wrong with the Travis tests, we need more info (the generated yaep_tyaep.c and contents of the two generated outputs). I don't know Travis very well. Is there some option to obtain this info?

Just a thought: which branch are you applying the pull request to? vnmakarov/main or some other branch? Your PR #15 is quite possibly incompatible with my PR #16 (see below).

I also noticed that I do not know how to use the new memory management code. If I do not change my code, then the compiler reports the following errors: […]

In making the new allocator, I changed the API of OS, VLO, and hash_table. I didn't think much of it since these were private interfaces anyway (I actually even removed allocate.h, since yaep doesn't need to make that public). However, almost simultaneously, you created #15, making (at least) OS public. Hence, boom.

I fixed the problem by replacing the memory management function with malloc. Since you are certainly more familiar with the code than me, could you maybe tell me if that is a good approach?

I'm not sure what you mean here. If you replace all calls of yaep_malloc() with malloc(), and so on, you lose the error recovery feature of allocate.h. As long as there is enough memory, you won't notice a difference, though.

What allocation function would you use as argument for the method yaep::parse?

I'd like to say malloc() and free() are certainly good choices, except that the interface is a little botched because parse_alloc() expects an int instead of a size_t. Maybe a good approach would be to extend yaep_parse() to accept null pointers for parse_alloc() and parse_free(), using some sensible internal default in this case.

The method yaep::parse still seems to leak memory though.

If you call yaep_free_grammar() after yaep_parse(), the only memory that should remain is the parse tree (and to free that, I've added a new function yaep_free_tree()). Do you see other leakage?


So, how to go on from here?

sanssecours commented 5 years ago

Thank you for the quick, extensive and helpful reply.

On my machine, all the tests succeed.

Interesting, on my machine the test suite fails, just like it does on Travis.

Is there some option to obtain this info?

Yep, just sign up with your GitHub account at http://travis-ci.com and enable testing for your version of the YAEP repository. For more information, please take a look here. Afterwards just change the file .travis.yml to retrieve the information you need.

Just a thought: which branch are you applying the pull request to? vnmakarov/main or some other branch?

I crated a branch based on this PR, uninstalled YAEP and then installed YAEP via ./configure && make -C src install again.

Your PR #15 is quite possibly incompatible with my PR #16 (see below).

I assumed as much. Anyway, I did not apply your code on my pull request or vice versa. I just used the code of this PR to build and install YAEP. I also removed objstack.h from /usr/local/include.

I'm not sure what you mean here.

The C++ method yaep::parse contains the argument void *(*parse_alloc) (int nmemb). Unfortunately, I need to provide a memory allocation function for this argument or parsing will not work.

If you replace all calls of yaep_malloc() with malloc(), and so on, you lose the error recovery feature of allocate.h. As long as there is enough memory, you won't notice a difference, though.

Interesting, I did not know that such a feature exists.

Maybe a good approach would be to extend yaep_parse() to accept null pointers for parse_alloc() and parse_free(), using some sensible internal default in this case.

That would be really great, especially if the C++ interface would provide the same feature.

If you call yaep_free_grammar() after yaep_parse(), the only memory that should remain is the parse tree (and to free that, I've added a new function yaep_free_tree()). Do you see other leakage?

As far as I can tell, yaep_free_grammar will be called automatically by the destructor of yaep. I guess I still have to free the parse tree then. Unfortunately yaep_free_tree is not accessible from within C++. According to the documentation, the function has also to be called after yaep_free_grammar(). I do not know how I can do that, and still be able to use RAII. I guess I have to live with the memory leaks in my code for now.

Why/how are you using OS?

I use OS because the tests (yaep++.tst) also use this code. Since the documentation of YAEP is not very extensive, I just copied the code from one of the tests, and modified it until I had a minimal working example. Afterwards I used what I had learned to create parsing code for a very small subset of YAML.

Your thoughts?

I would prefer that YAEP does not require that I provide a memory allocation function at all. As a user of YAEP I (currently) do not care what code the library uses to allocate and deallocate memory. In my opinion an interface that makes memory management easier would be really great. For example, I think using unique_ptr to manage the memory for the parse tree would be a nice improvement.

TheCount commented 5 years ago

yaep_free_tree() not having a counterpart in the C++ interface is an oversight on my part, which I will fix by adding a commit on top of this pull request.

Regarding yaep_parse()/yaep::parse(), the specification currently says that parse_alloc is used for allocation, and that parse_free is the counterpart to parse_alloc unless parse_free is a null pointer, in which case the memory is not freed. So it's possible to extend the semantics as follows without breaking backwards compatibility:

I'll also add this on top of this PR.

Thank you for the pointers regarding Travis. I'll look into that and into your OS usage after I have fixed the above things.

TheCount commented 5 years ago

Added two commits, see https://github.com/vnmakarov/yaep/pull/16#issuecomment-427664063

sanssecours commented 5 years ago

Wow, you are really fast 😊. I was able to fix the memory leaks using the latest code from your PR. Thank you very much.

TheCount commented 5 years ago

OK, I've also got travis to run and see this: https://travis-ci.org/TheCount/yaep/jobs/438340845

What happens here is essentially that yaep_read_grammar() frees the grammar on error even though it does not "own" it. Later attempts to use the grammar's allocator lead to a segmentation fault.

The API description does not mention this behaviour, so I'll just remove it.

vnmakarov commented 5 years ago

Thank you for the patches. It is a big work. I'll try to review the patches on the next week and, if they are ok, I'll merge them into the repository.

YAEP was originally implemented on C. For some reasons, I decided to add C++ interface. But basically it was a wrap up around C.

YAEP was a part of COCOM toolset and as many other tools it used a lot of common data-structures (os, vlo, hashtab etc). I don't work on COCOM anymore (some tool designs are too outdated) but on the base of it programming language DINO implemented http://dino-lang.github.io/ (it is my language to do a research on dynamic programming languages implementations). Earley parser is part of dino standard library. So YAEP code is very old and probably need some modernization.

vnmakarov commented 5 years ago

Alexander, thank you again for the patches. I've merged the pull request.

YAEP code uses GNU standard style. So I've changed the format of your changes after the merge.