Closed by ghost 12 years ago
Is there any compile-time performance improvement with this patch? How does memory usage compare?
Significant parse/compile time improvements when compiling individual projects. Test suite time dropped by ~20 seconds (~8%). The number of allocations seems to be at least halved, even for small modules. The default slab size is fairly small (4096 bytes), so memory usage is similar. Larger slabs made a negligible difference, as most allocations are around 100 bytes.
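For readers unfamiliar with the technique, here is a minimal sketch of a slab-based bump allocator. It is not LLVM's BumpPtrAllocator and not the code in this patch; the class name `BumpArena` and its members are hypothetical, and only the 4096-byte default slab size is taken from the comment above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical minimal bump allocator (illustration only).
// Memory is handed out by advancing a pointer inside the current slab and
// is only returned to the system when the whole arena is destroyed.
class BumpArena {
public:
    explicit BumpArena(std::size_t slabSize = 4096) : slabSize_(slabSize) {}
    BumpArena(const BumpArena&) = delete;
    BumpArena& operator=(const BumpArena&) = delete;
    ~BumpArena() {
        for (void* slab : slabs_) std::free(slab);
    }

    // Returns `size` bytes aligned to `align`, which must be a power of two.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t p = alignUp(cur_, align);
        if (slabs_.empty() || p + size > end_) {
            // Start a new slab; an oversized request gets a slab of its own.
            std::size_t bytes = std::max(slabSize_, size + align);
            void* slab = std::malloc(bytes);
            if (slab == nullptr) throw std::bad_alloc();
            slabs_.push_back(slab);
            cur_ = reinterpret_cast<std::uintptr_t>(slab);
            end_ = cur_ + bytes;
            p = alignUp(cur_, align);
        }
        cur_ = p + size;  // the "bump"
        return reinterpret_cast<void*>(p);
    }

    std::size_t slabCount() const { return slabs_.size(); }

private:
    static std::uintptr_t alignUp(std::uintptr_t v, std::size_t align) {
        return (v + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    }

    std::size_t slabSize_;
    std::vector<void*> slabs_;
    std::uintptr_t cur_ = 0;  // next free address in the current slab
    std::uintptr_t end_ = 0;  // one past the end of the current slab
};
```

With requests of about 100 bytes at 8-byte alignment, roughly 39 of them fit in one 4096-byte slab, so a single `malloc` call serves dozens of allocations. That is one plausible reading of why the allocation count drops while the footprint stays similar.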
It would be interesting to see how well this works on other platforms.
Well, BumpPtrAllocator never actually releases memory, so I think compile-time-computation-heavy code that allocates lots of temporaries will use more memory or run out of memory. For instance, one of the cocoa.* tests on OS X tops out at 1.89GB with your patch. While it's sensible to leak AST nodes, invoke tables, overloads, and other objects that are required for the lifetime of a compile job, I think there needs to be a smarter allocation strategy for EValues and CValues used for evaluation and codegen. Maybe a stack of BumpPtrAllocators mirroring the callstack could be made to work, though there will probably be problems with passing objects up the stack.
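One way to picture the "stack of allocators mirroring the call stack" idea is a single scratch arena with RAII scopes that rewind it on exit. This is a sketch of a related mark/rewind technique rather than a literal stack of BumpPtrAllocators, and `ScratchArena`, `Scope`, and `evalSum` are hypothetical names that do not come from the compiler.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical scratch arena for temporaries (illustration only).
class ScratchArena {
public:
    explicit ScratchArena(std::size_t capacity) : buf_(capacity), used_(0) {}

    // Bump-allocates `size` bytes. Offsets are rounded up to max_align_t;
    // this assumes the buffer base itself is suitably aligned.
    void* allocate(std::size_t size) {
        constexpr std::size_t A = alignof(std::max_align_t);
        std::size_t start = (used_ + A - 1) / A * A;
        if (start + size > buf_.size()) throw std::bad_alloc();
        used_ = start + size;
        return buf_.data() + start;
    }

    std::size_t used() const { return used_; }

    // Records the arena position on entry and rewinds to it on exit,
    // releasing everything allocated inside the scope at once.
    class Scope {
    public:
        explicit Scope(ScratchArena& arena)
            : arena_(arena), mark_(arena.used_) {}
        ~Scope() { arena_.used_ = mark_; }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        ScratchArena& arena_;
        std::size_t mark_;
    };

private:
    std::vector<unsigned char> buf_;
    std::size_t used_;
};

// Hypothetical evaluation step that needs temporaries.
int evalSum(ScratchArena& arena, std::size_t n) {
    ScratchArena::Scope scope(arena);  // temporaries die with this frame
    int* tmp = static_cast<int*>(arena.allocate(n * sizeof(int)));
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        tmp[i] = static_cast<int>(i);
        sum += tmp[i];
    }
    // The result is returned by value. Returning `tmp` itself would hand the
    // caller a dangling pointer, which is the "passing objects up the stack"
    // problem mentioned above.
    return sum;
}
```

The sketch sidesteps the hard part: any object that must outlive its frame would have to be copied into the caller's scope or allocated from a longer-lived arena.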
OTOH, the reference counting for BumpPtr allocated objects is unnecessary (since the object memory is never reclaimed anyway), and removing it would probably give even better performance improvements.
Joe, can you try your OS X test with the bump allocator restricted to ANodes? It still gives a significant drop in allocations and increased performance.
I did try this with the ref counting disabled and it didn't have much impact. Using the bump for just ANodes makes this a moot point anyway as the other Objects need it.
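Restricting bump allocation to one class hierarchy can be done in C++ with class-specific `operator new` and `operator delete`. The sketch below assumes a simplified layout: the names `Object`, `ANode`, and `EValue` come from this thread, but their members here, along with `NodePool` and `Identifier`, are hypothetical and not the compiler's actual definitions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical pool that only grows; slabs are freed at program exit.
class NodePool {
public:
    static NodePool& instance() {
        static NodePool pool;
        return pool;
    }

    void* allocate(std::size_t size) {
        constexpr std::size_t A = alignof(std::max_align_t);
        size = (size + A - 1) / A * A;  // keep every object aligned
        if (size > kSlabSize) throw std::bad_alloc();  // sketch: no big objects
        if (slabs_.empty() || used_ + size > kSlabSize) {
            void* slab = std::malloc(kSlabSize);
            if (slab == nullptr) throw std::bad_alloc();
            slabs_.push_back(static_cast<unsigned char*>(slab));
            used_ = 0;
        }
        void* p = slabs_.back() + used_;
        used_ += size;
        ++count_;
        return p;
    }

    std::size_t allocationCount() const { return count_; }

private:
    static constexpr std::size_t kSlabSize = 4096;
    NodePool() = default;
    ~NodePool() {
        for (unsigned char* slab : slabs_) std::free(slab);
    }

    std::vector<unsigned char*> slabs_;
    std::size_t used_ = 0;
    std::size_t count_ = 0;
};

// Simplified stand-in for the reference-counted base class.
struct Object {
    int refCount = 0;
    virtual ~Object() {}
};

// Only ANode and its subclasses are routed to the pool.
struct ANode : Object {
    static void* operator new(std::size_t size) {
        return NodePool::instance().allocate(size);
    }
    // No-op: the memory goes away only when the pool itself is destroyed.
    static void operator delete(void*) noexcept {}
};

struct Identifier : ANode {
    int id;
    explicit Identifier(int i) : id(i) {}
};

// Other Objects keep the regular heap, so ref counting still frees them.
struct EValue : Object {
    int value;
    explicit EValue(int v) : value(v) {}
};
```

Because the operators are inherited, `new Identifier(7)` draws from the pool while `new EValue(3)` still goes through the global heap, so the existing reference counting keeps working for everything that is not an ANode.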
Oh, the joy of learning C++! OK, I'll see if I can make the code look a little uglier and less intuitive, as the designers obviously intended.
Running the tests with only ANodes bump allocated works with about the same memory footprint as before, although it's a bit slower: bump-allocating everything takes 713s, while bump-allocating only ANodes takes 801s, about 12% longer. (That's with LLVM compiled with Debug+Asserts.)
And of course, someone on a Mac with less than 8GB of RAM would see a much slower time in the former case.
Did you see any increase in performance with this? Do the cocoa tests still require huge amounts of memory?
Memory usage is the same as before. There's a small speedup running the tests.
Significantly reduces the number of allocations during parsing and compilation.
This is a pretty simple implementation designed to minimise source changes, but it seems to work OK. Maybe addresses issue #123.