Open leandro-lucarella-sociomantic opened 7 years ago
> it would be better to extend the GC to be able to mark some pages as executable and make use of it to allocate the fiber stacks.
Why does it need to be marked as executable, though? No code should be run from the stack.
One more important thing is that one needs to make sure that the allocated stacks are always allocated on a page boundary, otherwise mprotect will fail.
> Why does it need to be marked as executable, though? No code should be run from the stack.
This was moved from an old internal issue, and I have no idea why I wrote that back in the day. :thinking:
> One more important thing is that one needs to make sure that the allocated stacks are always allocated on a page boundary, otherwise mprotect will fail.
Yeah, if we always allocate full pages for the stack (which seems reasonable anyway), we should be fine.
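A minimal sketch of the full-pages idea in C on a POSIX system, for illustration only. The helper names round_up_to_pages and alloc_fiber_stack are hypothetical, and the single guard page at the low end of the stack is an assumption about what mprotect is used for here; this is not the runtime's actual allocation code. The requested size is rounded up to whole pages, the memory comes from mmap (which returns page-aligned addresses), and mprotect is then applied to the first page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round `size` up to a whole number of pages. */
static size_t round_up_to_pages(size_t size, size_t page)
{
    return (size + page - 1) / page * page;
}

/* Hypothetical helper: map a stack of at least `size` bytes plus one
 * guard page at the low end (assumed layout). Returns the base of the
 * mapping and stores its total length in *mapped_len, or returns NULL. */
static void *alloc_fiber_stack(size_t size, size_t *mapped_len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = round_up_to_pages(size, page) + page;

    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* mmap hands back page-aligned memory, which is what mprotect needs. */
    assert((uintptr_t)base % page == 0);

    /* Make the lowest page inaccessible so an overflow faults. */
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, len);
        return NULL;
    }

    *mapped_len = len;
    return base;
}
```

On Linux, calling mprotect with an address that is not a multiple of the page size fails with EINVAL, which is the failure mode mentioned above. Memory obtained from a different allocator would need the same alignment guarantee before mprotect could be used on it.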
When the number of fibers used is too big, the limit on the maximum number of mappings can be hit, making mmap fail. Since the fiber stacks should be scanned by the GC, it would be better to extend the GC to be able to mark some pages as executable and make use of it to allocate the fiber stacks.
I've been looking at this and it seems it will take some time to get familiar with the threads/fibers code. As it turned out, it's not so simple to replace the allocation using mmap() with GC'ed allocations. Not because the call to mmap() is different from what the GC does (as I would have expected), but because of how the threads are scanned by the GC.
There is complex logic for saving thread and fiber contexts that needs to be carefully understood before making changes. A naive and suboptimal solution would be to just replace the allocations. It should work, but the fiber stacks will be scanned twice by the GC. After a stack has been scanned once, the second scan only reads the stack itself and follows no references (because they are already marked), so the performance impact would be only the time needed to read the fibers' stacks.
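To make the double-scan argument concrete, here is a toy mark phase in C. It is only a sketch under simplifying assumptions and not the D GC: every word in the scanned range is either NULL or a valid pointer to an Obj, and all names (Obj, mark, scan_range, traced_on_second_scan) are made up for illustration. It counts how many words are read and how many objects are traced, so the cost of scanning the same "fiber stack" a second time is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap object: a mark bit plus up to two outgoing references. */
typedef struct Obj {
    int marked;
    struct Obj *refs[2];
} Obj;

/* Counters that make the cost of each scan visible. */
static size_t words_read;
static size_t objects_traced;

/* Mark `o` and everything reachable from it.
 * Objects that are already marked are not followed again. */
static void mark(Obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = 1;
    objects_traced++;
    for (size_t i = 0; i < 2; i++)
        mark(o->refs[i]);
}

/* Scan a root range (for example a fiber stack) word by word.
 * A real conservative GC would first check whether each word
 * points into the heap; this toy assumes it always does. */
static void scan_range(Obj **begin, Obj **end)
{
    for (Obj **p = begin; p < end; p++) {
        words_read++;
        mark(*p);
    }
}

/* Scan the same toy stack twice, as the naive approach would, and
 * return how many objects the second pass had to trace. */
static size_t traced_on_second_scan(void)
{
    Obj c = {0, {NULL, NULL}};
    Obj b = {0, {&c, NULL}};
    Obj a = {0, {&b, &c}};
    Obj *stack[4] = {&a, NULL, &b, NULL};

    words_read = objects_traced = 0;

    scan_range(stack, stack + 4);
    size_t traced_first = objects_traced;
    assert(traced_first == 3);   /* a, b and c are traced once */

    scan_range(stack, stack + 4);
    assert(words_read == 8);     /* every word is read again */

    return objects_traced - traced_first;
}
```

In this toy the second pass reads all four words again but traces no objects, which matches the expectation that the extra cost is limited to reading the stack memory itself.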
Back to the contexts logic. I have the feeling that most of the complexity is only there for the GC scanning code. By letting the GC allocate the memory (which will make the fibers' stacks be automatically scanned), maybe the resulting code could end up being much simpler and more efficient (but this is just an early and wild guess).