scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.28k stars 1.54k forks source link

stack guard should always be enabled #718

Open espindola opened 4 years ago

espindola commented 4 years ago

I have recently spent most of a day tracking a bug that did not reproduce in debug build. The issue was just a stack overflow, but since debug is the only mode with stack guards the crashes were super hard to track since they manifested as corrupted data structures. Once I manually enabled the stack guard the problem became obvious.

I have seen enough crashes in scylla that look impossible, so I am a bit suspicious that we might be hitting a stack overflow from time to time in there.

Right now the stack protector actually allocates a page and changes the protection. It should be possible to not back the guard page and not count that memory as used in our own allocator. With that the stack guards would be effectively free and we could always enable them.

avikivity commented 4 years ago

They aren't free since they effectively disable huge pages for that page. They also increase the cost of creating a thread on large machines, since mprotect() has to perform a tlb shootdown of that page. This can be mitigated by keeping a cache of stacks, but that comes with its own cost.

In terms of accounting there's no problem to mprotect() the bottom-most page of a stack.

nyh commented 4 years ago

We have a "Dev" build mode which supposedly developers are using for most of their development work. Dev build-mode is supposed to result in reasonably-fast executable performance - but not necessarily the fastest possible performance. This build mode can enable every debugging feature which has a reasonable - but not necessarily zero - performance overhead, so that bugs are more easily caught during development. Stack guards could be one of those things that we enable in the Dev build mode, and doing that could be a trivial patch to CMakeLists.txt.

espindola commented 4 years ago

Enabling it in Dev mode is a nice improvement, but I think there is still value in enabling it always as the frame sizes can be larger in Release mode because of optimization.

In any case, I will send a patch for Dev first.

espindola commented 4 years ago

In terms of accounting there's no problem to mprotect() the bottom-most page of a stack.

But given that the stack memory is allocated with alloc_aligned and not mmap, that means that the allocator thinks it has one page less to use than it really does, no?