Closed ramosian-glider closed 2 years ago
In user-space we limit length of origin chains to some small number (e.g. 5). Perhaps we need the same in kernel.
Also, if it's updated in a loop, we could ignore addition of the same stack to the chain. I.e. if the old chain is (A+B) and we add B again, then we return the current (A+B) stack id. Does it make any sense? Or we already do this?
In userspace we have two limits - on the length of the chain, and on the number of chains with each given stack trace on top.
On Mon, Nov 12, 2018 at 10:29 AM Dmitry Vyukov notifications@github.com wrote:
In user-space we limit length of origin chains to some small number (e.g. 5). Perhaps we need the same in kernel.
Also, if it's updated in a loop, we could ignore addition of the same stack to the chain. I.e. if the old chain is (A+B) and we add B again, then we return the current (A+B) stack id. Does it make any sense? Or we already do this?
This could be useful, but the way stack depot is implemented peaking at the top stack trace of a chain is expensive - it's a reverse lookup in a hash table.
—
You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/google/kmsan/issues/47#issuecomment-437983319, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZuSsiMQ3wp-VSWUWcwtK6XrLhgU4Xtks5uub4QgaJpZM4YZy10 .
In kernel stackdepot mapping from id to stack is super fast: https://github.com/torvalds/linux/blob/master/lib/stackdepot.c#L197-L207 Should do the same in userspace.
We do have a limit on chaining (7 stacks), and actually in this particular case the problem doesn't lead to stack depot explosion, because the stack traces are always the same (they start from the exit() syscall). There's a warning in KMSAN that's printed on every 10000 dropped chains, but the case in question is on a hot path, and it renders this warning useless (and also leads to tons of unnecessary accesses to the stack depot).
This shouldn't be a problem anymore. When an origin chain hits the maximum length, it stops growing, and __msan_chain_origin() will continue assigning the same origin id to every origin created from that chain. There will be cases when this happens too often - I am planning to introduce a tracepoint to detect such cases, but they won't be hurting us noticeably anyway.
Starting from Linux v.4.20-rc2 there're seven 1-bit fields in
struct mmu_gather
packed into an unsigned int value. One of those fields is updated in a tight loop. On each update, the remaining 25 uninitialized bits receive a new origin generated from the previous one:This generates a lot of unnecessary origin chains, which may quickly fill the stack depot and start reporting warnings.