Open dvyukov opened 6 years ago
It's possible to generate unique origin ids for parts of a heap allocation with 4-byte granularity.
You mean generate via stackdepot? I was thinking that maybe we can eat some bits from origin and put offset there, so it's still only 1 stackdepot id. We would need to propagate the offset during chaining and print in report.
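The bit-stealing idea could be sketched like this (the field widths and helper names here are assumptions for illustration, not the actual MSan/KMSAN origin layout): steal a few low bits of the 32-bit origin for a granular offset, leave the rest for the stackdepot id, and when chaining, strip the offset, swap in the new depot id, and reattach the offset so the depot only ever sees offset-free ids.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: low 8 bits hold the offset in 4-byte granules,
 * the upper 24 bits hold the stackdepot id. Not the real MSan layout. */
#define ORIGIN_OFFSET_BITS 8u
#define ORIGIN_OFFSET_MASK ((1u << ORIGIN_OFFSET_BITS) - 1)
#define ORIGIN_GRANULE 4u

static uint32_t origin_pack(uint32_t depot_id, uint32_t byte_offset) {
    uint32_t granules = byte_offset / ORIGIN_GRANULE;
    assert(granules <= ORIGIN_OFFSET_MASK); /* block too large otherwise */
    return (depot_id << ORIGIN_OFFSET_BITS) | granules;
}

static uint32_t origin_depot_id(uint32_t origin) {
    return origin >> ORIGIN_OFFSET_BITS;
}

static uint32_t origin_byte_offset(uint32_t origin) {
    return (origin & ORIGIN_OFFSET_MASK) * ORIGIN_GRANULE;
}

/* Propagation during chaining: the depot ids are chained without the
 * offset; the offset is carried over to the new origin unchanged. */
static uint32_t origin_with_new_depot_id(uint32_t origin,
                                         uint32_t new_depot_id) {
    return (new_depot_id << ORIGIN_OFFSET_BITS)
         | (origin & ORIGIN_OFFSET_MASK);
}
```

With 8 offset bits and a 4-byte granule this addresses the first 1 KiB of a block; the split is tunable against how fast depot ids are consumed.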
There is a trade-off: you either eat through origin id values faster (I think we've seen a case where we ran out of them, but it's probably less of an issue for the kernel), or you consume a bit more memory per allocation.
These "sub-origins" can be represented with a node in ChainedOriginDepot. This way the stack trace would only be stored once.
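A toy model of that representation (the real ChainedOriginDepot in compiler-rt is a hash-based depot; the linear search and names below are purely illustrative): each node is a deduplicated (value, prev) pair, and a sub-origin for offset k is just a node chaining k onto the base allocation origin, so the stack trace behind the base origin is stored once and shared by every offset.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 1024
static struct { uint32_t value, prev; } nodes[MAX_NODES];
static uint32_t node_count = 1; /* id 0 reserved for "no origin" */

/* Deduplicating put: identical (value, prev) pairs get the same id. */
static uint32_t depot_put(uint32_t value, uint32_t prev) {
    for (uint32_t i = 1; i < node_count; i++)
        if (nodes[i].value == value && nodes[i].prev == prev)
            return i; /* already stored once */
    assert(node_count < MAX_NODES);
    nodes[node_count].value = value;
    nodes[node_count].prev = prev;
    return node_count++;
}

/* A sub-origin chains a byte offset onto the base allocation origin. */
static uint32_t sub_origin(uint32_t base_origin, uint32_t byte_offset) {
    return depot_put(byte_offset, base_origin);
}
```

The dedup is what keeps memory bounded: at most one node per (offset, allocation-origin) pair actually observed, on top of the single stored stack trace.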
In any case, this can result in an explosion of chained origins. Imagine any kind of N*M algorithm, like matrix multiplication. Userspace msan has limits to deal with this.
Why will offsets in origin ids lead to explosion of chained origins? We strip offset, chain, attach offset again. Stackdepot always sees ids without offsets.
You are right. There is no explosion if we only track offsets in the original allocation, and not in subsequent stores.
This came up during the analysis of several reports. It would be useful to output which bytes of a heap block are uninitialized, similar to what we do for stack objects (or maybe print a shadow dump, similar to KASAN). It may be useful to see whether just 1 int is uninitialized, or just 1 bit in that int, or the whole block. Is it theoretically possible to fit the offset from the beginning of a heap block into origins too? Even an offset rounded to 8 bits may already be useful, or we could granularize the offset to fit more.
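The "which bytes are uninitialized" part of the report could be derived from the shadow alone, without any origin changes. A sketch, assuming a byte-granular shadow like MSan's (nonzero shadow byte = uninitialized application byte; the helper names are hypothetical, not an actual KMSAN interface):

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

typedef struct { size_t start, end; } byte_range;

/* Collect maximal uninitialized byte ranges [start, end) of a block,
 * given one shadow byte per application byte. Returns range count. */
static size_t uninit_ranges(const unsigned char *shadow, size_t size,
                            byte_range *out, size_t max_ranges) {
    size_t n = 0, i = 0;
    while (i < size && n < max_ranges) {
        if (!shadow[i]) { i++; continue; }
        out[n].start = i;
        while (i < size && shadow[i]) i++;
        out[n].end = i;
        n++;
    }
    return n;
}

static void print_uninit_report(const unsigned char *shadow, size_t size) {
    byte_range r[64];
    size_t n = uninit_ranges(shadow, size, r, 64);
    for (size_t i = 0; i < n; i++)
        printf("  bytes [%zu, %zu) of heap block are uninitialized\n",
               r[i].start, r[i].end);
}
```

The origin offset would complement this by pinpointing which of these bytes the faulting load actually touched; bit-level precision would additionally need the shadow bits of the reported value, which MSan's shadow already carries.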