[analyzer] Explore the design space of the EQClass::simplify implementation

steakhal commented 1 year ago

The range-based solver in the Clang Static Analyzer is quite complicated. We also track equality and disequality classes making things more complicated. We have unintuitive invariants, and it's difficult to reason about the possible states of the analyzer's solver.

In the past, we had to remove a seemingly perfectly valid assertion at one place because of this in afcf70aa6de77a0e99f6cb2d3e3824049203d6e2 aka. D138037. Read the details there to learn more about the internals of the range-based solver.

This ticket is about exploring how to get back that assertion and offer more tight invariants. We should be really careful about the runtime and memory footprint implications of the explored solutions.

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-static-analyzer

SavchenkoValeriy commented 1 year ago

Hi there folks. I noticed the conversation going, and decided to pop in. Why do you consider keeping class representatives alive?

balazs-benics-sonarsource commented 1 year ago

@SavchenkoValeriy Hi there folks. I noticed the conversation going, and decided to pop in. Why do you consider keeping class representatives alive?

I no longer have the details in my mind, but I think I did a pretty good job describing the scenario in the summary of D138037.

Anyway, let me try to summarize without investing much time. I think the main point was that the removed assertion seems to make sense, but due to how it works internally it doesn't hold in all cases.

ProgramStateRef OldState = State;
State = merge(F, State, MemberSym, SimplifiedMemberSym);
if (!State)
  return nullptr;
// No state change, no merge happened actually.
if (OldState == State)
  continue;

assert(find(State, MemberSym) == find(State, SimplifiedMemberSym));

Which is basically, if we merge the EQClass of MemberSym and SimplifiedMemberSym which results in a non-null, different state, then the subsequent lookup using either of those symbols should result in the same EQClass.

The thing is that under specific circumstances this doesn't hold, and that case is described in detail in the summary.

Given this assertion, it feels odd that it doesn't hold. It would be hold if we didn't erase the representative symbol, but it doesn't seem to be the right solution to keep those artificially in the maps - basically leaking them.

This ticket is about thinking of different ways to make that assertion hold without keeping the representative symbols alive.

SavchenkoValeriy commented 1 year ago

Thanks for a summary :-) I figured as much. Something fishy is going on there like a misuse of representatives. Because it shouldn't matter if representative is alive or not. ID is just an ID and the fact that it's a pointer to representative is an implementation detail. To keep it alive would be just a bandaid hiding the actual bug, which is still present and can manifest itself in other forms.

llvm / llvm-project

[analyzer] Explore the design space of the EQClass::simplify implementation #60836