(follow-up PR 8015): false null dereference because of inaccurate reasoning of perfectly constrained array indices

tkremenek commented 14 years ago


Bugzilla Link	8057
Version	trunk
OS	All
CC	@belkadan,@tkremenek,@xuzhongxing

Extended Description

From PR 8015:

+// FIXME: This is a false positive due to not reasoning about symbolic +// array indices correctly. Discussion in PR 8015. +void pr8015_D_FIXME() {

int number = pr8015_A();
const char *numbers[] = { "zero" };
if (number == 0) {
if (numbers[number] == numbers[0])
return;
int *p = 0;
*p = 0xDEADBEEF; // expected-warning{{Dereference of null pointer}}
} +}

Here we flag a bogus null dereference because 'numbers[number]' resolves to UnknownVal, and not the region for "zero".

d4e15ef9-ffb5-4988-bc6f-d8a3b3e1d54a commented 14 years ago

We can explicitly split up states when loading from an array with a symbolic index. For this example, we can assume 'number' equal to 0, 1, 2, 3, 4 respectively, and load from where it is feasible. Here only case 0 is feasible.

This split up can happen in either GRExprEngine or StoreManager. I'm inclined to let it happen in GRExprEngine, and be controlled by an option, say '-analyzer-split-symbolic-index'.

The reasons are that we can keep store manager simple. It only reasons about concrete index. And this is analysis logic and such 'intelligence' should happen in GRExprEngine.

I completely disagree. We should not be afraid of doing the right thing. The StoreManager has knowledge of what it can and cannot represent as far as region bindings. That knowledge is not privy to GRExprEngine (nor should it be). GRExprEngine should only be concerned with evaluating expressions, not adding hacks to work around deficiencies in the StoreManager's reasoning.

What we are trying to do is inherently not simple. I agree that we should try and keep each of the pieces of the analyzer as simple as we can, but sometimes there is just inherent algorithmic complexity in the quest for a truly scalable, general solution.

For this particular problem, when indices are constrained to concrete values, I think is realistic to have GRExprEngine do the "constant-propagation" when it evaluates the ArraySubscriptExpr. That is, when we see:

numbers[number]

and we see that 'number' evaluates to a symbolic value we can just check if it is perfectly constrained to a constant and then forward propagate it. Then when we do the load it is treated as a concrete index, and StoreManager doesn't get involved.

I think lazily concretizing symbols makes a lot of sense. We just need a way to represent the values related to a region with symbolic index.

And there is no need for GRExprEngine to handle such special cases. Store manager should handle all the reasonings.

My concern is the cases where we don't have concrete indexes, and we are inherently reasoning about a range of possible values. We need to think outside the box here, and not be afraid at doing something that is inherently more complex in StoreManager but ultimately will make the analyzer far more powerful. I'm not arguing that we do this for RegionStoreManager (since it is complicated enough already), but we shouldn't lose sight of where we are going. GRExprEngine cannot possibly handle all these cases because it just doesn't have the knowledge of what ConstraintManager and StoreManager can represent.

Ultimately, my hope is that we only concretize when we have to. For example, consider:

const char *p = numbers[number]

here 'p' is assigned a value, but we don't really need to care about how constrained that value is until we use it. Even if I did:

const char *p = numbers[number] char c = p[0]

we still don't need to concretize; all we care about 'c' is that it is the first character of the array pointed by p. Only when we use the value of 'c' in a condition do we care about its actual value. For example, if I did:

if (c == 'o') { ...

then on the true branch we know that 'p' points to "one", and on the false branch it points to "zero" or "two". That's the kind of reasoning power we should be aiming for.

tkremenek commented 14 years ago

We can explicitly split up states when loading from an array with a symbolic index. For this example, we can assume 'number' equal to 0, 1, 2, 3, 4 respectively, and load from where it is feasible. Here only case 0 is feasible.

This split up can happen in either GRExprEngine or StoreManager. I'm inclined to let it happen in GRExprEngine, and be controlled by an option, say '-analyzer-split-symbolic-index'.

The reasons are that we can keep store manager simple. It only reasons about concrete index. And this is analysis logic and such 'intelligence' should happen in GRExprEngine.

I completely disagree. We should not be afraid of doing the right thing. The StoreManager has knowledge of what it can and cannot represent as far as region bindings. That knowledge is not privy to GRExprEngine (nor should it be). GRExprEngine should only be concerned with evaluating expressions, not adding hacks to work around deficiencies in the StoreManager's reasoning.

What we are trying to do is inherently not simple. I agree that we should try and keep each of the pieces of the analyzer as simple as we can, but sometimes there is just inherent algorithmic complexity in the quest for a truly scalable, general solution.

For this particular problem, when indices are constrained to concrete values, I think is realistic to have GRExprEngine do the "constant-propagation" when it evaluates the ArraySubscriptExpr. That is, when we see:

numbers[number]

and we see that 'number' evaluates to a symbolic value we can just check if it is perfectly constrained to a constant and then forward propagate it. Then when we do the load it is treated as a concrete index, and StoreManager doesn't get involved.

My concern is the cases where we don't have concrete indexes, and we are inherently reasoning about a range of possible values. We need to think outside the box here, and not be afraid at doing something that is inherently more complex in StoreManager but ultimately will make the analyzer far more powerful. I'm not arguing that we do this for RegionStoreManager (since it is complicated enough already), but we shouldn't lose sight of where we are going. GRExprEngine cannot possibly handle all these cases because it just doesn't have the knowledge of what ConstraintManager and StoreManager can represent.

Ultimately, my hope is that we only concretize when we have to. For example, consider:

const char *p = numbers[number]

here 'p' is assigned a value, but we don't really need to care about how constrained that value is until we use it. Even if I did:

const char *p = numbers[number] char c = p[0]

we still don't need to concretize; all we care about 'c' is that it is the first character of the array pointed by p. Only when we use the value of 'c' in a condition do we care about its actual value. For example, if I did:

if (c == 'o') { ...

then on the true branch we know that 'p' points to "one", and on the false branch it points to "zero" or "two". That's the kind of reasoning power we should be aiming for.

tkremenek commented 14 years ago

I agree with Zhongxing; I don't think we want to have DisjunctionVals, constrained symbols, /and/ multiple GRStates. It wouldn't be too hard to split states based on the feasible values of a symbol -- it's basically just a set of Assume()s.

OTOH, that doesn't really solve this problem, which is that even when we do split states like that, we still end up with a symbolic ER index. We probably are going to need to pass the state into the store to get around this.

After assuming the symbol equals to a specific value, we could load from the element region with a concrete index. That is, making up an element region with a concrete index and pass it to the store manager.

Yeah, but the problem is the ER could have been created before the index was constrained. Doing what you said would be a good partial fix, though. (Would SValuator already do this?)

I disagree with both of you. Fundamentally StoreManager needs to do the right thing, and the right thing depends on how powerful the StoreManager is at reasoning about symbolic indices. That logic should not go into GRExprEngine other than as a workaround. StoreManager needs to be able to tell what a load returns, and that could be a set of values. Eagerly concretizing indices or making 'assumptions' does two things:

(1) We eagerly split paths even when it doesn't really matter

(2) As part of (1), we add assumptions (constraints) that don't really matter.

Basically (1) and (2) aren't really lazy, but eager. Lazy reasoning is the only way were going to truly get general handling of symbolic values.

Consider the problem where we have:

const char *p = number < 3 ? numbers[number] : NULL;

What do we want to do here? We have two scenarios:

'number < 3' => p binds to "zero", "one", or "two" 'number >= 3' => p binds to NULL

Ideally we should be able to represent this with one GRState and one path. I'm will to concede two paths here (one where p is explicitly NULL, the other where it is one of the string literals), but there is no reason for us to have 4 paths here until the index is explicitly checked against '0', '1', or '2'.

Now consider the case where 3 is a symbol:

const char *q = number < n ? numbers[number] : NULL;

This has 3 scenarios:

'numbers <= n' => p binds to NULL 'numbers < n' && n < length(numbers) => p binds to one of the values in numbers 'numbers < n' && n > length(numbers) => p binds to undefined

How are we going to represent this? Concretizing state values is not always going to be a valid solution, and it really isn't scalable when we consider symbolic indices.

belkadan commented 14 years ago

I agree with Zhongxing; I don't think we want to have DisjunctionVals, constrained symbols, /and/ multiple GRStates. It wouldn't be too hard to split states based on the feasible values of a symbol -- it's basically just a set of Assume()s.

OTOH, that doesn't really solve this problem, which is that even when we do split states like that, we still end up with a symbolic ER index. We probably are going to need to pass the state into the store to get around this.

After assuming the symbol equals to a specific value, we could load from the element region with a concrete index. That is, making up an element region with a concrete index and pass it to the store manager.

Yeah, but the problem is the ER could have been created before the index was constrained. Doing what you said would be a good partial fix, though. (Would SValuator already do this?)

d4e15ef9-ffb5-4988-bc6f-d8a3b3e1d54a commented 14 years ago

I agree with Zhongxing; I don't think we want to have DisjunctionVals, constrained symbols, /and/ multiple GRStates. It wouldn't be too hard to split states based on the feasible values of a symbol -- it's basically just a set of Assume()s.

OTOH, that doesn't really solve this problem, which is that even when we do split states like that, we still end up with a symbolic ER index. We probably are going to need to pass the state into the store to get around this.

After assuming the symbol equals to a specific value, we could load from the element region with a concrete index. That is, making up an element region with a concrete index and pass it to the store manager.

belkadan commented 14 years ago

I agree with Zhongxing; I don't think we want to have DisjunctionVals, constrained symbols, /and/ multiple GRStates. It wouldn't be too hard to split states based on the feasible values of a symbol -- it's basically just a set of Assume()s.

OTOH, that doesn't really solve this problem, which is that even when we do split states like that, we still end up with a symbolic ER index. We probably are going to need to pass the state into the store to get around this.

d4e15ef9-ffb5-4988-bc6f-d8a3b3e1d54a commented 14 years ago

We can explicitly split up states when loading from an array with a symbolic index. For this example, we can assume 'number' equal to 0, 1, 2, 3, 4 respectively, and load from where it is feasible. Here only case 0 is feasible.

This split up can happen in either GRExprEngine or StoreManager. I'm inclined to let it happen in GRExprEngine, and be controlled by an option, say '-analyzer-split-symbolic-index'.

The reasons are that we can keep store manager simple. It only reasons about concrete index. And this is analysis logic and such 'intelligence' should happen in GRExprEngine.

tkremenek commented 14 years ago

assigned to @tkremenek

llvm / llvm-project

(follow-up PR 8015): false null dereference because of inaccurate reasoning of perfectly constrained array indices #8429

Extended Description