Closed bellini666 closed 1 year ago
Here's a more minimal repro:
def func1(x: str | None):
    assert x is not None
    for i in range(10):
        for j in range(10):
            x = x + ""
This is a regression introduced by the fix for #4023. I walk through the differences in behavior before and after that fix below in gory detail.
TL;DR: I believe the root cause of this issue is that when the `TypeEvaluator` is determining the type of `x + ""` and calls back into the `CodeFlowEngine`, `getTypeFromLoopFlowNode` now returns `undefined` as the type of `x` rather than `str`. This is due to a change in when we evaluate the code flow node for `x = x + ""`, which is called `Assign@8416[6:17]` in the graph below. Returning `undefined` from `getFlowTypeOfReference` causes `getTypeOfName` to fall back to using the annotated type `Union[str, None]`. Everything goes downhill from there.
At the moment I'm uncertain how to fix this in a way that does not break #4023. But I wanted to document what I know so far in case others have suggestions.
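The fallback at the heart of the TL;DR can be sketched in a few lines of Python. This is only a toy model, not pyright's code: `resolve_name_type` is a hypothetical name, types are modelled as frozensets of subtype names, and `None` stands in for the `undefined` flow result.

```python
from typing import FrozenSet, Optional

# Toy representation: a type is a frozenset of subtype names.
# A flow type of None models the `undefined` result from code flow analysis.
DECLARED_X: FrozenSet[str] = frozenset({"str", "None"})  # x: str | None
NARROWED_X: FrozenSet[str] = frozenset({"str"})          # after `assert x is not None`


def resolve_name_type(
    flow_type: Optional[FrozenSet[str]],
    declared_type: FrozenSet[str],
) -> FrozenSet[str]:
    """Hypothetical stand-in for the fallback described above."""
    if flow_type is None:
        # No flow type available: fall back to the annotation,
        # which discards any narrowing done earlier.
        return declared_type
    return flow_type


# Flow analysis answers `str`, so the narrowing survives.
with_flow_type = resolve_name_type(NARROWED_X, DECLARED_X)
# Flow analysis answers `undefined`, so the annotation wins.
without_flow_type = resolve_name_type(None, DECLARED_X)
```

In this model the only difference between the two outcomes is whether the flow result is available at the moment the name is evaluated, which matches the timing change described above.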
getTypeFromCodeFlow@8415: x[6:21]
Assign@8415[5:13] ─ Loop@8412 ┬ Call@8411[5:18] ─── Assign@8410[4:9] ──────────── Loop@8407 ┬ Call@8406[4:14] ───── True@8404[2:12] ─ Assign@8400[1:11] ─ Start@8399
│ ╰ Circular(Loop@8412)
╰ Assign@8416[6:17] ─ Circular(Assign@8415[5:13])
Here's what I'm seeing:
The `CodeFlowEngine` first follows the "top" path in the graph (i.e. the first antecedents of both loops) and determines that the type of `x`, after being narrowed by the `assert`, is `str`. So far so good.

Then we move on to analyzing the second antecedent of `Loop@8407`, which is `Circular(Loop@8412)`. At this point the cache entry for `Loop@8412` looks like the following because we're still in the process of evaluating its first antecedent (`Call@8411[5:18]`) and we haven't looked at its second antecedent (`Assign@8416[6:17]`) at all yet:
type = undefined, isIncomplete = true, generationCount = 2, incompleteSubtypes =
[ type = undefined, isIncomplete = true, isPending = true, evaluationCount = 0 ] # incompleteSubtype for Call@8411[5:18]
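For readers who prefer a data structure to a dump, the entry above could be modelled roughly like this. The field names mirror the ones printed in the dump, but the class names `IncompleteSubtype` and `LoopCacheEntry` are assumptions made for illustration, not pyright's actual types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IncompleteSubtype:
    """One slot per loop antecedent that has been visited so far."""
    type: Optional[str] = None   # None models `undefined`
    is_incomplete: bool = True
    is_pending: bool = False
    evaluation_count: int = 0


@dataclass
class LoopCacheEntry:
    """Hypothetical model of the cache entry kept for a loop node."""
    type: Optional[str] = None   # None models `undefined`
    is_incomplete: bool = True
    generation_count: int = 0
    incomplete_subtypes: List[IncompleteSubtype] = field(default_factory=list)


# The entry for Loop@8412 as printed above: only the first antecedent
# (Call@8411[5:18]) has a slot so far, and it is still pending.
loop_8412 = LoopCacheEntry(
    type=None,
    is_incomplete=True,
    generation_count=2,
    incomplete_subtypes=[IncompleteSubtype(is_pending=True)],
)
```

The detail that matters later is that the list has one slot while the loop has two antecedents.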
What happens after this point differs based on whether you have the fix for #4023 or not.
Before the fix for #4023:

`getTypeFromLoopFlowNode` for `Circular(Loop@8412)` exits early because (as shown in the cache entry above) at least one of the `incompleteSubtypes` has `isPending = true`. Note that after the fix for #4023, we would not exit early here because we don't have `incompleteSubtypes` for all of the antecedents yet. `getTypeFromLoopFlowNode` returns `type = undefined, isIncomplete = true`.

We have now visited both antecedents of `Loop@8407`, and therefore `getTypeFromLoopFlowNode` for `Loop@8407` returns an `effectiveType` of `str`.
type = str, isIncomplete = true, incompleteSubtypes =
[type = str, isIncomplete = false] # incompleteSubtype for Call@8406[4:14]
[type = undefined, isIncomplete = true] # incompleteSubtype for Circular(Loop@8412)
`str` is then passed back down the "top" path of the graph and is therefore also the cached type of the first antecedent of `Loop@8412`, which is `Call@8411[5:18]`.

Next we call `getTypeForFlowNode` on the second antecedent of `Loop@8412`, which is `Assign@8416[6:17]`. Evaluating this assignment's type (via `evaluateAssignmentFlowNode`) causes us to enter the `TypeEvaluator` via `evaluateTypesForStatement` and eventually call back into `getTypeForFlowNode` for `Assign@8415[5:13]`.

When that call reaches `getTypeFromLoopFlowNode` for `Loop@8412`, we use the following cache entry:
type = str, isIncomplete = true, generationCount = 2, incompleteSubtypes =
[type = str, isIncomplete = false, isPending = false, evaluationCount = 1] # incompleteSubtype for Call@8411[5:18]
[type = undefined, isIncomplete = true, isPending = true, evaluationCount = 0] # incompleteSubtype for Assign@8416[6:17]
Because one of the `incompleteSubtypes` has `isPending = true` and `cacheEntry.type` is `str`, `getTypeFromLoopFlowNode` returns `type = str, isIncomplete = true`.

`str` is then passed back through the `TypeEvaluator` to the `FlowFlags.Assignment` logic in `getTypeFromFlowNode`, where it is saved as the cache entry on `Assign@8416[6:17]`.

Back in the `getTypeFromLoopFlowNode` antecedent loop for `Loop@8412`, the cache entry now looks like this:
type = str, isIncomplete = true, incompleteSubtypes =
[type = str, isIncomplete = false] # incompleteSubtype for Call@8411[5:18]
[type = str, isIncomplete = true] # incompleteSubtype for Assign@8416[6:17]
We loop until hitting `maxAttemptCount`, and the final result is the `effectiveType` of `str`.
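The "loop until hitting `maxAttemptCount`" step is a bounded re-evaluation: keep retrying while the result is incomplete, but give up after a fixed number of passes. A minimal sketch of that idea follows. The function name, the callback shape, and the cap values are all hypothetical, since neither the real loop nor the actual value of `maxAttemptCount` is shown in this thread.

```python
from typing import Callable, Tuple


def evaluate_until_settled(
    evaluate_once: Callable[[int], Tuple[str, bool]],
    max_attempt_count: int,
) -> Tuple[str, int]:
    """Re-run `evaluate_once` until it reports a complete type or the cap is hit.

    `evaluate_once(attempt)` returns `(type_name, is_incomplete)`.
    The result is `(type_name, attempts_used)`.
    """
    attempts = 0
    while True:
        result_type, is_incomplete = evaluate_once(attempts)
        attempts += 1
        if not is_incomplete or attempts >= max_attempt_count:
            return result_type, attempts


# An evaluation that never completes still terminates at the cap.
never_complete = evaluate_until_settled(lambda attempt: ("str", True), 3)
# An evaluation that completes on its second pass stops early.
completes_later = evaluate_until_settled(lambda attempt: ("str", attempt < 1), 10)
```

Whatever type is in hand when the cap is reached becomes the answer, which is why an incorrect intermediate type can end up as the final result.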
After the fix for #4023:

When we reach the second antecedent of `Loop@8407` after narrowing `x` via the `assert`, the cache entry for `Loop@8412` looks like the following because we're still in the process of evaluating its first antecedent (`Call@8411[5:18]`) and we haven't looked at its second antecedent (`Assign@8416[6:17]`) at all yet:
type = undefined, isIncomplete = true, generationCount = 2, incompleteSubtypes =
[ type = undefined, isIncomplete = true, isPending = true, evaluationCount = 0 ] # incompleteSubtype for Call@8411[5:18]
Because there is only one `incompleteSubtype` but two antecedents, `getTypeFromLoopFlowNode` for `Loop@8412` does not exit early. Instead, it evaluates the antecedents of `Loop@8412`, which causes us to call `getTypeForFlowNode` on `Assign@8416[6:17]` earlier than before.

We enter the `TypeEvaluator` to evaluate the type of the assignment expression, and we eventually call back into the `CodeFlowEngine`, calling `getTypeForFlowNode` for `Assign@8415[5:13]`.

This brings us to `getTypeFromLoopFlowNode` for `Loop@8412`, using the following cache entry. Note that although earlier we narrowed the declared type to `str` (via the `assert`), that information has not made it into the `incompleteSubtypes` for `Loop@8412` yet:
type = undefined, isIncomplete = true, generationCount = 2, incompleteSubtypes =
[ type = undefined, isIncomplete = true, isPending = true, evaluationCount = 0 ] # incompleteSubtype for Call@8411[5:18]
[ type = undefined, isIncomplete = true, isPending = true, evaluationCount = 0 ] # incompleteSubtype for Assign@8416[6:17]
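Comparing this entry with the one-slot entry earlier, the two early-exit rules described in this comment can be sketched as follows. This is one reading of the prose, not a copy of pyright's logic: both function names are hypothetical, and each subtype slot is reduced to its `isPending` flag.

```python
from typing import List


def exits_early_before_fix(pending_flags: List[bool]) -> bool:
    # Before the fix: bail out as soon as any recorded subtype is pending.
    return any(pending_flags)


def exits_early_after_fix(pending_flags: List[bool], antecedent_count: int) -> bool:
    # After the fix: only bail out once every antecedent has a subtype slot
    # and at least one of those slots is pending.
    return len(pending_flags) == antecedent_count and any(pending_flags)


ANTECEDENTS_OF_LOOP_8412 = 2

# First visit: only Call@8411[5:18] has a slot, and it is pending.
one_slot = [True]
# Re-entrant visit: both antecedents have slots, and both are pending.
two_slots = [True, True]

old_rule_first_visit = exits_early_before_fix(one_slot)
new_rule_first_visit = exits_early_after_fix(one_slot, ANTECEDENTS_OF_LOOP_8412)
new_rule_reentrant = exits_early_after_fix(two_slots, ANTECEDENTS_OF_LOOP_8412)
```

Under the old rule the first visit bails out immediately. Under the new rule it proceeds, and the bail-out happens on the re-entrant visit instead, before the narrowed `str` has reached any slot.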
Because there are now two `incompleteSubtypes` (one for each antecedent) and at least one has `isPending = true`, we return early. `getTypeFromLoopFlowNode` returns `type = undefined, isIncomplete = true`.

Because we got `type = undefined`, `getTypeOfName` falls back to using the annotated type of `x`, which is `Union[str, None]`. I'd love to link to that line of code, but `TypeEvaluator.ts` is too large for GitHub to handle nicely.

This type, `Union[str, None]`, gets passed back through the `TypeEvaluator` to the `FlowFlags.Assignment` logic in `getTypeFromFlowNode`, where it is saved as the cache entry on `Assign@8416[6:17]`.

We keep re-evaluating the antecedents of `Loop@8412` until hitting `maxAttemptCount`, at which point the `effectiveType` is `Union[str, None]` because the cache entry is:
type = Union[str, None], isIncomplete = true, incompleteSubtypes =
[ type = undefined, isIncomplete = true ] # incompleteSubtype for Call@8411[5:18]
[ type = Union[str, None], isIncomplete = true ] # incompleteSubtype for Assign@8416[6:17]
We were evaluating `Loop@8412` because it is the second antecedent of `Loop@8407`. The cache entry for `Loop@8407` now looks like this, resulting in an effective type of `Union[str, None]` (`str` combined with `Union[str, None]`):
type = Union[str, None], isIncomplete = true, incompleteSubtypes =
[ type = str, isIncomplete = false ] # incompleteSubtype for Call@8406[4:14]
[ type = Union[str, None], isIncomplete = false ] # incompleteSubtype for Circular(Loop@8412)
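The cache dumps in this comment are all consistent with the effective type being the union of whatever subtypes are known, with `undefined` slots ignored. A small sketch under that assumption, where `effective_type` is a hypothetical helper, types are frozensets of subtype names, and `None` models `undefined`:

```python
from typing import FrozenSet, Iterable, Optional


def effective_type(
    subtypes: Iterable[Optional[FrozenSet[str]]],
) -> Optional[FrozenSet[str]]:
    """Union of all known subtypes; None if nothing is known yet."""
    known = [subtype for subtype in subtypes if subtype is not None]
    if not known:
        return None
    return frozenset().union(*known)


STR = frozenset({"str"})
STR_OR_NONE = frozenset({"str", "None"})

# Loop@8407 before the fix: [str, undefined]
before_fix = effective_type([STR, None])
# Loop@8407 after the fix: [str, Union[str, None]]
after_fix = effective_type([STR, STR_OR_NONE])
```

Because a union can only grow, a single slot that wrongly holds `Union[str, None]` is enough to widen the result for the whole loop.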
`Union[str, None]` is now passed back to `Loop@8407` down the "top" path of the code flow graph, resulting in the following cache entry for `Loop@8412`:
type = Union[str, None], isIncomplete = true, incompleteSubtypes =
[ type = Union[str, None], isIncomplete = false ] # incompleteSubtype for Call@8411[5:18]
[ type = Union[str, None], isIncomplete = true ] # incompleteSubtype for Assign@8416[6:17]
We loop until hitting `maxAttemptCount` and grab the `effectiveType` of this data which is, of course, `Union[str, None]`.

Could you try both algorithms and return the type that is more constrained between the two? Or would that still break the fix for #4023?
> Could you try both algorithms...

I'm concerned that that would make the `CodeFlowEngine` even harder to understand. It's already very complicated.
I haven't dug into #4023, so I'm not sure which of the behavior changes I mentioned above it is relying on. Maybe that's what I should do next.
There are a couple of things that confuse me about #4023:

Given that at the `return` line we know that `closest` is `Calculation | None`, it's not clear to me how `getTypeFromLoopFlowNode` is connected with us evaluating the type of `(closest.numerator, closest.denominator)`. I would think that we'd use the `CodeFlowEngine` at that point to tell us that `closest` is not `None` in the `else` block. But it wouldn't involve `getTypeFromLoopFlowNode`.

No, we wouldn't use both algorithms. That would require not only duplicate code but also duplicate caches. We should find one algorithm that works.
I'll take a look at the problem when I have time. Thanks Erik for the above analysis and notes.
I'm seeing weird behaviour with nested loops as well:

from typing import Callable


class A:
    def foo(self):
        ...


def f(fn: Callable | None):
    d: dict[int, A] = {}
    for _ in range(1, 10):
        for _ in range(1, 10):
            if fn is not None:
                key = fn(
                    1
                )  # Object of type "None" cannot be called
                # but if I comment out the `item.foo()` line this passes
                item = d[key]
                item.foo()

Is it the same issue?
> Is it the same issue?

Yes, this regression was caused by the same change.
When evaluating the type of `fn` at the call, we see the `is not None` and attempt to use it for narrowing. However, `getTypeFromFlowNode(conditionalFlowNode.antecedent)` returns `undefined`, so no narrowing occurs. Before #4023, we got `Union[Callable, None]` here and narrowed it to `Callable`.
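To make the "no narrowing occurs" step concrete, here is a toy model of an `is not None` narrowing step. It is an illustration only: `narrow_is_not_none` is a hypothetical name, types are frozensets of subtype names, and `None` as an input models the `undefined` antecedent type.

```python
from typing import FrozenSet, Optional


def narrow_is_not_none(
    antecedent_type: Optional[FrozenSet[str]],
) -> Optional[FrozenSet[str]]:
    """Remove "None" from the antecedent type, if there is one to narrow."""
    if antecedent_type is None:
        # The antecedent type is unknown, so there is nothing to narrow
        # and the unknown result is passed along unchanged.
        return None
    return antecedent_type - {"None"}


CALLABLE_OR_NONE = frozenset({"Callable", "None"})

# Before #4023: the antecedent yields Union[Callable, None], narrowed to Callable.
narrowed = narrow_is_not_none(CALLABLE_OR_NONE)
# After #4023: the antecedent yields `undefined`, so narrowing is a no-op.
not_narrowed = narrow_is_not_none(None)
```

In this model the condition itself is handled correctly in both cases. The only thing that changes is what the antecedent hands to it.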
I added `reveal_type(fn)` above the `fn(1)` call, and that gives the expected/narrowed type, but I'm not sure why.
This will be addressed in the next release.
This is included in pyright 1.1.286, which I just published. It will also be included in a future version of pylance.
After updating to 1.1.275, some of my code started producing an error. I don't know exactly how to describe this, but here is some sample code that reproduces it.
The pyright output:
Some notes:

1) If I remove the `| None` from the `x` argument, the error doesn't happen, even though it should not make a difference because of the first line in each func.

2) The error starts happening after 2 levels of for loops. A single loop doesn't trigger the issue.

The `test_with_some_type` is a simplified version of what I found in my code. The `test_with_str` was to test if the issue was specifically with the usage of `__or__` with custom classes or if it affected simpler types as well, which it does.