microsoft / pylance-release

Documentation and issues for Pylance
Creative Commons Attribution 4.0 International
1.72k stars 766 forks source link

Code marked as unreachable after pandas concat #5630

Closed ldouteau closed 8 months ago

ldouteau commented 8 months ago

Environment data

Code Snippet

import pandas as pd

a = 1
pd.concat()
a = 1
pd.concat([])
a = 1

Repro Steps

  1. Open this piece of code
  2. Code at line 7 is flagged as unreachable by Pylance

Expected behavior

Code flagged as unreachable, and thus greyed out

Actual behavior

Code should be reachable

Logs

I'm not sure what i should provide here, or if it's relevant. Let me know if more is required

ArthurPERE commented 8 months ago

Hello,

I have the same problem, but I am on debian 12.

How do i know it's pylance, when i disable pylance and reload, the code is suddenly reachable.

My version of python is 3.12.1 in an environment conda. And my version of pandas is : pandas 2.2.1 py312h526ad5a_0

clementlefevre commented 8 months ago

Hello same here with pylance v2024.3.1

each pandas.concat call make the following lines unreachable.

Version: 1.87.2 (user setup)
Commit: 863d2581ecda6849923a2118d93a088b0745d9d6
Date: 2024-03-08T15:20:17.278Z
Electron: 27.3.2
ElectronBuildId: 26836302
Chromium: 118.0.5993.159
Node.js: 18.17.1
V8: 11.8.172.18-electron.0
OS: Windows_NT x64 10.0.22621
torext commented 8 months ago

Have a duplicate report here with additional details possibly: https://github.com/microsoft/pylance-release/issues/5631

debonte commented 8 months ago

The matching overload of concat in pandas-stubs currently returns Never, so Pylance is behaving correctly here. https://github.com/pandas-dev/pandas-stubs/blob/main/pandas-stubs/core/reshape/concat.pyi#L39

This overload was added in https://github.com/pandas-dev/pandas-stubs/pull/858.

Please file an issue against pandas-stubs at https://github.com/pandas-dev/pandas-stubs/issues.

Btw, in this case there are multiple matching overloads because the input parameter is list[Any]. Following our overload matching rules we choose the first matching overload, which in this case is the new Never overload.

They seemt to be abusing Never in this case. I believe they want it to mean "not a legal call" whereas it really means "this call never returns".

debonte commented 8 months ago

Btw, while waiting for a fix, you could work around the pandas-stubs issue by commenting out the Never overload in ...\.vscode\extensions\ms-python.vscode-pylance-2024.3.1\dist\bundled\stubs\pandas\core\reshape\concat.pyi.

@overload
def concat(
    objs: Iterable[None] | Mapping[HashableT1, None],
    *,
    axis: Axis = ...,
    join: Literal["inner", "outer"] = ...,
    ignore_index: bool = ...,
    keys: Iterable[HashableT2] = ...,
    levels: Sequence[list[HashableT3] | tuple[HashableT3, ...]] = ...,
    names: list[HashableT4] = ...,
    verify_integrity: bool = ...,
    sort: bool = ...,
    copy: bool = ...,
) -> Never: ...
torext commented 8 months ago

Thanks for the explanation @debonte, I've submitted an issue to to Pandas devs here: https://github.com/pandas-dev/pandas-stubs/issues/888

Dr-Irv commented 8 months ago

Note - both of the calls below will raise an exception with pandas, so in the example by the OP, pylance is absolutely correct in terms of what is reported!

pd.concat()
pd.concat([])
ldouteau commented 8 months ago

Note - both of the calls below will raise an exception with pandas, so in the example by the OP, pylance is absolutely correct in terms of what is reported!

pd.concat()
pd.concat([])

FYI Pylance (through pandas-stubs) doesn't mark code as unreachable after the first code, only after the second one. About the relevance of the test case, sure it's a minimal example that doesnt make much sense.

Dr-Irv commented 8 months ago

Please see #5640 . pylance is doing the unreachable analysis by looking at the stubs even if type checking is off.

Dr-Irv commented 8 months ago

FYI Pylance (through pandas-stubs) doesn't mark code as unreachable after the first code, only after the second one. About the relevance of the test case, sure it's a minimal example that doesnt make much sense.

I'm guessing that you had type checking turned off. Because if you turned it on, the first line would be flagged as a typing error.

The second line will raise an exception, so then code below it is unreachable.

What's debatable here is whether there should be some control over whether you can turn on/off the unreachable analysis, or whether pylance should be doing the unreachable analysis when you are not using type checking.

zorion commented 8 months ago

Why is this (and #5631) closed? Is it fixed? If it is not, which open issue will fix the question?

I will refer to #5631, but since that is duplicated to this one, the arguments should be relevant here.

Pylance says it is ok to decide that it is unreachable code because pandas-stubs starts checking if all list is None then it will fail. So pylance is correct. Pandas-stubs says it is ok to check if all list elements are None, then this will obviously fail. They claim that pylance shouldn't state that "Any" means all-None.

I've been in both sides while reading the issues but now I feel like pandas-stubs are right and the issue is that if we have Any there is a possibility that we end with a unreachable code (all-None), but it isn't guaranteed. You can receive a list including some DataFrame. Don't you? For instance, the #5631 question. Ok, #5631's OP could state that generate_series will return a pandas.Series, which it does, but it's not always that easy.

from typing import Any

import pandas as pd

def generate_series() -> Any:
    return pd.Series([1, 2, 3])

df = pd.concat({k: generate_series() for k in range(10)}, axis=1)

print("Unreachable?!")

Thanks!

Dr-Irv commented 8 months ago

Ok, #5631's OP could state that generate_series will return a pandas.Series, which it does, but it's not always that easy.

I agree with the aspect of it not always being that easy. You could have a 3rd party library that is fully untyped (or improperly typed!) where the type checker cannot infer the result of a method or function from that library to be a Series or DataFrame (or an Iterable of either). So you have code that calls that 3rd party library and now the type checker thinks the result is Iterable[Any], even though you "know" it is Iterable[Series], so your only solution is to cast in order for the call to pd.concat() to work.

erictraut commented 8 months ago

Why is this closed?

Pyright and pylance are working as intended here (and in compliance with the Python type system). There's nothing actionable for the pylance or pyright teams, so it's appropriate to close this issue. The behavior you're seeing is due to the way the pandas stubs are using NoReturn in overloads. I recommend that the maintainers of these stubs reconsider the way they're using this part of the Python typing system. If necessary, we can discuss ways to extend the type system to better support their needs.

Dr-Irv commented 8 months ago

Pyright and pylance are working as intended here (and in compliance with the Python type system).

Is "reachability" part of the python type system? I don't see that in the link you provided.

Remember, with the stubs as they are in pandas-stubs, the code at https://github.com/microsoft/pylance-release/issues/5630#issuecomment-1998039356 passes fine with pyright on command line. So the stubs are correct with respect to both mypy and pyright as command line type checkers. So I would argue that the stubs are correct with respect to formal type checking.

You're saying they are incorrect in the context of doing "reachability analysis".

The issue here is the assumptions about reachability inferred by pyright in the context of pylance and VSCode, and the lack of control for users to turn off the reachability analysis.

debonte commented 8 months ago

the code at #5630 (comment) passes fine with pyright on command line

@Dr-Irv, please see my recent reply on the pandas-stubs issue. While pyright doesn't emit any diagnostics about unreachability, the Never overload effectively causes false negatives in the code after the concat call. https://github.com/pandas-dev/pandas-stubs/issues/888#issuecomment-1998112699

Dr-Irv commented 8 months ago

@Dr-Irv, please see my recent reply on the pandas-stubs issue. While pyright doesn't emit any diagnostics about unreachability, the Never overload effectively causes false negatives in the code after the concat call. pandas-dev/pandas-stubs#888 (comment)

@debonte and see my response at https://github.com/pandas-dev/pandas-stubs/issues/888#issuecomment-1998148761

Command line pyright does not exhibit what you show (and neither does mypy).

The fundamental issue is about the reachability analysis done within pylance/VSCode, which is why I opened #5640

erictraut commented 8 months ago

As I've already said, correct type evaluation requires reachability analysis. One cannot be done without the other. You can't just "turn off reachability analysis" and expect to get correct types.

Dr-Irv commented 8 months ago

As I've already said, correct type evaluation requires reachability analysis. One cannot be done without the other. You can't just "turn off reachability analysis" and expect to get correct types.

But if a user turns off type analysis in VS Code, then shouldn't the reachability analysis also be turned off if one can't be done without the other?

erictraut commented 8 months ago

There is no way to turn off type analysis in Pylance. All functionality within Pylance (completion suggestions, hover text, signature help, semantic tokens, etc.) relies on type analysis. If you set typeCheckingMode to "off", that simply suppresses certain classes of diagnostics. Reachability analysis and type analysis go hand in hand.

Dr-Irv commented 8 months ago

There is no way to turn off type analysis in Pylance. All functionality within Pylance (completion suggestions, hover text, signature help, semantic tokens, etc.) relies on type analysis. If you set typeCheckingMode to "off", that simply suppresses certain classes of diagnostics. Reachability analysis and type analysis go hand in hand.

OK, that makes sense. Maybe the option that is needed is "ignore stubs included with pylance" or "ignore specific packaged stubs included with pylance" ?? At least that would give people the option to temporarily suppress the issue if any stub package like pandas-stubs had an error like this one.

debonte commented 8 months ago

Maybe the option that is needed is "ignore stubs included with pylance" or "ignore specific packaged stubs included with pylance" ?? At least that would give people the option to temporarily suppress the issue if any stub package like pandas-stubs had an error like this one.

That seems excessive to me. I suspect that a small percentage of our users have sufficient understanding of Python typing, stubs, Pylance/Pyright, etc. to know whether such a command would help in any particular scenario. Those that do would likely know enough to work around the issue in their code or in the stubs, thereby retaining the benefit of the unbroken/non-buggy portion of the stubs rather than throwing the baby out with the bath water. In the case of https://github.com/pandas-dev/pandas-stubs/issues/888, they could cast the parameter passed to concat or remove the Never overload from the stubs. Worst-case, this type of advanced user could easily find the bundled stubs in our install directory and delete them manually.

Btw, you pointed out that Pylance shipped with the stubs from your main branch rather than your most recent release. It sounded like you might have been frustrated that we were shipping prerelease stubs. To be honest, that hadn't occurred to me before. Currently we just grab whatever's in main once a week. Would you recommend that we always ship the latest official release instead?

Dr-Irv commented 8 months ago

Btw, you pointed out that Pylance shipped with the stubs from your main branch rather than your most recent release. It sounded like you might have been frustrated that we were shipping prerelease stubs. To be honest, that hadn't occurred to me before. Currently we just grab whatever's in main once a week. Would you recommend that we always ship the latest official release instead?

Whatever is in main has been tested with our CI and would go in the "next" release whenever I decide to do that release. So pulling once a week is fine. In fact, in this case, it helped us identify an issue that we can fix quickly (which I think we will do, based the latest remarks in the pandas-stubs issue). It would be worthwhile if you could do something (you could add a Literal to pandas-stubs/_version.pyi) that indicated which commit you pulled from in case we have to figure out the difference again.

zorion commented 8 months ago

This worked like a charm: https://github.com/microsoft/pylance-release/issues/5630#issuecomment-1994616891

I think this is better than adding a flag saying "I don't want pylance using pandas-stub", so I wouldn't add that option.

Thanks for your great support to both teams, pylance(-release) and pandas(-stubs). I love your projects!

rcyost commented 7 months ago

Hi guys is this fixed? Pandas concat messes up my editor viewing.

debonte commented 7 months ago

Hi guys is this fixed? Pandas concat messes up my editor viewing.

It has been fixed in pandas-stubs. We will pick up that fix in our next prerelease build, which will likely ship on Wednesday.

In the meantime you can use the workaround I mentioned above.

ahmedmohammed107 commented 7 months ago

A naive solution is to define my_concat function that returns pd.concat. Something like:

def my_concat(dfs_list, axis=0): return pd.concat(dfs_list, axis=axis)

df = my_concat([df_1, df_2], axis=0)

debonte commented 7 months ago

This issue is also fixed in our most recent stable release -- 2024.3.2