Closed ldouteau closed 8 months ago
Hello,
I have the same problem, but I am on debian 12.
How do i know it's pylance, when i disable pylance and reload, the code is suddenly reachable.
My version of python is 3.12.1 in an environment conda. And my version of pandas is : pandas 2.2.1 py312h526ad5a_0
Hello same here with pylance v2024.3.1
each pandas.concat
call make the following lines unreachable.
Version: 1.87.2 (user setup)
Commit: 863d2581ecda6849923a2118d93a088b0745d9d6
Date: 2024-03-08T15:20:17.278Z
Electron: 27.3.2
ElectronBuildId: 26836302
Chromium: 118.0.5993.159
Node.js: 18.17.1
V8: 11.8.172.18-electron.0
OS: Windows_NT x64 10.0.22621
Have a duplicate report here with additional details possibly: https://github.com/microsoft/pylance-release/issues/5631
The matching overload of concat
in pandas-stubs
currently returns Never
, so Pylance is behaving correctly here. https://github.com/pandas-dev/pandas-stubs/blob/main/pandas-stubs/core/reshape/concat.pyi#L39
This overload was added in https://github.com/pandas-dev/pandas-stubs/pull/858.
Please file an issue against pandas-stubs
at https://github.com/pandas-dev/pandas-stubs/issues.
Btw, in this case there are multiple matching overloads because the input parameter is list[Any]
. Following our overload matching rules we choose the first matching overload, which in this case is the new Never
overload.
They seemt to be abusing Never
in this case. I believe they want it to mean "not a legal call" whereas it really means "this call never returns".
Btw, while waiting for a fix, you could work around the pandas-stubs
issue by commenting out the Never
overload in ...\.vscode\extensions\ms-python.vscode-pylance-2024.3.1\dist\bundled\stubs\pandas\core\reshape\concat.pyi
.
@overload
def concat(
objs: Iterable[None] | Mapping[HashableT1, None],
*,
axis: Axis = ...,
join: Literal["inner", "outer"] = ...,
ignore_index: bool = ...,
keys: Iterable[HashableT2] = ...,
levels: Sequence[list[HashableT3] | tuple[HashableT3, ...]] = ...,
names: list[HashableT4] = ...,
verify_integrity: bool = ...,
sort: bool = ...,
copy: bool = ...,
) -> Never: ...
Thanks for the explanation @debonte, I've submitted an issue to to Pandas devs here: https://github.com/pandas-dev/pandas-stubs/issues/888
Note - both of the calls below will raise an exception with pandas, so in the example by the OP, pylance is absolutely correct in terms of what is reported!
pd.concat()
pd.concat([])
Note - both of the calls below will raise an exception with pandas, so in the example by the OP, pylance is absolutely correct in terms of what is reported!
pd.concat() pd.concat([])
FYI Pylance (through pandas-stubs) doesn't mark code as unreachable after the first code, only after the second one. About the relevance of the test case, sure it's a minimal example that doesnt make much sense.
Please see #5640 . pylance is doing the unreachable analysis by looking at the stubs even if type checking is off.
FYI Pylance (through pandas-stubs) doesn't mark code as unreachable after the first code, only after the second one. About the relevance of the test case, sure it's a minimal example that doesnt make much sense.
I'm guessing that you had type checking turned off. Because if you turned it on, the first line would be flagged as a typing error.
The second line will raise an exception, so then code below it is unreachable.
What's debatable here is whether there should be some control over whether you can turn on/off the unreachable analysis, or whether pylance should be doing the unreachable analysis when you are not using type checking.
Why is this (and #5631) closed? Is it fixed? If it is not, which open issue will fix the question?
I will refer to #5631, but since that is duplicated to this one, the arguments should be relevant here.
Pylance says it is ok to decide that it is unreachable code because pandas-stubs starts checking if all list is None then it will fail. So pylance is correct. Pandas-stubs says it is ok to check if all list elements are None, then this will obviously fail. They claim that pylance shouldn't state that "Any" means all-None.
I've been in both sides while reading the issues but now I feel like pandas-stubs are right and the issue is that if we have Any there is a possibility that we end with a unreachable code (all-None), but it isn't guaranteed. You can receive a list including some DataFrame. Don't you? For instance, the #5631 question. Ok, #5631's OP could state that generate_series will return a pandas.Series, which it does, but it's not always that easy.
from typing import Any
import pandas as pd
def generate_series() -> Any:
return pd.Series([1, 2, 3])
df = pd.concat({k: generate_series() for k in range(10)}, axis=1)
print("Unreachable?!")
Thanks!
Ok, #5631's OP could state that generate_series will return a pandas.Series, which it does, but it's not always that easy.
I agree with the aspect of it not always being that easy. You could have a 3rd party library that is fully untyped (or improperly typed!) where the type checker cannot infer the result of a method or function from that library to be a Series
or DataFrame
(or an Iterable
of either). So you have code that calls that 3rd party library and now the type checker thinks the result is Iterable[Any]
, even though you "know" it is Iterable[Series]
, so your only solution is to cast
in order for the call to pd.concat()
to work.
Why is this closed?
Pyright and pylance are working as intended here (and in compliance with the Python type system). There's nothing actionable for the pylance or pyright teams, so it's appropriate to close this issue. The behavior you're seeing is due to the way the pandas stubs are using NoReturn
in overloads. I recommend that the maintainers of these stubs reconsider the way they're using this part of the Python typing system. If necessary, we can discuss ways to extend the type system to better support their needs.
Pyright and pylance are working as intended here (and in compliance with the Python type system).
Is "reachability" part of the python type system? I don't see that in the link you provided.
Remember, with the stubs as they are in pandas-stubs
, the code at https://github.com/microsoft/pylance-release/issues/5630#issuecomment-1998039356 passes fine with pyright
on command line. So the stubs are correct with respect to both mypy
and pyright
as command line type checkers. So I would argue that the stubs are correct with respect to formal type checking.
You're saying they are incorrect in the context of doing "reachability analysis".
The issue here is the assumptions about reachability inferred by pyright
in the context of pylance
and VSCode, and the lack of control for users to turn off the reachability analysis.
the code at #5630 (comment) passes fine with
pyright
on command line
@Dr-Irv, please see my recent reply on the pandas-stubs issue. While pyright doesn't emit any diagnostics about unreachability, the Never
overload effectively causes false negatives in the code after the concat
call. https://github.com/pandas-dev/pandas-stubs/issues/888#issuecomment-1998112699
@Dr-Irv, please see my recent reply on the pandas-stubs issue. While pyright doesn't emit any diagnostics about unreachability, the
Never
overload effectively causes false negatives in the code after theconcat
call. pandas-dev/pandas-stubs#888 (comment)
@debonte and see my response at https://github.com/pandas-dev/pandas-stubs/issues/888#issuecomment-1998148761
Command line pyright does not exhibit what you show (and neither does mypy).
The fundamental issue is about the reachability analysis done within pylance/VSCode, which is why I opened #5640
As I've already said, correct type evaluation requires reachability analysis. One cannot be done without the other. You can't just "turn off reachability analysis" and expect to get correct types.
As I've already said, correct type evaluation requires reachability analysis. One cannot be done without the other. You can't just "turn off reachability analysis" and expect to get correct types.
But if a user turns off type analysis in VS Code, then shouldn't the reachability analysis also be turned off if one can't be done without the other?
There is no way to turn off type analysis in Pylance. All functionality within Pylance (completion suggestions, hover text, signature help, semantic tokens, etc.) relies on type analysis. If you set typeCheckingMode
to "off", that simply suppresses certain classes of diagnostics. Reachability analysis and type analysis go hand in hand.
There is no way to turn off type analysis in Pylance. All functionality within Pylance (completion suggestions, hover text, signature help, semantic tokens, etc.) relies on type analysis. If you set
typeCheckingMode
to "off", that simply suppresses certain classes of diagnostics. Reachability analysis and type analysis go hand in hand.
OK, that makes sense. Maybe the option that is needed is "ignore stubs included with pylance" or "ignore specific packaged stubs included with pylance" ?? At least that would give people the option to temporarily suppress the issue if any stub package like pandas-stubs
had an error like this one.
Maybe the option that is needed is "ignore stubs included with pylance" or "ignore specific packaged stubs included with pylance" ?? At least that would give people the option to temporarily suppress the issue if any stub package like
pandas-stubs
had an error like this one.
That seems excessive to me. I suspect that a small percentage of our users have sufficient understanding of Python typing, stubs, Pylance/Pyright, etc. to know whether such a command would help in any particular scenario. Those that do would likely know enough to work around the issue in their code or in the stubs, thereby retaining the benefit of the unbroken/non-buggy portion of the stubs rather than throwing the baby out with the bath water. In the case of https://github.com/pandas-dev/pandas-stubs/issues/888, they could cast the parameter passed to concat
or remove the Never
overload from the stubs. Worst-case, this type of advanced user could easily find the bundled stubs in our install directory and delete them manually.
Btw, you pointed out that Pylance shipped with the stubs from your main
branch rather than your most recent release. It sounded like you might have been frustrated that we were shipping prerelease stubs. To be honest, that hadn't occurred to me before. Currently we just grab whatever's in main
once a week. Would you recommend that we always ship the latest official release instead?
Btw, you pointed out that Pylance shipped with the stubs from your
main
branch rather than your most recent release. It sounded like you might have been frustrated that we were shipping prerelease stubs. To be honest, that hadn't occurred to me before. Currently we just grab whatever's inmain
once a week. Would you recommend that we always ship the latest official release instead?
Whatever is in main
has been tested with our CI and would go in the "next" release whenever I decide to do that release. So pulling once a week is fine. In fact, in this case, it helped us identify an issue that we can fix quickly (which I think we will do, based the latest remarks in the pandas-stubs issue). It would be worthwhile if you could do something (you could add a Literal
to pandas-stubs/_version.pyi
) that indicated which commit you pulled from in case we have to figure out the difference again.
This worked like a charm: https://github.com/microsoft/pylance-release/issues/5630#issuecomment-1994616891
I think this is better than adding a flag saying "I don't want pylance using pandas-stub", so I wouldn't add that option.
Thanks for your great support to both teams, pylance(-release) and pandas(-stubs). I love your projects!
Hi guys is this fixed? Pandas concat messes up my editor viewing.
Hi guys is this fixed? Pandas concat messes up my editor viewing.
It has been fixed in pandas-stubs
. We will pick up that fix in our next prerelease build, which will likely ship on Wednesday.
In the meantime you can use the workaround I mentioned above.
A naive solution is to define my_concat function that returns pd.concat. Something like:
def my_concat(dfs_list, axis=0): return pd.concat(dfs_list, axis=axis)
df = my_concat([df_1, df_2], axis=0)
Environment data
Code Snippet
Repro Steps
Expected behavior
Code flagged as unreachable, and thus greyed out
Actual behavior
Code should be reachable
Logs
I'm not sure what i should provide here, or if it's relevant. Let me know if more is required