Dr-Irv opened this issue 2 years ago
An example of a suggested approach was developed within the pandas-stubs project, based on various code snippets which are then tested with mypy: https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets
See #40202 for a less verbose but more primitive approach.
Using pytest is perhaps more robust. How easy is it to parametrize testing of types? I assume that parametrization cannot be used for static checks.
The method in #40202 has several issues. If anyone knows of other approaches to testing types, I would much appreciate suggestions being posted here.
From https://github.com/pandas-dev/pandas/pull/40202#issuecomment-795403039
We must also remember that there is also the "do nothing" option of not explicitly testing the types. IIRC this was suggested in the January dev meeting (mostly relying on user feedback to report issues).
Repeating it here so that this option is not overlooked if the solution gets too complex.
One potentially easier to implement option is to have something like mypy_primer but for pandas. Typeshed uses mypy_primer to automatically comment on each PR whether it would change type annotations as inferred by mypy.
"Pandas_primer" would help to 1) prevent involuntary changes to public type annotations and to 2) verify that intended changes actually have an effect.
In pandas-stubs, we've decided to go with the simplest method, which doesn't require any extra setup or testing constructs. Here are some of the other tools we've looked at (not including dynamic type-checkers):
Coming back to our solution, there's a way to test negative scenarios in our approach but it's far from elegant:
rdf: pd.Series = pd.concat({1: df, 2: df2}) # type: ignore
This will cause mypy not to return errors when the line is in fact incorrect,
whereas applying the same ignore to a line that is actually correct
rdf: pd.DataFrame = pd.concat({1: df, 2: df2}) # type: ignore
will result in an error
tests/snippets/test_pandas.py:55: error: Unused "type: ignore" comment
The issue that remains is that our whole approach is static and I don't see it being easily integrated with a parametric list of use cases.
mypy_primer seems like another interesting approach. It would potentially be difficult to integrate into the pandas CI, but might be worth the extra effort.
Overall it seems like either pytest-mypy-testing / @simonjayhawkins's approach (with some work) or our (pandas-stubs) solution might be a semi-immediate fit for the pandas use case.
@simonjayhawkins if I may ask: Why was your work discontinued? Was it just inconvenient at the time?
Here is why I like the approach taken by the pandas-stubs project: https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets
You can test with mypy, add # type: ignore comments, and use the mypy warn_unused_ignores option to make sure that each ignore line is correct (and stays tested with future changes to the stubs).
From my experience in submitting a bunch of PRs to the Microsoft python-type-stubs project, which has no testing for the pandas stubs, I can tell you that I am always concerned that a fix I submit for the stubs will break something else. If they had included something like what the pandas-stubs project did, it would make life a lot easier to do more test-driven development of the stubs. This is especially true when you make a change in pandas/_typing.py that changes one of the top-level types. It's hard to know if you have then made some downstream stub too narrow or too wide.
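For context, warn_unused_ignores is a regular mypy configuration option; a minimal sketch of enabling it for such a snippets directory in pyproject.toml (the paths shown are only an assumption for illustration):
[tool.mypy]
files = ["tests/snippets"]
warn_unused_ignores = true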
The approach we took in pandas-stubs was the most pragmatic for our usage and the capacity we have to maintain the project. We can assume that any stubs will not be perfect, and when people first start using them, they will get some errors (I applied our stubs to a roughly thousand-line project and made 10 fixes to pandas-stubs based on that alone). So whenever this happens, we would first re-create the false negative case provided, modify the stubs, and make sure the exact usage is corrected. From that moment, this particular usage is guarded, so people using the project shouldn't notice regressions, and we can add small improvements incrementally.
@simonjayhawkins if I may ask: Why was your work discontinued? Was it just inconvenient at the time?
Abandoned due to https://github.com/pandas-dev/pandas/pull/40202#issuecomment-798695425. In pandas we tend not to merge stuff if there are any objections.
I assumed from that position and the notes from the meeting...
Pandas-stubs can reorganize their test files to correspond better to current pandas testing
that the pandas-stubs would be integrated into our pytest tests, but I guess I got that wrong from not attending the meeting.
@Dr-Irv I assume from https://github.com/pandas-dev/pandas/issues/45252#issuecomment-1009006773 that your objections to the previous attempt no longer apply?
that the pandas-stubs would be integrated into our pytest tests, but I guess I got that wrong from not attending the meeting.
What we decided is that the testing approach from pandas-stubs would be integrated, but we would use the Microsoft stubs as a starting point, because they are more complete than the pandas-stubs.
@Dr-Irv I assume from #45252 (comment) that your objections to the previous attempt no longer apply?
I'm not sure which objections you are referring to....
What I said there is that if we could annotate all of our existing tests (which means adding # type: ignore statements in lots of places), that would be a way to test the stubs. But I think that would take a lot of effort.
Since then, I've seen the pandas-stubs approach and I think it is easier to start from there, and expand the methodology in there to cover many use cases of any stubs we decide to publish.
I'm not sure which objections you are referring to....
specifically from https://github.com/pandas-dev/pandas/pull/40202#issuecomment-798695425
I think it will be easier for the community to fix our source code that has typed methods for the public API (and add an appropriate test in pandas/tests) as opposed to using the reveal_type method proposed in this PR, which would require us to enumerate all the possibilities in two places - once in our regular testing code and once in the "testing types" code.
Since then, I've seen the pandas-stubs approach and I think it is easier to start from there, and expand the methodology in there to cover many use cases of any stubs we decide to publish.
IIUC the pandas-stubs approach, if not using pytest, is not so different from #40202, i.e. we need to enumerate all the possibilities and have a separate set of tests.
also from https://github.com/pandas-dev/pandas/pull/40202#issuecomment-798695425
In addition, it's not clear to me how the current proposal helps us test the variety of arguments that can be passed to methods. We know that our existing testing code already tries different argument types, so if we typed our testing code (and ran mypy on it), then we're getting the benefit of one set of tests that tests functionality as well as typing signatures.
The pandas-stubs testing approach, if not integrated into the current testing, also does not satisfy this.
I think it will be easier for the community to fix our source code that has typed methods for the public API (and add an appropriate test in pandas/tests) as opposed to using the reveal_type method proposed in this PR, which would require us to enumerate all the possibilities in two places - once in our regular testing code and once in the "testing types" code.
I was more referring to using the reveal_type() aspect of the testing.
IIUC the pandas-stubs approach, if not using pytest, is not so different from #40202, i.e. we need to enumerate all the possibilities and have a separate set of tests.
I don't think we have to enumerate all the possibilities. We can start with a set of tests that represent "common" usage, and if the stubs are not covering some particular use case (by being too wide or too narrow), we can just add to those stubs.
In addition, it's not clear to me how the current proposal helps us test the variety of arguments that can be passed to methods. We know that our existing testing code already tries different argument types, so if we typed our testing code (and ran mypy on it), then we're getting the benefit of one set of tests that tests functionality as well as typing signatures.
The pandas-stubs testing approach, if not integrated into the current testing, also does not satisfy this.
That's correct. But it's a first step forward. I would like it if we could put typing into our testing code, and just run mypy on that, but it's a much larger effort than just starting with the pandas-stubs testing approach.
just starting with the pandas-stubs testing approach.
+1
At the end of the day it's not so different from the previous attempt. We would at least have some testing infrastructure and the tests in place (esp. for the overloads).
changing those tests at a later date from say
i1: pd.Interval = pd.Interval(1, 2, closed='both')
to
reveal_type(pd.Interval(1, 2, closed='both')) # E: Interval
would not be difficult and would improve the robustness of the checks.
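For illustration, here is a minimal sketch (not the actual #40202 or numpy harness) of how such # E: comments could be checked: run mypy over a snippet file and compare each revealed type against the comment on the same line.
from __future__ import annotations

import re
from mypy import api  # mypy's programmatic entry point

def check_reveal_comments(path: str) -> None:
    stdout, _, _ = api.run([path])
    # mypy emits lines like: snippets.py:12: note: Revealed type is "pandas.core.frame.DataFrame"
    revealed: dict[int, str] = {}
    for line in stdout.splitlines():
        m = re.match(r'.+?:(\d+): note: Revealed type is ["\'](.+)["\']', line)
        if m:
            revealed[int(m.group(1))] = m.group(2)
    with open(path) as f:
        for lineno, source in enumerate(f, start=1):
            m = re.search(r"reveal_type\(.*\)\s*#\s*E:\s*(.+)", source)
            if m:
                expected = m.group(1).strip()
                # A substring check mirrors the loose comparison discussed below;
                # an equality check would be stricter.
                assert expected in revealed.get(lineno, ""), (lineno, expected)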
changing those tests at a later date from say
i1: pd.Interval = pd.Interval(1, 2, closed='both')
to
reveal_type(pd.Interval(1, 2, closed='both')) # E: Interval
would not be difficult and would improve the robustness of the checks.
If we were to go that route, since the notation used with reveal_type() here is something we are borrowing from numpy, we'd have to document it, etc., and it may make it more difficult for the community to make contributions.
If we were to go that route, since the notation used with reveal_type() here is something we are borrowing from numpy, we'd have to document it, etc., and it may make it more difficult for the community to make contributions.
sure.
changing those tests at a later date ....
A potential future possibility; not suggesting that we would do it, or that the discussion blocks the addition of static tests.
If the (I'll call it) assignment method works sufficiently well, then we may not need the additional complexity.
In https://github.com/pandas-dev/pandas/issues/45252#issuecomment-1007988416, I said..
The method in #40202 has several issues.
We are using reveal_type so that we can check the return types explicitly. However, the problem is that the script uses
if reveal not in expected_reveal
so IIRC there are cases which pass that shouldn't, since we are not checking equality to ensure an explicit check is made.
The "assignment method" is checking type compatibility only, so I imagine there could potentially be more cases that are not checked explicitly.
For instance, if we have var: typeA | typeB = func() and func changes from returning a union of the types to a single type of either typeA or typeB, then the tests would not catch this?
If we find that this less explicit checking is not an issue, then there would be no reason to make the testing more complex.
For instance, if we have var: typeA | typeB = func() and func changes from returning a union of the types to a single type of either typeA or typeB, then the tests would not catch this?
You could catch it this way:
var: typeA | typeB = func()
varA: typeA = func() # type: ignore
varB: typeB = func() # type: ignore
If we use the warn_unused_ignores argument of mypy, then the above code would pass. If the return type of func is narrowed to typeA without changing the tests, then the second line would generate the warning, indicating that the test needs to change.
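As a self-contained sketch of that pattern (hypothetical types and function, checked with mypy's warn_unused_ignores enabled):
from __future__ import annotations

class TypeA: ...
class TypeB: ...

def func() -> TypeA | TypeB:  # hypothetical function whose stub currently returns a union
    return TypeA()

var: TypeA | TypeB = func()      # always passes, even if the return type is narrowed later
var_a: TypeA = func()  # type: ignore  # needed while func() returns the union
var_b: TypeB = func()  # type: ignore  # needed while func() returns the union
# If func() is narrowed to return only TypeA, the ignore on the var_a line becomes
# unused and mypy reports it, signalling that the test (and annotation) should change.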
If we find that this less explicit checking is not an issue, then there would be no reason to make the testing more complex.
I think this is just an issue of how complete we want the tests to be. If we try to make them fully complete, we'll never get done. So I say we start with a set of tests that include common usage of pandas, and expand those tests as issues get identified.
I think this is just an issue of how complete we want the tests to be. If we try to make them fully complete, we'll never get done. So I say we start with a set of tests that include common usage of pandas, and expand those tests as issues get identified.
Exactly. That's the very reason #40202 was opened as an alternative to typing the tests in the first place!
from https://github.com/pandas-dev/pandas/pull/40202#issuecomment-790452266
I don't think we need to mirror the source code structure for this as we are unlikely to test typing for internals. (although we could have an internals section if there are some methods that require complex overloading that would benefit from testing)
The structure (and scope) either needs more discussion if it is a concern, or be allowed to evolve over time.
If we try to make them fully complete, we'll never get done.
so could maybe skip the
Pandas-stubs can reorganize their test files to correspond better to current pandas testing
step mentioned in the meeting discussion and get these in sooner by copying across what they already have and iterating on that?
so could maybe skip the
Pandas-stubs can reorganize their test files to correspond better to current pandas testing
step mentioned in the meeting discussion and get these in sooner by copying across what they already have and iterating on that?
I'm pretty sure this was a suggestion from @jreback. Currently, there are 6 files there, and the question is whether it's worth the effort to split them up.
I think a file (x3) for each pandas public class (Series, DataFrame, Timestamp, Interval etc) and then a file for the top level functions (maybe split that one) and then somewhere for some internals (unit) testing, maybe a file or subdirectory.
I think a file (x3) for each pandas public class (Series, DataFrame, Timestamp, Interval etc) and then a file for the top level functions (maybe split that one) and then somewhere for some internals (unit) testing, maybe a file or subdirectory.
Can you explain the "(x3)" in the above? Are you saying there would be files each for Series, DataFrame, Timestamp, etc.? If so, what would be different between the 3 files?
The x3 comes from pass/fail/reveal. Forget that for now to simplify the discussion.
So let's say one file per class, with 1 test function per method (or maybe 2 for pass/fail?).
There are some classes with so many methods that I think splitting by method may be noisy?
Basically, what pandas-stubs already has lgtm. It's close enough to the suggestion above and we can evolve the structure/style once copied across.
@zkrolikowski-vl I prefer the structure for the tests that you already have so IMO https://github.com/VirtusLab/pandas-stubs/pull/127 is not necessary.
I think that keeping the structure more aligned to the public API is preferable (maybe easier for casual contributors) to mirroring either the source code or the pytest testing structure, but I defer to others on this since this was discussed in the meeting.
@simonjayhawkins I can see that both you and @Dr-Irv are agreeing to leave the structure as is. It's definitely easier to read right now. My only concern is that as more and more tests are added, the files will either become very long or have to be split with semi-subjective names.
I'll close https://github.com/VirtusLab/pandas-stubs/pull/127 for now.
Do we have an idea on how to integrate the tests with the CI? It looks like pandas is using both CircleCI and GitHub Actions.
Do we have an idea on how to integrate the tests with the CI? It looks like pandas is using both CircleCI and GitHub Actions.
My suggestion is as follows. Put your tests in pandas/tests/typing. Then I think the current CI will automatically pick up running mypy on your test files. If you think we should use different mypy options, then add a [[tool.mypy.overrides]] section to pyproject.toml, which allows you to override the default options for pandas.tests.typing.
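A sketch of what such an override section could look like in pyproject.toml (the module path and options are only illustrative assumptions):
[[tool.mypy.overrides]]
module = "pandas.tests.typing.*"
warn_unused_ignores = true
check_untyped_defs = true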
There is also a section at the bottom for pyright testing, which doesn't go through pandas/tests, but you could probably create an overrides section there as well that forces pandas.tests.typing to get checked.
Put your tests in pandas/tests/typing
Will pytest execute these? The reason I ask is that I don't think we want that.
The static tests in https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets have functions named test_* and filenames named test_*, so putting them there means pytest will pick these up?
@Dr-Irv Will do.
@simonjayhawkins Yes. AFAIK pytest will pick them up as well. That's the way we've been testing if the snippets execute correctly.
There is also a section at the bottom for pyright testing, which doesn't go through pandas/tests, but you could probably create an overrides section there as well that forces pandas.tests.typing to get checked.
We have very few pyright options enabled at the moment (most notably reportGeneralTypeIssues is disabled). If you want to run pyright (starting with only mypy is already a good start), the easiest is probably to add strict = ["pandas/tests/typing"] to pyproject.toml and to also append that path to include.
@simonjayhawkins Yes. AFAIK pytest will pick them up as well. That's the way we've been testing if the snippets execute correctly.
So the only issue here is that we will probably add typing tests that check whether a type is too wide, and put a # type: ignore comment on them, but the code will then execute with pytest and raise an exception, which means that if pytest is run on it, we have to expect that exception. For example:
>>> s = pd.Series([1,2,3])
>>> s.corr(pd.Series([4,5,6]), method="foobar")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda3\envs\b38np122try2\lib\site-packages\pandas\core\series.py", line 2512, in corr
raise ValueError(
ValueError: method must be either 'pearson', 'spearman', 'kendall', or a callable, 'foobar' was supplied
So when we type Series.corr() to have method be a Union[Literal["pearson", "spearman", "kendall"], Callable], then you want the typing test to test that foobar is invalid for method. To do that you'd do something like
s.corr(pd.Series([4,5,6]), method="foobar") # type: ignore
so that the type checker is happy, but then pytest will fail until you surround the code with pytest.raises types of things.
Maybe that means we should have a pandas/tests/typing/valid section, which contains valid code that is tested with pytest and the type checker, and a pandas/tests/typing/invalid section where we don't run pytest. Then in the files in the invalid directory, have something in there that just skips the tests.
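A minimal sketch of how such an "invalid" snippet could stay green under both mypy and pytest, using the corr example above:
import pandas as pd
import pytest

def test_corr_rejects_unknown_method() -> None:
    s = pd.Series([1, 2, 3])
    other = pd.Series([4, 5, 6])
    # The stub should reject "foobar", so the ignore is needed to satisfy the type
    # checker; at runtime pandas raises ValueError, which pytest.raises absorbs.
    with pytest.raises(ValueError):
        s.corr(other, method="foobar")  # type: ignore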
I've run into a challenge related to how pyright is configured in pandas: https://github.com/pandas-dev/pandas/blob/main/pyproject.toml#L150-L185, mainly because of these lines:
include = ["pandas", "typings"]
exclude = ["pandas/tests", "pandas/io/clipboard", "pandas/util/version"]
I've checked, and the exclusion list is stronger than the inclusion list, so it's not possible to make an exception to the exclusion.
Other than that, I would have to enable most of the report features below for the type tests to have any utility. This would also include all the type errors from the main codebase.
Due to this I'm thinking of having a separate configuration just for the type tests themselves. I'm not sure if that's even possible, but I'll continue on this topic.
I've checked, and the exclusion list is stronger than the inclusion list, so it's not possible to make an exception to the exclusion.
Instead of excluding pandas/tests, you could enumerate all folders/files.
Other than that I would have to enable most of the report features below for the type tests to have any utility. This would also include all the type errors from the main codebase.
I think it should be sufficient to use strict = ["path/to/the/tests"]; this should overwrite all the disabled features. If not, another option is to have # pyright: strict at the top of each test file.
Testing with mypy is already enough for a first version. You don't need to worry about pyright for now - unless you want to :)
edit:
If the folder is not excluded, strict = ["path/to/the/tests"] is sufficient to enable all of pyright's rules for this folder (no need for a second config file or file-based comments). Unfortunately, I think we will need to exclude all sub-folders/files of pandas/tests.
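A rough sketch of what that could look like in pyproject.toml, assuming the typing tests live under pandas/tests/typing and the remaining test sub-folders are excluded individually rather than excluding pandas/tests wholesale:
[tool.pyright]
include = ["pandas", "typings"]
exclude = [
    "pandas/io/clipboard",
    "pandas/util/version",
    # list the other pandas/tests sub-folders here individually so that
    # pandas/tests/typing is not swallowed by a blanket pandas/tests exclusion
]
strict = ["pandas/tests/typing"]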
@zkrolikowski-vl I created the PR #45561 to demonstrate how the test snippets from VirtusLab could be integrated.
@twoertwein Opened my own PR based on yours, with the __setitem__ fix. I hope that the commit message is satisfactory attribution to all authors. https://github.com/pandas-dev/pandas/pull/45594
@zkrolikowski-vl When you say "with the __setitem__ fix", do you mean you updated the test you had to not include it?
@Dr-Irv In a sense yes. I've purposefully avoided the situation in which the regression happens. The test case itself still works.
@Dr-Irv I think you mentioned that the MS stubs started using a not-yet supported argument of reveal_type to assert the type. Mypy 0.950 now supports something similar: typing_extensions.assert_type https://github.com/python/mypy/pull/12584 (and typing.assert_type in future Python versions).
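For illustration, assert_type is a runtime no-op that type checkers verify; a small generic sketch (not pandas-specific, since the exact stub return types determine what the pandas assertions would look like):
from typing_extensions import assert_type

def double(x: int) -> int:
    return 2 * x

assert_type(double(2), int)    # passes: the inferred type is exactly int
assert_type(double(2), float)  # mypy/pyright flag this line; at runtime it is still a no-op
For the stub tests, the idea would be to replace assignment-style checks such as i1: pd.Interval = pd.Interval(1, 2, closed='both') with assert_type(pd.Interval(1, 2, closed='both'), pd.Interval), modulo any type parameters the stubs declare.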
@twoertwein Thanks. It appears that assert_type() was added to pyright as well. This will clean up various tests in the MS stubs and the pandas-stubs repo that I'm working on here: https://github.com/pandas-dev/pandas-stubs
It appears that assert_type() was added to pyright as well. This will clean up various tests in the MS stubs and the pandas-stubs repo that I'm working on here:
cool. this will overcome the deficiencies of the "assignment" method that I commented on in https://github.com/pandas-dev/pandas/issues/45252#issuecomment-1009279373 and now align the testing with the numpy method (that was done in https://github.com/pandas-dev/pandas/pull/40202)?
@Dr-Irv can/should we move this issue (and maybe others) to https://github.com/pandas-dev/pandas-stubs, or is it best to keep that repo to stub-specific issues, since I think we could also use the test framework to test the pandas code too? See my comment in the mailing list thread.
I think we could also use the test framework to test the pandas code too.
I guess there are two ways to do this.
Or both. If there is testing against the pandas codebase in pandas-stubs, 1. is also more likely to succeed if automated.
cool. this will overcome the deficiencies of the "assignment" method that I commented on in #45252 (comment) and now align the testing with the numpy method (that was done in #40202)?
I believe that to be the case.
@Dr-Irv can/should we move this issue (and maybe others) to https://github.com/pandas-dev/pandas-stubs, or is it best to keep that repo to stub-specific issues, since I think we could also use the test framework to test the pandas code too? See my comment in the mailing list thread.
I could see it going either way. From the pandas development standpoint, we would want to make sure that changes to pandas in a PR don't cause the tests in pandas-stubs to fail. So keeping this issue here makes sense. On the other hand, from the pandas-stubs viewpoint, we'd have to figure out how to manage testing the stubs against the released version of pandas versus against the main branch of pandas, so having an issue there makes sense as well.
There are other issues here that are purely about public type stubs that could possibly be moved over (and maybe closed).
I really like the idea of testing pandas with its own annotations and pandas-stubs, but I'm honestly not sure whether pandas-stubs can be used to check pandas without having pandas-stubs-specific ignore comments in pandas (or some patch/sed magic).
I don't think this is a particularly good example, but it might demonstrate the potential issue: __getitem__ will probably never return all possible types in pandas-stubs (#46616). I could imagine that (after __getitem__ is typed with unions in pandas) mypy will complain about unnecessary comparisons when using pandas-stubs.
Instead of checking pandas with pandas-stubs, it might be an option to somehow (don't ask me how) enforce that pandas-stubs only contains annotations where different annotations are intended (for example __getitem__) and functions that are not annotated in pandas. Functions that have suitable annotations in pandas could simply(?) be imported from pandas (if pandas is a dependency of pandas-stubs).
I really like the idea of testing pandas with its own annotations and pandas-stubs, but I'm honestly not sure whether pandas-stubs can be used to check pandas without having pandas-stubs-specific ignore comments in pandas (or some patch/sed magic).
I think we are not talking about using the pandas-stubs to test the pandas source. I think the idea is that we make sure that changes to pandas-stubs work with both released versions of pandas and in-development versions of pandas, and, conversely, that changes to pandas in development don't break the stubs. The stubs are tested with mypy and pyright, but also pytest, so you could imagine a change in pandas breaking a pytest test.
Instead of checking pandas with pandas-stubs, it might be an option to somehow (don't ask me how) enforce that pandas-stubs only contains annotations where different annotations are intended (for example __getitem__) and functions that are not annotated in pandas. Functions that have suitable annotations in pandas could simply(?) be imported from pandas (if pandas is a dependency of pandas-stubs).
I don't think that you can have some stubs in a pandas-stubs package, and the rest in the source code. Not sure the type checkers will manage that.
I don't think that you can have some stubs in a pandas-stubs package, and the rest in the source code. Not sure the type checkers will manage that.
If the stubs explicitly import them (and re-export them as public functions in __all__ or with as), it should work. But having pandas as a dependency of pandas-stubs might not be great.
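For reference, the __all__ / as idiom mentioned here is the standard way stub files mark names as publicly re-exported; a tiny illustrative sketch (the module names are made up, and whether a stubs package could really pull annotations from the installed pandas this way is exactly the open question above):
# some_module.pyi -- illustrative only
from some_package import helper as helper   # "X as X" marks an explicit re-export in a stub
from some_package import other_helper
__all__ = ["helper", "other_helper"]         # names listed in __all__ are also treated as re-exported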
Just to be clear, my comment https://github.com/pandas-dev/pandas/issues/45252#issuecomment-1114768645 regards the test framework only.
In the discussion about the challenges of making pandas stubs public (mailing list, dev meeting and elsewhere), my support for having a separate stubs package was on the basis that we would have a test framework that could help keep the standalone stubs package and the pandas codebase more consistent.
Other than the test framework, we should probably keep the standalone pandas-stubs and inline annotations in pandas decoupled. This will allow us to use all the latest typing features in pandas-stubs as they become available. It will also allow us to continue to add annotations to the pandas code without further constraints (it's quite a challenge already relative to the ease of creating stubs)
I suspect that we may need to comment out failing tests (on the pandas side) with a TODO in some cases or just allow the test job to fail if we copy over the tests, but I hope that we wouldn't need to change code on the pandas side or stubs on the pandas-stubs side to share the tests.
Other than the test framework, we should probably keep the standalone pandas-stubs and inline annotations in pandas decoupled. This will allow us to use all the latest typing features in pandas-stubs as they become available. It will also allow us to continue to add annotations to the pandas code without further constraints (it's quite a challenge already relative to the ease of creating stubs)
I think we are all in agreement on keeping the public stubs separate.
I suspect that we may need to comment out failing tests (on the pandas side) with a TODO in some cases or just allow the test job to fail if we copy over the tests, but I hope that we wouldn't need to change code on the pandas side or stubs on the pandas-stubs side to share the tests.
I'm not sure that the tests in pandas-stubs would be good to use as typing tests against the pandas code base, for the following reason. I intend to add assert_type to those tests. For some methods/functions, the public stubs may return a narrower type than the internal types declare. As an example, consider the discussion we had about whether pd.to_datetime() should indicate that it can return NaTType in all the overloads. For the public stubs, we don't want to indicate that - it would make users do cast all the time. But internally, we may want to indicate the full wider type. So a test in pandas-stubs would do an assert_type on the narrower type, but if you use that test on the pandas code base, that test would fail.
I'm not sure that the tests in pandas-stubs would be good to use as typing tests against the pandas code base, for the following reason. I intend to add assert_type to those tests. For some methods/functions, the public stubs may return a narrower type than the internal types declare.
I think that is exactly the reason to use the same tests. IMO we should make the annotations as consistent as possible; that was part of the original argument for why we did not have separate stubs before now.
As an example, consider the discussion we had about whether pd.to_datetime() should indicate that it can return NaTType in all the overloads. For the public stubs, we don't want to indicate that - it would make users do cast all the time.
I still disagree with this approach, but am happy to await user feedback once the stubs are public on whether this is a significant issue.
So a test in pandas-stubs would do an assert_type on the narrower type, but if you use that test on the pandas code base, that test would fail.
I do expect that we would need to have some allowed failures at first, but fixing these would make the pandas-stubs and in-line type annotations more consistent.
@simonjayhawkins @twoertwein
I'm thinking we can close this, as we are now testing the type checking of the published pandas-stubs APIs as part of the CI in that project. Would like your opinion.
Could also leave this issue open if we hope to have consistency checks between pandas and pandas-stubs.
From the discussion at the pandas typing meeting on January 7, 2022 (notes at https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/)
Follow up to discussion here: https://github.com/pandas-dev/pandas/issues/28142#issuecomment-991946409
Goal is to have a set of tests that make sure that users of the pandas API writing correct code work with whatever type stubs we publish.
An example of a suggested approach was developed within the pandas-stubs project, based on various code snippets which are then tested with mypy: https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets
@zkrolikowski-vl and @joannasendorek from that project have offered to help (nudge, nudge). Ideally we would add this testing framework to the CI before starting to bring over the stubs.