Dr-Irv opened this issue 2 years ago
An example of a suggested approach was developed within the pandas-stubs project, based on various code snippets which are then tested with mypy: https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets
See #40202 for a less verbose but more primitive approach.
Using pytest is perhaps more robust. How easy is it to parametrize testing of types? I assume that parametrization cannot be used for static checks.
The method in #40202 has several issues. If anyone knows of other approaches to testing types, I would much appreciate suggestions being posted here.
From https://github.com/pandas-dev/pandas/pull/40202#issuecomment-795403039
We must also remember that there is also the "do nothing" option of not explicitly testing the types. IIRC this was suggested in the January dev meeting (mostly relying on user feedback to report issues).
Repeating it here so that this option is not overlooked if the solution gets too complex.
One potentially easier to implement option is to have something like mypy_primer but for pandas. Typeshed uses mypy_primer to automatically comment on each PR whether it would change type annotations as inferred by mypy.
"Pandas_primer" would help to 1) prevent involuntary changes to public type annotations and to 2) verify that intended changes actually have an effect.
In pandas-stubs, we've decided to go with the simplest method, which doesn't require any extra setup or testing constructs. Here are some of the other tools we've looked at (not including dynamic type-checkers):
Coming back to our solution, there's a way to test negative scenarios in our approach but it's far from elegant:
rdf: pd.Series = pd.concat({1: df, 2: df2}) # type: ignore
This will cause mypy not to return errors when the line is in fact incorrect,
whereas applying the same ignore to a line that is actually correct
rdf: pd.DataFrame = pd.concat({1: df, 2: df2}) # type: ignore
will result in an error
tests/snippets/test_pandas.py:55: error: Unused "type: ignore" comment
The issue that remains is that our whole approach is static and I don't see it being easily integrated with a parametric list of use cases.
mypy_primer seems like another interesting approach. It would potentially be difficult to integrate into the pandas CI, but might be worth the extra effort.
Overall it seems like either pytest-mypy-testing / @simonjayhawkins's approach (with some work) or our (pandas-stubs) solution might be a semi-immediate fit for the pandas use case.
@simonjayhawkins if I may ask: Why was your work discontinued? Was it just inconvenient at the time?
Here is why I like the approach taken by the pandas-stubs project: https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets
You can test with mypy, add # type: ignore comments, and use the mypy warn_unused_ignores option to make sure that each ignore line is correct (and stays tested with future changes to the stubs).
From my experience in submitting a bunch of PRs to the Microsoft python-type-stubs project, which has no testing for the pandas stubs, I can tell you that I am always concerned that a fix I submit for the stubs will break something else. If they had included something like what the pandas-stubs project did, it would make life a lot easier to do more test-driven development of the stubs. This is especially true when you make a change in pandas/_typing.py that changes one of the top-level types. It's hard to know if you have then made some downstream stub too narrow or too wide.
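For context, warn_unused_ignores is a regular mypy configuration option; a minimal sketch of enabling it for such a snippets directory in pyproject.toml (the paths shown are only an assumption for illustration):
[tool.mypy]
files = ["tests/snippets"]
warn_unused_ignores = true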
The approach we took in pandas-stubs was the most pragmatic for our usage and the capacity we have to maintain the project. We can assume that any stubs will not be perfect, and when people first start using them, they will get some errors (I applied our stubs to a roughly thousand-line project and made 10 fixes to pandas-stubs based on that alone). So whenever this happens, we would first re-create the false negative case provided, modify the stubs, and make sure the exact usage is corrected. From that moment, this particular usage is guarded, so people using the project shouldn't notice regressions, and we can add small improvements incrementally.
@simonjayhawkins if I may ask: Why was your work discontinued? Was it just inconvenient at the time?
Abandoned due to https://github.com/pandas-dev/pandas/pull/40202#issuecomment-798695425. In pandas we tend not to merge stuff if there are any objections.
I assumed from that position and the notes from the meeting...
Pandas-stubs can reorganize their test files to correspond better to current pandas testing
that the pandas-stubs would be integrated into our pytest tests, but I guess I got that wrong from not attending the meeting.
@Dr-Irv I assume from https://github.com/pandas-dev/pandas/issues/45252#issuecomment-1009006773 that your objections to the previous attempt no longer apply?
that the pandas-stubs would be integrated into our pytest tests, but I guess I got that wrong from not attending the meeting.
What we decided is that the testing approach from pandas-stubs would be integrated, but we would use the Microsoft stubs as a starting point, because they are more complete than the pandas-stubs.
@Dr-Irv I assume from #45252 (comment) that your objections to the previous attempt no longer apply?
I'm not sure which objections you are referring to....
What I said there is that if we could annotate all of our existing tests (which means adding # type: ignore statements in lots of places), that would be a way to test the stubs. But I think that would take a lot of effort.
Since then, I've seen the pandas-stubs approach and I think it is easier to start from there, and expand the methodology in there to cover many use cases of any stubs we decide to publish.
I'm not sure which objections you are referring to....
specifically from https://github.com/pandas-dev/pandas/pull/40202#issuecomment-798695425
I think it will be easier for the community to fix our source code that has typed methods for the public API (and add an appropriate test in pandas/tests) as opposed to using the reveal_type method proposed in this PR, which would require us to enumerate all the possibilities in two places - once in our regular testing code and once in the "testing types" code.
Since then, I've seen the pandas-stubs approach and I think it is easier to start from there, and expand the methodology in there to cover many use cases of any stubs we decide to publish.
IIUC the pandas-stubs approach, if not using pytest, is not so different from #40202, i.e. we need to enumerate all the possibilities and have a separate set of tests.
also from https://github.com/pandas-dev/pandas/pull/40202#issuecomment-798695425
In addition, it's not clear to me how the current proposal helps us test the variety of arguments that can be passed to methods. We know that our existing testing code already tries different argument types, so if we typed our testing code (and ran mypy on it), then we're getting the benefit of one set of tests that tests functionality as well as typing signatures.
The pandas-stubs testing approach, if not integrated into the current testing, also does not satisfy this.
I think it will be easier for the community to fix our source code that has typed methods for the public API (and add an appropriate test in pandas/tests) as opposed to using the reveal_type method proposed in this PR, which would require us to enumerate all the possibilities in two places - once in our regular testing code and once in the "testing types" code.
I was more referring to using the reveal_type() aspect of the testing.
IIUC the pandas-stubs approach, if not using pytest, is not so different from #40202, i.e. we need to enumerate all the possibilities and have a separate set of tests.
I don't think we have to enumerate all the possibilities. We can start with a set of tests that represent "common" usage, and if the stubs are not covering some particular use case (by being too wide or too narrow), we can just add to those stubs.
In addition, it's not clear to me how the current proposal helps us test the variety of arguments that can be passed to methods. We know that our existing testing code already tries different argument types, so if we typed our testing code (and ran mypy on it), then we're getting the benefit of one set of tests that tests functionality as well as typing signatures.
The pandas-stubs testing approach, if not integrated into the current testing, also does not satisfy this.
That's correct. But it's a first step forward. I would like it if we could put typing into our testing code, and just run mypy on that, but it's a much larger effort than just starting with the pandas-stubs testing approach.
just starting with the pandas-stubs testing approach.
+1
At the end of the day it's not so different from the previous attempt. We would at least have some testing infrastructure and the tests in place (esp. for the overloads).
changing those tests at a later date from say
i1: pd.Interval = pd.Interval(1, 2, closed='both')
to
reveal_type(pd.Interval(1, 2, closed='both')) # E: Interval
would not be difficult and would improve the robustness of the checks.
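For illustration, here is a minimal sketch (not the actual #40202 or numpy harness) of how such # E: comments could be checked: run mypy over a snippet file and compare each revealed type against the comment on the same line.
from __future__ import annotations

import re
from mypy import api  # mypy's programmatic entry point

def check_reveal_comments(path: str) -> None:
    stdout, _, _ = api.run([path])
    # mypy emits lines like: snippets.py:12: note: Revealed type is "pandas.core.frame.DataFrame"
    revealed: dict[int, str] = {}
    for line in stdout.splitlines():
        m = re.match(r'.+?:(\d+): note: Revealed type is ["\'](.+)["\']', line)
        if m:
            revealed[int(m.group(1))] = m.group(2)
    with open(path) as f:
        for lineno, source in enumerate(f, start=1):
            m = re.search(r"reveal_type\(.*\)\s*#\s*E:\s*(.+)", source)
            if m:
                expected = m.group(1).strip()
                # A substring check mirrors the loose comparison discussed below;
                # an equality check would be stricter.
                assert expected in revealed.get(lineno, ""), (lineno, expected)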
changing those tests at a later date from say
i1: pd.Interval = pd.Interval(1, 2, closed='both')
to
reveal_type(pd.Interval(1, 2, closed='both')) # E: Interval
would not be difficult and would improve the robustness of the checks.
If we were to go that route, since the notation used with reveal_type() here is something we are borrowing from numpy, we'd have to document it, etc., and it may make it more difficult for the community to make contributions.
If we were to go that route, since the notation used with reveal_type() here is something we are borrowing from numpy, we'd have to document it, etc., and it may make it more difficult for the community to make contributions.
sure.
changing those tests at a later date ....
A potential future possibility; not suggesting that we would do it, or that the discussion blocks the addition of static tests.
If the (I'll call it) assignment method works sufficiently well, then we may not need the additional complexity.
In https://github.com/pandas-dev/pandas/issues/45252#issuecomment-1007988416, I said..
The method in #40202 has several issues.
We are using reveal_type so that we can check the return types explicitly. However, the problem is that the script uses
if reveal not in expected_reveal
so IIRC there are cases which pass that shouldn't, since we are not checking equality to ensure an explicit check is made.
The "assignment method" is checking type compatibility only, so I imagine there could potentially be more cases that are not checked explicitly.
For instance, if we have var: typeA | typeB = func() and func changes from returning a union of the types to a single type of either typeA or typeB, then the tests would not catch this?
If we find that this less explicit checking is not an issue, then there would be no reason to make the testing more complex.
For instance, if we have var: typeA | typeB = func() and func changes from returning a union of the types to a single type of either typeA or typeB, then the tests would not catch this?
You could catch it this way:
var: typeA | typeB = func()
varA: typeA = func() # type: ignore
varB: typeB = func() # type: ignore
If we use the warn_unused_ignores argument of mypy, then the above code would pass. If the return type of func is narrowed to typeA without changing the tests, then the second line would generate the warning, indicating that the test needs to change.
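As a self-contained sketch of that pattern (hypothetical types and function, checked with mypy's warn_unused_ignores enabled):
from __future__ import annotations

class TypeA: ...
class TypeB: ...

def func() -> TypeA | TypeB:  # hypothetical function whose stub currently returns a union
    return TypeA()

var: TypeA | TypeB = func()      # always passes, even if the return type is narrowed later
var_a: TypeA = func()  # type: ignore  # needed while func() returns the union
var_b: TypeB = func()  # type: ignore  # needed while func() returns the union
# If func() is narrowed to return only TypeA, the ignore on the var_a line becomes
# unused and mypy reports it, signalling that the test (and annotation) should change.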
If we find that this less explicit checking is not an issue, then there would be no reason to make the testing more complex.
I think this is just an issue of how complete we want the tests to be. If we try to make them fully complete, we'll never get done. So I say we start with a set of tests that include common usage of pandas, and expand those tests as issues get identified.
I think this is just an issue of how complete we want the tests to be. If we try to make them fully complete, we'll never get done. So I say we start with a set of tests that include common usage of pandas, and expand those tests as issues get identified.
Exactly. That's the very reason #40202 was opened as an alternative to typing the tests in the first place!
from https://github.com/pandas-dev/pandas/pull/40202#issuecomment-790452266
I don't think we need to mirror the source code structure for this as we are unlikely to test typing for internals. (although we could have an internals section if there are some methods that require complex overloading that would benefit from testing)
The structure (and scope) either needs more discussion if it is a concern, or be allowed to evolve over time.
If we try to make them fully complete, we'll never get done.
so could maybe skip the
Pandas-stubs can reorganize their test files to correspond better to current pandas testing
step mentioned in the meeting discussion and get these in sooner by copying across what they already have and iterating on that?
so could maybe skip the
Pandas-stubs can reorganize their test files to correspond better to current pandas testing
step mentioned in the meeting discussion and get these in sooner by copying across what they already have and iterating on that?
I'm pretty sure this was a suggestion from @jreback. Currently, there are 6 files there, and the question is whether it's worth the effort to split them up.
I think a file (x3) for each pandas public class (Series, DataFrame, Timestamp, Interval etc) and then a file for the top level functions (maybe split that one) and then somewhere for some internals (unit) testing, maybe a file or subdirectory.
I think a file (x3) for each pandas public class (Series, DataFrame, Timestamp, Interval etc) and then a file for the top level functions (maybe split that one) and then somewhere for some internals (unit) testing, maybe a file or subdirectory.
Can you explain the "(x3)" in the above? Are you saying there would be files each for Series, DataFrame, Timestamp, etc.? If so, what would be different between the 3 files?
The x3 comes from pass/fail/reveal. Forget that for now to simplify the discussion.
So let's say one file per class, with 1 test function per method (or maybe 2 for pass/fail?).
There are some classes with so many methods that I think splitting by method may be noisy?
Basically, what pandas-stubs already has lgtm. It's close enough to the suggestion above and we can evolve the structure/style once copied across.
@zkrolikowski-vl I prefer the structure for the tests that you already have so IMO https://github.com/VirtusLab/pandas-stubs/pull/127 is not necessary.
I think that keeping the structure more aligned to the public API is preferable (maybe easier for casual contributors) to mirroring either the source code or the pytest testing structure, but I defer to others on this since this was discussed in the meeting.
@simonjayhawkins I can see that both you and @Dr-Irv are agreeing to leave the structure as is. It's definitely easier to read right now. My only concern is that as more and more tests are added, the files will either become very long or have to be split with semi-subjective names.
I'll close https://github.com/VirtusLab/pandas-stubs/pull/127 for now.
Do we have an idea on how to integrate the tests with the CI? It looks like pandas is using both CircleCI and GitHub Actions.
Do we have an idea on how to integrate the tests with the CI? It looks like pandas is using both CircleCI and GitHub Actions.
My suggestion is as follows. Put your tests in pandas/tests/typing. Then I think the current CI will automatically pick up running mypy on your test files. If you think we should use different mypy options, then add a [[tool.mypy.overrides]] section to pyproject.toml, which allows you to override the default options for pandas.tests.typing.
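A sketch of what such an override section could look like in pyproject.toml (the module path and options are only illustrative assumptions):
[[tool.mypy.overrides]]
module = "pandas.tests.typing.*"
warn_unused_ignores = true
check_untyped_defs = true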
There is also a section at the bottom for pyright testing, which doesn't go through pandas/tests, but you could probably create an overrides section there as well that forces pandas.tests.typing to get checked.
Put your tests in pandas/tests/typing
Will pytest execute these? The reason I ask is that I don't think we want that.
The static tests in https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets have functions named test_* and filenames named test_*, so putting them there means pytest will pick these up?
@Dr-Irv Will do.
@simonjayhawkins Yes. AFAIK pytest will pick them up as well. That's the way we've been testing if the snippets execute correctly.
There is also a section at the bottom for pyright testing, which doesn't go through pandas/tests, but you could probably create an overrides section there as well that forces pandas.tests.typing to get checked.
We have very few pyright options enabled at the moment (most notably reportGeneralTypeIssues is disabled). If you want to run pyright (starting with only mypy is already a good start), the easiest is probably to add strict = ["pandas/tests/typing"] to pyproject.toml and to also append that path to include.
@simonjayhawkins Yes. AFAIK pytest will pick them up as well. That's the way we've been testing if the snippets execute correctly.
So the only issue here is that we will probably add typing tests that check whether a type is too wide, and put a # type: ignore comment on them, but the code will then execute with pytest and raise an exception, which means that if pytest is run on it, we have to expect that exception. For example:
>>> s = pd.Series([1,2,3])
>>> s.corr(pd.Series([4,5,6]), method="foobar")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda3\envs\b38np122try2\lib\site-packages\pandas\core\series.py", line 2512, in corr
raise ValueError(
ValueError: method must be either 'pearson', 'spearman', 'kendall', or a callable, 'foobar' was supplied
So when we type Series.corr() to have method be a Union[Literal["pearson", "spearman", "kendall"], Callable], then you want the typing test to test that foobar is invalid for method. To do that you'd do something like
s.corr(pd.Series([4,5,6]), method="foobar") # type: ignore
so that the type checker is happy, but then pytest will fail until you surround the code with pytest.raises types of things.
Maybe that means we should have a pandas/tests/typing/valid section, which contains valid code that is tested with pytest and the type checker, and a pandas/tests/typing/invalid section where we don't run pytest. Then in the files in the invalid directory, have something in there that just skips the tests.
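A minimal sketch of how such an "invalid" snippet could stay green under both mypy and pytest, using the corr example above:
import pandas as pd
import pytest

def test_corr_rejects_unknown_method() -> None:
    s = pd.Series([1, 2, 3])
    other = pd.Series([4, 5, 6])
    # The stub should reject "foobar", so the ignore is needed to satisfy the type
    # checker; at runtime pandas raises ValueError, which pytest.raises absorbs.
    with pytest.raises(ValueError):
        s.corr(other, method="foobar")  # type: ignore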
I've run into a challenge related to how pyright is configured in pandas: https://github.com/pandas-dev/pandas/blob/main/pyproject.toml#L150-L185, mainly because of these lines:
include = ["pandas", "typings"]
exclude = ["pandas/tests", "pandas/io/clipboard", "pandas/util/version"]
I've checked, and the exclusion list is stronger than the inclusion list, so it's not possible to make an exception to the exclusion.
Other than that, I would have to enable most of the report features below for the type tests to have any utility. This would also include all the type errors from the main codebase.
Due to this I'm thinking of having a separate configuration just for the type tests themselves. I'm not sure if that's even possible, but I'll continue on this topic.
I've checked, and the exclusion list is stronger than the inclusion list, so it's not possible to make an exception to the exclusion.
Instead of excluding pandas/tests, you could enumerate all folders/files.
Other than that I would have to enable most of the report features below for the type tests to have any utility. This would also include all the type errors from the main codebase.
I think it should be sufficient to use strict = ["path/to/the/tests"]; this should overwrite all the disabled features. If not, another option is to have # pyright: strict at the top of each test file.
Testing with mypy is already enough for a first version. You don't need to worry about pyright for now - unless you want to :)
edit:
If the folder is not excluded, strict = ["path/to/the/tests"] is sufficient to enable all of pyright's rules for this folder (no need for a second config file or file-based comments). Unfortunately, I think we will need to exclude all sub-folders/files of pandas/tests.
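A rough sketch of what that could look like in pyproject.toml, assuming the typing tests live under pandas/tests/typing and the remaining test sub-folders are excluded individually rather than excluding pandas/tests wholesale:
[tool.pyright]
include = ["pandas", "typings"]
exclude = [
    "pandas/io/clipboard",
    "pandas/util/version",
    # list the other pandas/tests sub-folders here individually so that
    # pandas/tests/typing is not swallowed by a blanket pandas/tests exclusion
]
strict = ["pandas/tests/typing"]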
@zkrolikowski-vl I created the PR #45561 to demonstrate how the test snippets from VirtusLab could be integrated.
@twoertwein Opened my own PR based on yours, with the __setitem__ fix. I hope that the commit message is satisfactory attribution to all authors. https://github.com/pandas-dev/pandas/pull/45594
@zkrolikowski-vl When you say "with the __setitem__ fix", do you mean you updated the test you had to not include it?
@Dr-Irv In a sense yes. I've purposefully avoided the situation in which the regression happens. The test case itself still works.
@Dr-Irv I think you mentioned that the MS stubs started using a not-yet supported argument of reveal_type to assert the type. Mypy 0.950 now supports something similar: typing_extensions.assert_type https://github.com/python/mypy/pull/12584 (and typing.assert_type in future Python versions).
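For illustration, assert_type is a runtime no-op that type checkers verify; a small generic sketch (not pandas-specific, since the exact stub return types determine what the pandas assertions would look like):
from typing_extensions import assert_type

def double(x: int) -> int:
    return 2 * x

assert_type(double(2), int)    # passes: the inferred type is exactly int
assert_type(double(2), float)  # mypy/pyright flag this line; at runtime it is still a no-op
For the stub tests, the idea would be to replace assignment-style checks such as i1: pd.Interval = pd.Interval(1, 2, closed='both') with assert_type(pd.Interval(1, 2, closed='both'), pd.Interval), modulo any type parameters the stubs declare.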
@twoertwein Thanks. It appears that assert_type() was added to pyright as well. This will clean up various tests in the MS stubs and the pandas-stubs repo that I'm working on here: https://github.com/pandas-dev/pandas-stubs
It appears that assert_type() was added to pyright as well. This will clean up various tests in the MS stubs and the pandas-stubs repo that I'm working on here:
cool. this will overcome the deficiencies of the "assignment" method that I commented on in https://github.com/pandas-dev/pandas/issues/45252#issuecomment-1009279373 and now align the testing with the numpy method (that was done in https://github.com/pandas-dev/pandas/pull/40202)?
@Dr-Irv can/should we move this issue (and maybe others) to https://github.com/pandas-dev/pandas-stubs, or is it best to keep that repo to stub-specific issues, since I think we could also use the test framework to test the pandas code too? See my comment in the mailing list thread.
I think we could also use the test framework to test the pandas code too.
I guess there are two ways to do this.
Or both. If there is testing against the pandas codebase in pandas-stubs, 1. is also more likely to succeed if automated.
cool. this will overcome the deficiencies of the "assignment" method that I commented on in #45252 (comment) and now align the testing with the numpy method (that was done in #40202)?
I believe that to be the case.
@Dr-Irv can/should we move this issue (and maybe others) to https://github.com/pandas-dev/pandas-stubs, or is it best to keep that repo to stub-specific issues, since I think we could also use the test framework to test the pandas code too? See my comment in the mailing list thread.
I could see it going either way. From the pandas development standpoint, we would want to make sure that changes to pandas in a PR don't cause the tests in pandas-stubs to fail. So keeping this issue here makes sense. On the other hand, from the pandas-stubs viewpoint, we'd have to figure out how to manage testing the stubs against the released version of pandas versus against the main branch of pandas, so having an issue there makes sense as well.
There are other issues here that are purely about public type stubs that could possibly be moved over (and maybe closed).
I really like the idea of testing pandas with its own annotations and pandas-stubs, but I'm honestly not sure whether pandas-stubs can be used to check pandas without having pandas-stubs-specific ignore comments in pandas (or some patch/sed magic).
I don't think this is a particularly good example, but it might demonstrate the potential issue: __getitem__ will probably never return all possible types in pandas-stubs (#46616). I could imagine that (after __getitem__ is typed with unions in pandas) mypy will complain about unnecessary comparisons when using pandas-stubs.
Instead of checking pandas with pandas-stubs, it might be an option to somehow (don't ask me how) enforce that pandas-stubs only contains annotations where different annotations are intended (for example __getitem__) and functions that are not annotated in pandas. Functions that have suitable annotations in pandas could simply(?) be imported from pandas (if pandas is a dependency of pandas-stubs).
I really like the idea of testing pandas with its own annotations and pandas-stubs, but I'm honestly not sure whether pandas-stubs can be used to check pandas without having pandas-stubs-specific ignore comments in pandas (or some patch/sed magic).
I think we are not talking about using the pandas-stubs to test the pandas source. I think the idea is that we make sure that changes to pandas-stubs work with both released versions of pandas and in-development versions of pandas, and, conversely, that changes to pandas in development don't break the stubs. The stubs are tested with mypy and pyright, but also pytest, so you could imagine a change in pandas breaking a pytest test.
Instead of checking pandas with pandas-stubs, it might be an option to somehow (don't ask me how) enforce that pandas-stubs only contains annotations where different annotations are intended (for example __getitem__) and functions that are not annotated in pandas. Functions that have suitable annotations in pandas could simply(?) be imported from pandas (if pandas is a dependency of pandas-stubs).
I don't think that you can have some stubs in a pandas-stubs package, and the rest in the source code. Not sure the type checkers will manage that.
I don't think that you can have some stubs in a pandas-stubs package, and the rest in the source code. Not sure the type checkers will manage that.
If the stubs explicitly import them (and re-export them as public functions in __all__ or with as), it should work. But having pandas as a dependency of pandas-stubs might not be great.
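For reference, the __all__ / as idiom mentioned here is the standard way stub files mark names as publicly re-exported; a tiny illustrative sketch (the module names are made up, and whether a stubs package could really pull annotations from the installed pandas this way is exactly the open question above):
# some_module.pyi -- illustrative only
from some_package import helper as helper   # "X as X" marks an explicit re-export in a stub
from some_package import other_helper
__all__ = ["helper", "other_helper"]         # names listed in __all__ are also treated as re-exported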
Just to be clear, my comment https://github.com/pandas-dev/pandas/issues/45252#issuecomment-1114768645 regards the test framework only.
In the discussion about the challenges of making pandas stubs public (mailing list, dev meeting and elsewhere), my support for having a separate stubs package was on the basis that we would have a test framework that could help keep the standalone stubs package and the pandas codebase more consistent.
Other than the test framework, we should probably keep the standalone pandas-stubs and inline annotations in pandas decoupled. This will allow us to use all the latest typing features in pandas-stubs as they become available. It will also allow us to continue to add annotations to the pandas code without further constraints (it's quite a challenge already relative to the ease of creating stubs)
I suspect that we may need to comment out failing tests (on the pandas side) with a TODO in some cases or just allow the test job to fail if we copy over the tests, but I hope that we wouldn't need to change code on the pandas side or stubs on the pandas-stubs side to share the tests.
Other than the test framework, we should probably keep the standalone pandas-stubs and inline annotations in pandas decoupled. This will allow us to use all the latest typing features in pandas-stubs as they become available. It will also allow us to continue to add annotations to the pandas code without further constraints (it's quite a challenge already relative to the ease of creating stubs)
I think we are all in agreement on keeping the public stubs separate.
I suspect that we may need to comment out failing tests (on the pandas side) with a TODO in some cases or just allow the test job to fail if we copy over the tests, but I hope that we wouldn't need to change code on the pandas side or stubs on the pandas-stubs side to share the tests.
I'm not sure that the tests in pandas-stubs would be good to use as typing tests against the pandas code base, for the following reason. I intend to add assert_type to those tests. For some methods/functions, the public stubs may return a narrower type than the internal types declare. As an example, consider the discussion we had about whether pd.to_datetime() should indicate that it can return NaTType in all the overloads. For the public stubs, we don't want to indicate that - it would make users do cast all the time. But internally, we may want to indicate the full wider type. So a test in pandas-stubs would do an assert_type on the narrower type, but if you use that test on the pandas code base, that test would fail.
I'm not sure that the tests in pandas-stubs would be good to use as typing tests against the pandas code base, for the following reason. I intend to add assert_type to those tests. For some methods/functions, the public stubs may return a narrower type than the internal types declare.
I think that is exactly the reason to use the same tests. IMO we should make the annotations as consistent as possible; that was part of the original argument for why we did not have separate stubs before now.
As an example, consider the discussion we had about whether pd.to_datetime() should indicate that it can return NaTType in all the overloads. For the public stubs, we don't want to indicate that - it would make users do cast all the time.
I still disagree with this approach, but am happy to await user feedback once the stubs are public on whether this is a significant issue.
So a test in pandas-stubs would do an assert_type on the narrower type, but if you use that test on the pandas code base, that test would fail.
I do expect that we would need to have some allowed failures at first, but fixing these would make the pandas-stubs and in-line type annotations more consistent.
@simonjayhawkins @twoertwein
I'm thinking we can close this, as we are now testing the type checking of the published pandas-stubs APIs as part of the CI in that project. Would like your opinion.
Could also leave this issue open if we hope to have consistency checks between pandas and pandas-stubs.
From the discussion at the pandas typing meeting on January 7, 2022 (notes at https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/)
Follow up to discussion here: https://github.com/pandas-dev/pandas/issues/28142#issuecomment-991946409
Goal is to have a set of tests that make sure that users of the pandas API writing correct code work with whatever type stubs we publish.
An example of a suggested approach was developed within the pandas-stubs project, based on various code snippets which are then tested with mypy: https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets
@zkrolikowski-vl and @joannasendorek from that project have offered to help (nudge, nudge). Ideally we would add this testing framework to the CI before starting to bring over the stubs.