Open sam-goodwin opened 1 month ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 83.07%. Comparing base (4df61da) to head (8ab65a5). Report is 75 commits behind head on main.
:exclamation: Current head 8ab65a5 differs from pull request most recent head 9dc8ed5. Consider uploading reports for the commit 9dc8ed5 to get more accurate results.
Thanks @sam-goodwin, see https://pandera.readthedocs.io/en/latest/CONTRIBUTING.html#set-up-pre-commit for steps to make sure linters and unit tests are passing. You'll also need to sign your commits: https://pandera.readthedocs.io/en/latest/CONTRIBUTING.html#dco-signing-commits
Mypy errors:
tests/core/test_typing.py:498: error: "list" is not subscriptable, use "typing.List" instead [misc]
tests/core/test_typing.py:499: error: "dict" is not subscriptable, use "typing.Dict" instead [misc]
tests/core/test_typing.py:500: error: "tuple" is not subscriptable, use "typing.Tuple" instead [misc]
Note that pandera needs to support Python 3.8 as well, so we need to use the generic types from the typing module (typing.List, typing.Dict, typing.Tuple) instead of subscripting the built-ins.
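For reference, a minimal sketch of the Python 3.8-compatible spelling that mypy is asking for. The alias names and the `summarize` function are made up for illustration and are not taken from the test file:

```python
from typing import Dict, List, Tuple

# On Python 3.8 the built-in list/dict/tuple cannot be subscripted at
# runtime, so the generic aliases from `typing` are used instead.
# These alias names are hypothetical, for illustration only.
IntList = List[int]
StrToInt = Dict[str, int]
IntPair = Tuple[int, int]


def summarize(values: IntList, lookup: StrToInt, pair: IntPair) -> int:
    """Add up every int reachable from the three collections."""
    return sum(values) + sum(lookup.values()) + sum(pair)


total = summarize([1, 2], {"a": 3}, (4, 5))
```

The same annotations keep working on newer Python versions, so nothing needs to be special-cased per interpreter.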
Failing unit test:
FAILED tests/core/test_typing.py::test_complex_python_collection_types - pandera.errors.SchemaError: expected series 'list' to have type list[pandera.dtypes.Int32]:
failure cases:
index failure_case
0 0 [1, 2]
1 1 [3, 4, 5]
Looks like you need to use the built-in `int` type? `pandera.dtypes.Int32` translates to the numpy dtype for pandas columns.
Do you mean we can't specify ints with specific precision in a List or Dict in pandera?
This just follows the way pandas deals with data. Columns containing `list` or `dict` objects just hold Python objects, meaning they're not numpy arrays. This might be different for pyarrow data representations, but that'll be something to tackle when adding pyarrow support: https://github.com/unionai-oss/pandera/issues/1262.
In summary, `pandera.dtypes.Int32` maps onto a `numpy.int32`, and a `list[numpy.int32]` isn't meaningful in the context of pandas. `list[int]` is, though, and such a column will contain just lists of Python ints.
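A quick way to see this with plain pandas, no pandera involved. This is only a sketch of the behaviour described above, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# A column of Python lists is stored with the generic object dtype;
# pandas does not convert the list contents into numpy scalars.
lists = pd.Series([[1, 2], [3, 4, 5]])
first_element = lists.iloc[0][0]

# A flat integer column, by contrast, can carry a fixed-width numpy dtype.
flat = pd.Series([1, 2, 3], dtype="int32")

print(lists.dtype)          # dtype of the list column
print(type(first_element))  # type of a value inside one of the lists
print(flat.dtype)           # dtype of the flat integer column
```

So an element-level check on a list column ends up comparing against built-in `int`, which is why `list[int]` passes where a list of `Int32` does not.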
@sam-goodwin friendly ping: one of the unit tests is still failing: https://github.com/unionai-oss/pandera/actions/runs/8861081819/job/24332580434?pr=1556
Closes https://github.com/unionai-oss/pandera/issues/1555