unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

fix: properly coerce dtypes for columns with regex=True #1602

Closed: tesslinden closed this 2 weeks ago

tesslinden commented 3 weeks ago

Fixes #1182. I ran into this bug myself, then found it was previously reported.

See the included tests for a minimal example of the bug: test_config_coerce() passes on main; test_config_coerce_with_regex() fails on main, but passes with this fix.

This is the minimal change needed to fix the bug. It leaves some code duplicated between the regex and non-regex blocks of the _coerce_dtype_helper() function. I considered factoring the shared logic out into helper functions like _should_coerce() or _override_and_try_coercion(), but there are several ways to split it up, so I'll leave it to reviewers to decide which they'd prefer.

Also, I wasn't sure which file the tests should go in -- let me know if they should be moved.

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.11%. Comparing base (4df61da) to head (2e247b3). Report is 81 commits behind head on main.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main    #1602       +/-   ##
===========================================
- Coverage   94.29%   83.11%   -11.18%
===========================================
  Files          91      116       +25
  Lines        7024     8536     +1512
===========================================
+ Hits         6623     7095      +472
- Misses        401     1441     +1040
```


tesslinden commented 2 weeks ago

thanks @tesslinden 🚀 and congrats on your first PR to pandera 🎉

Awesome! Thank you!