unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Expand pa.Check built-in methods #806

Open vovavili opened 2 years ago

vovavili commented 2 years ago

Discussed in https://github.com/pandera-dev/pandera/discussions/799

Originally posted by **vovavili** March 25, 2022

I think we should start with these methods and work our way up:

1) [Expect a specific format of datetime string in a given column](https://greatexpectations.io/expectations/expect_column_values_to_match_strftime_format)
2) [Expect all values in a column to be unique](https://greatexpectations.io/expectations/expect_column_values_to_be_unique); [also this](https://greatexpectations.io/expectations/expect_select_column_values_to_be_unique_within_record)
3) [The ability to operate specifically on a column's min, max and average values](https://greatexpectations.io/expectations/expect_column_max_to_be_between)
4) [For a pair of columns, expect the value in column n1 to be greater than the value in column n2](https://greatexpectations.io/expectations/expect_column_pair_values_a_to_be_greater_than_b)
5) [Checks pertaining to the order of rows, i.e. expect column values to be decreasing/increasing](https://greatexpectations.io/expectations/expect_column_values_to_be_increasing)

Thank you all in advance for your input, thoughts and effort.

wakelt commented 1 year ago

SchemaModel DataFrame check(s):

If duplicate columns are found, they should be reported in `err.failure_cases`.

cosmicBboy commented 1 year ago

@wakelt FYI, `DataFrameSchema` has a `unique` option (also available through `SchemaModel.Config`) that checks for duplicate records: https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#validating-the-joint-uniqueness-of-columns