Expand pa.Check built-in methods

vovavili commented 2 years ago

Discussed in https://github.com/pandera-dev/pandera/discussions/799

<sup>Originally posted by **vovavili** March 25, 2022</sup>

I think we should start with these methods and work our way up:

1. [Expect a specific format of datetime string in a given column](https://greatexpectations.io/expectations/expect_column_values_to_match_strftime_format)
2. [Expect all values in a column to be unique](https://greatexpectations.io/expectations/expect_column_values_to_be_unique); [also this](https://greatexpectations.io/expectations/expect_select_column_values_to_be_unique_within_record)
3. [An ability to operate specifically with a column's min, max and average values](https://greatexpectations.io/expectations/expect_column_max_to_be_between)
4. [For a pair of columns, expect the value in column n1 to be greater than the value in column n2](https://greatexpectations.io/expectations/expect_column_pair_values_a_to_be_greater_than_b)
5. [A check pertaining to the order of rows, i.e. expect column values to be decreasing/increasing](https://greatexpectations.io/expectations/expect_column_values_to_be_increasing)

Thank you all in advance for your input, thoughts and effort.
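To make the five requests above concrete, here is a minimal sketch of what each check would assert, written as plain Python predicates using only the standard library. The function names (`matches_strftime`, `all_unique`, `max_between`, `pair_greater`, `strictly_increasing`) are hypothetical and are not part of pandera; they only illustrate the intended semantics. Something along these lines could presumably be wrapped in a custom `pa.Check` today, with the column-pair check living at the dataframe level.

```python
from datetime import datetime
from typing import Any, Sequence


def matches_strftime(value: str, fmt: str) -> bool:
    """(1) True if `value` can be parsed with the strptime format `fmt`."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def all_unique(values: Sequence[Any]) -> bool:
    """(2) True if no value occurs more than once (values must be hashable)."""
    return len(set(values)) == len(values)


def max_between(values: Sequence[Any], low: Any, high: Any) -> bool:
    """(3) True if the column maximum lies in the closed interval [low, high]."""
    return low <= max(values) <= high


def pair_greater(a: Sequence[Any], b: Sequence[Any]) -> bool:
    """(4) True if every value in `a` exceeds the value in `b` on the same row."""
    return all(x > y for x, y in zip(a, b))


def strictly_increasing(values: Sequence[Any]) -> bool:
    """(5) True if each value is greater than the one on the previous row."""
    return all(x < y for x, y in zip(values, values[1:]))
```

The min and average variants of (3) and the decreasing variant of (5) follow the same pattern with `min`, a mean, or a flipped comparison.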

wakelt commented 1 year ago

SchemaModel DataFrame check(s):

- `check_for_duplicate_records(key_columns=[])`

I'll likely come up with more over the next few weeks as I continue to use the tool.

If duplicate records are found, they should be reported in `err.failure_cases`.
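As a rough illustration of the behaviour being requested, the sketch below finds rows that share the same values in the key columns and returns them as failure cases. It is plain Python over a list of dicts, not pandera code: the `records` parameter, the `Record` alias, and the `(row_index, record)` return shape are assumptions made for this example, and only the function name and `key_columns` come from the proposal.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]  # hypothetical row representation: column name -> value


def check_for_duplicate_records(
    records: Sequence[Record],
    key_columns: Sequence[str],
) -> List[Tuple[int, Record]]:
    """Return (row_index, record) for every row whose key is shared by another row."""
    # Group row indices by the tuple of values in the key columns.
    rows_by_key = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record[column] for column in key_columns)
        rows_by_key[key].append(index)

    # Every row in a group of size > 1 is a failure case, including the first one.
    duplicate_rows = sorted(
        index
        for indices in rows_by_key.values()
        if len(indices) > 1
        for index in indices
    )
    return [(index, records[index]) for index in duplicate_rows]


rows = [
    {"id": 1, "day": "mon"},
    {"id": 1, "day": "mon"},
    {"id": 2, "day": "tue"},
]
failure_cases = check_for_duplicate_records(rows, key_columns=["id", "day"])
```

Here `failure_cases` holds rows 0 and 1, since both share the key `(1, "mon")`. Reporting every member of a duplicate group, rather than only the later occurrences, is a design choice of this sketch.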

cosmicBboy commented 1 year ago

@wakelt FYI `DataFrameSchema` has a `unique` option (also available through `SchemaModel.Config`) that checks for duplicate records: https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#validating-the-joint-uniqueness-of-columns

unionai-oss / pandera

Expand pa.Check built-in methods #806
