Closed anhollis closed 4 years ago
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I am also really missing such a feature. Sadly, the solution provided above is outdated. Is there a workaround for the current version (0.9.7)? Are there plans to include such group / filter functionality for expectations in the future?
Okay, the above solution works if you import PandasDataset from great_expectations.dataset.pandas_dataset instead of using ge.dataset.PandasDataTable.
Still, I don't know how to include this in a suite. I thought of creating a custom expectation, but it would either have to be implemented for each expectation_type that should be group-able, or it would have to wrap other expectations. IMHO a much better way would be an additional argument on all expectations of a backend that lets the expectation run on only a subset of the batch, based on a filter defined in that argument.
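To make the filter-argument idea concrete, here is a minimal sketch in plain Python. It does not use Great Expectations: the expectation function is a hypothetical stand-in that operates on a list of row dictionaries, and the `row_filter` parameter is an assumed name for illustration, not an existing argument of the library.

```python
def expect_column_values_to_not_be_null(rows, column, row_filter=None):
    """Hypothetical stand-in for an expectation with an optional filter.

    rows: list of dicts, one per row (an assumption; not a GE dataset).
    row_filter: optional callable that returns True for rows to validate.
    """
    # Restrict the check to the subset selected by the filter, if any.
    subset = [r for r in rows if row_filter is None or row_filter(r)]
    unexpected = [r for r in subset if r.get(column) is None]
    return {
        "success": not unexpected,
        "element_count": len(subset),
        "unexpected_count": len(unexpected),
    }


rows = [
    {"country": "DE", "vat_id": "DE123"},
    {"country": "DE", "vat_id": None},
    {"country": "US", "vat_id": None},
]

# Without a filter the whole batch is validated.
whole_batch = expect_column_values_to_not_be_null(rows, "vat_id")

# With a filter only the US rows are validated.
us_only = expect_column_values_to_not_be_null(
    rows, "vat_id", row_filter=lambda r: r["country"] == "US"
)
```

The point of the design is that the filter lives on the expectation call itself, so it could be stored with the expectation in a suite instead of requiring a pre-filtered batch.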
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Below is a potential solution to the recurring problem of multi-column expectations and of running expectations on grouped data. In this solution, users specify an object that can be used to group the data. This object could be a column, a set of keys, etc. The user also specifies the data set on which to build the expectations and the expectations they want to run, along with a dictionary of arguments for those expectations. The return value is a dictionary with one entry per group. Each entry is itself a dictionary containing the result of every expectation that was run on that group.
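The interface described above can be illustrated with a small sketch in plain Python. This is not the implementation proposed in this issue and it does not call Great Expectations: `run_grouped_expectations` and the expectation function are hypothetical stand-ins, and the data set is assumed to be a list of row dictionaries.

```python
from collections import defaultdict


def expect_column_values_to_be_between(rows, column, min_value, max_value):
    """Hypothetical stand-in for an expectation; works on a list of dicts."""
    unexpected = [
        r[column] for r in rows if not (min_value <= r[column] <= max_value)
    ]
    return {"success": not unexpected, "unexpected_list": unexpected}


def run_grouped_expectations(rows, group_key, expectations):
    """Run every expectation on every group.

    rows: list of dicts, one per row.
    group_key: name of the column to group by.
    expectations: list of (function, kwargs) pairs.
    Returns {group value: {expectation name: result}}.
    """
    # Split the rows into groups by the value of group_key.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Evaluate each expectation separately within each group.
    return {
        group: {
            func.__name__: func(group_rows, **kwargs)
            for func, kwargs in expectations
        }
        for group, group_rows in groups.items()
    }


rows = [
    {"store": "a", "price": 5},
    {"store": "a", "price": 50},
    {"store": "b", "price": 7},
]
results = run_grouped_expectations(
    rows,
    "store",
    [(expect_column_values_to_be_between,
      {"column": "price", "min_value": 0, "max_value": 10})],
)
```

Here `results["a"]` reports a failure with `50` as the unexpected value, while `results["b"]` succeeds, so a problem confined to one group is not hidden by the rest of the batch.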
I think this is different from the pre-processing approach (#294) in that it would allow the user to specify multiple group expectations to examine simultaneously. Some other related issues are #351, #373, and #236.
It is possible that this kind of function is trying to solve too specific a problem. We might prefer a function that addresses something more general than grouping data and running expectations. We might want to avoid specific utility functions altogether, but I still thought it might be worth considering.
Below is a full reproducible example of how this solution would work for a pandas data set. The concept can be extended to SQL, XML, and JSON data; only the way the data is grouped would change, and the rest of the process would be similar in each case. We would likely need to write a separate group_by function for each type of data set we want to support.