Closed diegoquintanav closed 3 years ago
I'd be interested in something like this as well. In general it would be nice to have validation across columns. I'm not sure what the best way is to generalize the current schema, though, since it is centered on independent columns.
@TMiguelT have you got any suggestions?
Hmm. This seems like a useful validation to have. I'll have to think about how to handle DataFrame-level validations in terms of the interface
Another one we are using is something like "if col a has value x then col b needs to have a value in list c", so you would need some sort of constraint that works on the DataFrame itself. Something like `SeriesValidation`, but which accepts a `DataFrame` in `validate`.
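For illustration, the "if col a has value x then col b needs a value in list c" rule can be sketched in plain pandas, outside of any schema library. The function name `check_conditional_membership` and the column names are hypothetical and are not part of this project's API; this is only a rough sketch of what a DataFrame-level check might compute.

```python
import pandas as pd


def check_conditional_membership(df, if_col, if_value, then_col, allowed):
    """Return index labels of rows where if_col == if_value
    but then_col holds a value outside `allowed`.

    Hypothetical helper for illustration only.
    """
    triggered = df[if_col] == if_value      # rows where the rule applies
    satisfied = df[then_col].isin(allowed)  # rows whose then_col value is allowed
    return df[triggered & ~satisfied].index.tolist()


df = pd.DataFrame({"a": ["x", "x", "y", "x"], "b": [1, 5, 9, 2]})
bad_rows = check_conditional_membership(df, "a", "x", "b", allowed=[1, 2, 3])
print(bad_rows)
```

Rows where `a` is not `x` are never flagged, whatever `b` contains, which is the point of a conditional rule: it needs to see both columns at once.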
Good point. There's probably a need for a generalised DataFrame-level validation
(off-topic) @TMiguelT are you expecting contributions? Perhaps a gitter chat?
I'm happy to have contributions for this or any other feature requests. I've commented on your other PR
I am interested in this enhancement too. In my case I would be using it to check if a total count column is in fact equal to the total of several category count columns. Thanks!
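As a rough sketch of that use case in plain pandas (the helper `rows_with_bad_total` and the column names are made up for illustration and do not exist in this library), the check compares the total column against the row-wise sum of the category columns:

```python
import pandas as pd


def rows_with_bad_total(df, total_col, part_cols):
    """Return index labels of rows where total_col differs from
    the row-wise sum of part_cols. Hypothetical helper."""
    expected = df[part_cols].sum(axis=1)  # sum across the category columns
    return df[df[total_col] != expected].index.tolist()


counts = pd.DataFrame({
    "cats": [1, 2, 0],
    "dogs": [2, 2, 1],
    "birds": [0, 1, 1],
    "total": [3, 5, 3],  # the last row is wrong on purpose (0 + 1 + 1 = 2)
})
mismatched = rows_with_bad_total(counts, "total", ["cats", "dogs", "birds"])
print(mismatched)
```

This assumes integer counts with no missing values; with floats or NaNs the comparison would need a tolerance and an explicit decision about how missing data is treated.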
Closing in favour of the more general #57 that I just opened.
Hi there.
Consider a schema with fake column names that uses `IsDistinctValidation()`. This works on top of the `Series.duplicated` method of pandas. Given that there is also a `duplicated` method for DataFrames, is it possible to define composite columns so that `IsDistinctValidation()` also checks for combinations? Something like an additional parameter `**columns`, a list of columns defined inside the same schema, passed to `IsDistinctValidation()`.
What I do now is insert a new temporary column holding a tuple of the elements I want to check, and then add that column to the schema.
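A minimal sketch of the two approaches in plain pandas, with made-up column names `col_a` and `col_b`. It leaves the schema library out entirely and only shows that `DataFrame.duplicated` with `subset` finds the same rows as the temporary tuple column workaround:

```python
import pandas as pd

df = pd.DataFrame({
    "col_a": [1, 1, 2, 1],
    "col_b": ["u", "v", "u", "u"],
})

# Option 1: let pandas check the combination directly.
# keep=False marks every member of a duplicated group, not just the repeats.
dup_mask = df.duplicated(subset=["col_a", "col_b"], keep=False)
dup_rows = df[dup_mask].index.tolist()

# Option 2: the workaround, a temporary Series of tuples checked
# with Series.duplicated.
combined = pd.Series(list(zip(df["col_a"], df["col_b"])), index=df.index)
tuple_dup_rows = df[combined.duplicated(keep=False)].index.tolist()

print(dup_rows, tuple_dup_rows)
```

Both options flag the first and last rows, which share the combination `(1, "u")`, even though neither column is unique on its own.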
BTW nice job and thanks!