Open colindean opened 4 years ago
@colindean @samratmitra-0812 I think handling as an exception in ColumnMaxCheck
is more appropriate. In both cases it is an edge that isn't covered, rather than a logical failure of the test. What do you think?
I can see it either way. Think of the failure mode when DV encounters an unsupported column type:
DV expects the user to inspect failures in the report at the end of a run. Should DV expect the user to know the type of the data in a column, especially when DV's config cannot necessarily reflect that type? That is, YAML only knows JSON numeric types, which are integer and double (I think they're called "fractions" technically in JSON) and exponential.
Also, is it more user-friendly to allow a validation run to complete or should it die immediately at runtime?
Immediate exit for normally long-running processes should be focused at the start of the run. If there's a way we can detect supported types in configCheck
, we should do that if DV will exit ungracefully when it encounters an unhandled case.
One of the things I liked about my expressions approach to columnSumCheck was pushing the type handling to the Spark/Hive level, so we didn't need to maintain a list of supported data types that generally ends up being an exhaustive list of subtypes of NumericType
without a way to ensure that we've handled all cases exhaustively.
Immediate exit for normally long-running processes should be focused at the start of the run. If there's a way we can detect supported types in configCheck, we should do that if DV will exit ungracefully when it encounters an unhandled case.
@colindean This is a good point. Maybe too general of a statement, but at level of config check we can throw exceptions, it will be a fast failure to your point. And at deeper levels (like quick check) we just log the error so that other checks still have the chance to complete.
@colindean @phpisciuneri Datatype verification in configCheck
is being done in NegativeCheck
and StringRegexCheck
. But even there, it is being treated as a normal check failure.
@samratmitra-0812 pointed out: