Closed upretip closed 5 years ago
I've been looking at this closely and discovered a handful of unhandled corner cases related to NaN values. Until I get this sorted, NaN values will have to be handled using a workaround, e.g., using the fillna() method to replace them with a proxy value.
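For background, here is a minimal standard-library sketch of the general Python behavior that makes NaN awkward in set-based checks. It is not datatest's code, and the variable names are only illustrative: NaN never compares equal to anything, so two separate NaN objects are not recognized as the same set member.

```python
import math

nan_a = float("nan")
nan_b = float("nan")  # a second, distinct NaN object

# NaN never compares equal, not even to itself.
self_equal = (nan_a == nan_a)

# Set membership tries identity before equality, so the very same
# object is found...
same_object_found = nan_a in {nan_a}

# ...but a different NaN object is not.
other_object_found = nan_b in {nan_a}

# math.isnan() is the reliable way to detect a NaN.
detected = math.isnan(nan_b)

print(self_equal, same_object_found, other_object_found, detected)
```

Values read from a file are generally separate NaN objects, which is why a set that contains one NaN may still fail to match them.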
As a stopgap, you could do the following:
from datatest import validate, accepted, Invalid

NAN = object()

# Include NAN in the validation set.
data = df['A'].fillna(NAN)
validate.superset(data, {'x', 'y', 'z', NAN})

# Accept NAN as a difference.
data = df['A'].fillna(NAN)
with accepted(Invalid(NAN)):
    validate(data, str)
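To show why a proxy object works where NaN itself does not, here is a plain-Python sketch with no datatest or pandas dependency. The helper `fill_nan` is hypothetical; it only stands in for fillna() and is not part of either library.

```python
import math

NAN = object()  # Sentinel: compares equal only to itself, unlike float NaN.


def fill_nan(values, proxy):
    """Hypothetical stand-in for fillna(): swap float NaNs for a proxy."""
    return [
        proxy if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


raw = ["x", float("nan"), "y", float("nan")]
allowed = {"x", "y", "z", NAN}

# The raw NaNs are not members of the allowed set.
extras_raw = [v for v in raw if v not in allowed]

# After substitution, every value is a member of the allowed set.
filled = fill_nan(raw, NAN)
extras_filled = [v for v in filled if v not in allowed]

print(len(extras_raw), len(extras_filled))
```

Because the sentinel is a single object, identity-based lookup finds it reliably, which is the property the stopgap relies on.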
Going forward, I will file a related issue/bug for this with the goal of allowing the use of NaN values directly:
# Include NaN in the validation set.
validate.superset(data, {'x', 'y', 'z', np.nan})

# Accept NaN as a difference.
with accepted(Invalid(np.nan)):
    validate(data, str)
I'll post a follow-up to this issue once I have patched this behavior.
Thanks. I will follow this.
This is done:
ce71b345: Update predicate handling to better support NaN values.
bee6aa84: Add NaN handling idioms to test_usecases.py.
32d3bb93: Add test_numbers_equal() to verify numeric comparison.
e8435b15: Update difference behavior to support tuples containing NaNs.
c962e04c: Change RequiredInterval to fail if arguments are NaN.
c78f390c: Fix RequiredInterval to properly handle NaN differences.
4995510d: Update NaN use cases to highlight recommended pattern.
fa2646ef: Add how-to documentation for working with NaN values.
@upretip, I've just pushed some new "how to" docs that give detail regarding NaN validation and behavior. You can view it in the latest docs here:
How to Deal With NaN Values https://datatest.readthedocs.io/en/latest/how-to/nan-values.html
Thanks for the help. Closing this issue now!
Shaun, I am trying your package to see if I can validate a CSV file by reading it with pandas. I am getting Extra(nan) from dt.validate.superset() or Invalid(nan) from dt.validate(). Is there a way I can include those nan values in my validation sets?
Error looks like
Note: I am reading this particular column as str.
Let me know if you find a solution or can help me debug.