Fix Schema Validation when using DtypeValidator

multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas

https://multimeric.github.io/PandasSchema/

GNU General Public License v3.0

190 stars 35 forks source link

Fix Schema Validation when using DtypeValidator #18

Closed farridav closed 5 years ago

farridav commented 6 years ago

Fixes dtype validation (TypeError: '<' not supported between instances of 'NoneType' and 'NoneType')
Fixes existing failing test for validation output
Adds column name to dtype validation errors if column is passed in
Some automatic whitepace cleanup by my editor

melvintl commented 6 years ago

+1 Would fix the issue was encountering

multimeric commented 6 years ago

Fixes dtype validation

Can you write a test for this point? I assume that's what the e.row or 0 change was for, but it doesn't look like that test is for that error

Fixes existing failing test for validation output

Which test was this?

Also I note that you've put your test in test_schema.py instead of test_validation.py. Can you explain why it's a schema-wide test?

farridav commented 6 years ago

@TMiguelT the reason it is a schema unit test is because the error I found was raised in schema.py, and I've made changes to (and assertions for) the output of validation.py

Perhaps I've labeled the pull request wrong, really its fixing schema validation when using a validator who's ValidationError raises do not contain a column, and enriching the respective error messages, I can update PR title to reflect this.

multimeric commented 6 years ago

As I said above, can you write a test demonstrating the previous error? It would be helpful for me to understand this PR, and to have a test to stop it happening again. You seem to have fixed a bug, but not explained what it is, when it happens, or written tests for it.

the reason it is a schema unit test is because the error I found was raised in schema.py, and I've made changes to (and assertions for) the output of validation.py

Since you changed the validation message for a specific validator, I think this should go here, in test_validation.py: https://github.com/TMiguelT/PandasSchema/blob/master/test/test_validation.py#L466. Your change doesn't affect the entire schema object.

multimeric commented 6 years ago

To summarise, your PR did 4 things:

Fixed a TypeError bug. This sounds good, but needs a test
Added the column name to the IsDtypeValidation message. This seems reasonable enough, and has a test, but the test is in the wrong place
Removed whitespace. This is fine
Fixed test_example.Example.test_example, by changing example.txt (I think...?). This is also good, I'm not sure how I missed this the first time.

kotarf commented 5 years ago

How's this looking? I ran into the same issue with this particular validator.

Easy enough to add a fix locally though. Thx

multimeric commented 5 years ago

Hi @kotarf, if you're encountering the TypeError, could you give me some example code that raises this error? I want this so I can work it in as a test for the library.

kotarf commented 5 years ago

Hi @kotarf, if you're encountering the TypeError, could you give me some example code that raises this error? I want this so I can work it in as a test for the library.

I'll need a bit more time to give you a proper test case but my issue is just that the _geterrors function for IsDtypeValidation does not return a row # or column name. By not having a row # an uncaught exception is thrown when this validation sees failures and the results are sorted in schema.py:

return sorted(errors, key=lambda e: e.row)

My quick fix is to inherit from IsDtypeValidation and override _geterrors to return the row and column name parameters.

multimeric commented 5 years ago

Ohh, that's finally explained this PR to me. I didn't realise the two errors @farridav claimed to fix were actually the same. In that case, all I need to do is move the test already present to the right place and this PR should be fine. You won't need to write your own test case @kotarf, because this PR already has one.

multimeric commented 5 years ago

@farridav, is there a chance you can enable PR editing so I can let you keep attribution but I can tweak some things?

farridav commented 5 years ago

Sure, I've added you as a collaborator, check your invites

multimeric commented 5 years ago

Oh, I just meant PR editing: https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork

farridav commented 5 years ago

Sure, that box appears to already be ticked

multimeric commented 5 years ago

Okay I've edited this PR by:

I made the default ValidationWarning row number -1, so that warnings without rows (e.g. the Dtype warning discussed here) won't have any issues. This is a slightly different solution to the original PR, but I think it's a bit nicer
Moved the new test to the validation tests file
Undid an attempted fix to the example file that was already fixed by #20. I know this PR came first, but I preferred the fix in #20 because it meant producing more user-friendly messages