multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0
190 stars 35 forks source link

Fix Schema Validation when using DtypeValidator #18

Closed farridav closed 5 years ago

farridav commented 6 years ago
melvintl commented 6 years ago

+1 Would fix the issue was encountering

multimeric commented 6 years ago

Fixes dtype validation

Can you write a test for this point? I assume that's what the e.row or 0 change was for, but it doesn't look like that test is for that error

Fixes existing failing test for validation output

Which test was this?

Also I note that you've put your test in test_schema.py instead of test_validation.py. Can you explain why it's a schema-wide test?

farridav commented 6 years ago

@TMiguelT the reason it is a schema unit test is because the error I found was raised in schema.py, and I've made changes to (and assertions for) the output of validation.py

Perhaps I've labeled the pull request wrong, really its fixing schema validation when using a validator who's ValidationError raises do not contain a column, and enriching the respective error messages, I can update PR title to reflect this.

multimeric commented 6 years ago

As I said above, can you write a test demonstrating the previous error? It would be helpful for me to understand this PR, and to have a test to stop it happening again. You seem to have fixed a bug, but not explained what it is, when it happens, or written tests for it.

the reason it is a schema unit test is because the error I found was raised in schema.py, and I've made changes to (and assertions for) the output of validation.py

Since you changed the validation message for a specific validator, I think this should go here, in test_validation.py: https://github.com/TMiguelT/PandasSchema/blob/master/test/test_validation.py#L466. Your change doesn't affect the entire schema object.

multimeric commented 6 years ago

To summarise, your PR did 4 things:

kotarf commented 5 years ago

How's this looking? I ran into the same issue with this particular validator.

Easy enough to add a fix locally though. Thx

multimeric commented 5 years ago

Hi @kotarf, if you're encountering the TypeError, could you give me some example code that raises this error? I want this so I can work it in as a test for the library.

kotarf commented 5 years ago

Hi @kotarf, if you're encountering the TypeError, could you give me some example code that raises this error? I want this so I can work it in as a test for the library.

I'll need a bit more time to give you a proper test case but my issue is just that the _geterrors function for IsDtypeValidation does not return a row # or column name. By not having a row # an uncaught exception is thrown when this validation sees failures and the results are sorted in schema.py:

return sorted(errors, key=lambda e: e.row)

My quick fix is to inherit from IsDtypeValidation and override _geterrors to return the row and column name parameters.

multimeric commented 5 years ago

Ohh, that's finally explained this PR to me. I didn't realise the two errors @farridav claimed to fix were actually the same. In that case, all I need to do is move the test already present to the right place and this PR should be fine. You won't need to write your own test case @kotarf, because this PR already has one.

multimeric commented 5 years ago

@farridav, is there a chance you can enable PR editing so I can let you keep attribution but I can tweak some things?

farridav commented 5 years ago

Sure, I've added you as a collaborator, check your invites

multimeric commented 5 years ago

Oh, I just meant PR editing: https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork

farridav commented 5 years ago

Sure, that box appears to already be ticked

multimeric commented 5 years ago

Okay I've edited this PR by: