multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

Distinct across multiple columns #38

Closed Maarten-vd-Sande closed 3 years ago

Maarten-vd-Sande commented 3 years ago

I am not sure whether this is currently supported, or how to implement it as a custom validator.

I want distinct values across two columns, so that this is okay:

sample    value
1         2
2         2

But this is not:

sample    value
1         2
1         2

multimeric commented 3 years ago

Unfortunately not. The design of pandas_schema 0.X.X is such that every validation is on a per-column basis. This will be fixed in 1.X.X, and indeed I have a demonstration of this behaviour here: https://github.com/TMiguelT/PandasSchema/blob/9452513fbd2f58acc6ca8c3ff94062b07f3f7ffd/test/test_df_validations.py#L50-L61.

But who knows when that will be released, because it's been hard to find the time to finish it.
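
In the meantime, a check like this can be done outside of pandas_schema with plain pandas. This is only a minimal sketch, assuming the column names `sample` and `value` from the example above; `has_distinct_rows` is a hypothetical helper and not part of pandas_schema:

```python
import pandas as pd


def has_distinct_rows(df: pd.DataFrame, columns: list) -> bool:
    """Return True if no combination of values in `columns` repeats.

    Hypothetical helper for illustration; not part of pandas_schema.
    """
    # duplicated() flags each row that repeats an earlier row,
    # comparing only the columns listed in `subset`
    return not df.duplicated(subset=columns).any()


# The two tables from the question
ok = pd.DataFrame({"sample": [1, 2], "value": [2, 2]})
bad = pd.DataFrame({"sample": [1, 1], "value": [2, 2]})

print(has_distinct_rows(ok, ["sample", "value"]))   # True
print(has_distinct_rows(bad, ["sample", "value"]))  # False
```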

Maarten-vd-Sande commented 3 years ago

Great that this is already in the works! But then I already have a feature request: for it to work on a subset of columns :innocent: . It seems the current implementation does not support this, right?

If you want (and if I have time, so not too soon), I could start a PR for this.

multimeric commented 3 years ago

The current (and future) releases support this: https://tmiguelt.github.io/PandasSchema/#pandas_schema.schema.Schema.validate

Maarten-vd-Sande commented 3 years ago

Ah, I see. Just to be sure I understand: I would then use DistinctRowValidation and call validate on a list of columns?

Feel free to close the issue (either now or with the new release). Thanks for all your help :+1:

multimeric commented 3 years ago

Right, so you want unique rows, but across a subset of the columns. I don't think the future release can do that as it currently stands, but I'll look into it.
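
To make the distinction concrete, here is a rough sketch in plain pandas (not pandas_schema) of whole-row uniqueness versus uniqueness across a subset of columns. The `note` column and the data are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "sample": [1, 1, 2],
        "value": [2, 2, 2],
        "note": ["a", "b", "c"],  # hypothetical extra column
    }
)

# Whole-row check: every row differs in at least one column,
# so no row is flagged
whole_row_dupes = df[df.duplicated(keep=False)]

# Subset check: rows 0 and 1 share the same (sample, value) pair,
# and keep=False flags both of them rather than only the repeat
subset_dupes = df[df.duplicated(subset=["sample", "value"], keep=False)]

print(len(whole_row_dupes))         # 0
print(subset_dupes.index.tolist())  # [0, 1]
```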

multimeric commented 3 years ago

I'll keep the issue open since it's still not solved in a release.

multimeric commented 3 years ago

Closing in favour of the more general #57 that I just opened.