sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

Add invariant validation function #61

Open tomwhite opened 4 years ago

tomwhite commented 4 years ago

We've discussed the need to check that values in dataset arrays conform to certain invariants before or after running methods on them (e.g. https://github.com/pystatgen/sgkit/pull/36#discussion_r452950827). These checks might be expensive to compute (i.e. they go beyond inspecting metadata), so they would be opt-in for users.

This issue is to add such a function. @eric-czech do you have a suggestion for a function we could start with here?

eric-czech commented 4 years ago

One possibility could be to:

A solution I like a little more would be to:

A third possibility that I think I like the most at the moment would be to:

Do any of those jump out at you?

tomwhite commented 4 years ago

Thanks for outlining the possibilities @eric-czech. All of these sound like lightweight checks of variable names, dtypes, shapes, etc., so they could be performed as invariant checks internally before (or after) each operation.
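To make "lightweight" concrete, here is a rough sketch of a metadata-only check. It is not an existing sgkit function: `check_metadata`, the `SPEC` table, and the variable names in it are all hypothetical, and a plain dict of NumPy arrays stands in for a dataset. It only looks at names, dtypes and dimension counts, never at the values.

```python
import numpy as np

# Hypothetical spec: variable name -> (allowed dtype kinds, expected number of dims).
# The names below are illustrative assumptions, not a confirmed sgkit schema.
SPEC = {
    "call_genotype": ("i", 3),
    "call_genotype_probability": ("f", 3),
}


def check_metadata(arrays, spec=SPEC):
    """Return a list of problems found by inspecting names, dtypes and ndim only."""
    problems = []
    for name, (kinds, ndim) in spec.items():
        if name not in arrays:
            problems.append(f"missing variable: {name}")
            continue
        arr = arrays[name]
        # dtype.kind is a one-letter code: "i" for signed ints, "f" for floats.
        if arr.dtype.kind not in kinds:
            problems.append(f"{name}: unexpected dtype {arr.dtype}")
        if arr.ndim != ndim:
            problems.append(f"{name}: expected {ndim} dims, got {arr.ndim}")
    return problems


# Example: one variable is present and well-formed, the other is missing.
arrays = {"call_genotype": np.zeros((2, 2, 2), dtype="int8")}
problems = check_metadata(arrays)
```

Because nothing here reads array contents, the cost does not depend on the size of the data, which is what would make it cheap enough to run around every operation.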

I was thinking more about checks that are not so lightweight, and which might be better exposed as functions that the user could run. E.g. checking that probabilities sum to one. Are there any others?
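As a starting point for discussion, the "probabilities sum to one" check could look something like the sketch below. The function name `validate_probabilities`, its parameters, and the tolerance are assumptions for illustration, not an existing sgkit API, and it works on a plain NumPy array rather than a dataset.

```python
import numpy as np


def validate_probabilities(probs, axis=-1, atol=1e-6):
    """Raise ValueError unless values lie in [0, 1] and sum to one along `axis`.

    Hypothetical helper for illustration; not part of the sgkit API.
    """
    probs = np.asarray(probs, dtype=float)
    # Range check: every entry must be a valid probability.
    if np.any((probs < 0) | (probs > 1)):
        raise ValueError("probabilities must lie in [0, 1]")
    # Sum check: this reads every element, which is why it would be opt-in.
    totals = probs.sum(axis=axis)
    bad = ~np.isclose(totals, 1.0, atol=atol)
    if np.any(bad):
        raise ValueError(
            f"{int(bad.sum())} entries do not sum to one along axis {axis}"
        )


# Example: 2 variants x 2 samples x 3 genotype probabilities.
good = np.array(
    [
        [[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]],
        [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    ]
)
validate_probabilities(good)  # passes silently
```

In this sketch a NaN entry makes the sum NaN and therefore fails the check; whether missing values should be allowed through is a design question a real implementation would have to settle.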

eric-czech commented 4 years ago

Ah sorry, here are a few of the more expensive checks I think would be common:

I'll add any more I run into, but I think most of those would be quite useful.