neuroinformatics-unit / movement

Python tools for analysing body movements across space and time
http://movement.neuroinformatics.dev
BSD 3-Clause "New" or "Revised" License

Need to re-design dataset validators? #210

Open niksirbi opened 3 weeks ago

niksirbi commented 3 weeks ago

The problem

Prior to PR #201, a "movement dataset" was synonymous with a "poses dataset", because movement only supported pose tracking data. For this reason, we were using the ValidPosesDataset validator everywhere:

Going forward, movement will support bbox-tracking data as well (and perhaps even other types in the future). We would still like the accessor to work for both poses and bboxes, i.e. both of those should still count as a movement dataset. But this means we have to fundamentally re-design our validation strategy (we can't keep using the same ValidPosesDataset validator for everything).

Potential solution

Probably we will end up defining several "entities":

The above arrangement has some kinks though. For example, what should we do in cases where only 1 keypoint is being tracked per individual (as in the Aeon dataset)? That's not a "pose" strictly speaking, but it can very well be accommodated within a poses dataset with a single keypoint. However, this raises the question of whether a singleton keypoints dimension should exist in such a case (as @vigji has raised, see this issue and this Zulip thread). As an alternative, we could agree that all point tracking data is "poses", and make the keypoints dimension optional (i.e. a poses dataset is essentially the same as the "base" movement dataset).

Related to the above, I think the individuals dimension, plus any other extra dimensions (like views), should always be optional, i.e. their presence/absence should not be validated by the dataset validators, and they should only be created and validated when and as needed (basically agreeing with what was expressed in the Zulip thread).

The question is, can we restructure dataset validation in a way that accommodates something like the above scheme, with the kinks ironed out? I'm fully open to better ideas on this.
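To make the required-versus-optional idea concrete, here is a minimal sketch, not movement's actual validator. The dimension names come from the discussion above; the constants `REQUIRED_DIMS` and `OPTIONAL_DIMS` and both helper functions are hypothetical, and treating only time and space as required is an assumption.

```python
# Dimension names are taken from the discussion above; the split into
# required vs. optional is an assumption made for this sketch.
REQUIRED_DIMS = ("time", "space")
OPTIONAL_DIMS = ("individuals", "keypoints", "views")


def missing_required_dims(dims):
    """Return the required dimensions absent from `dims`, in a fixed order."""
    return [dim for dim in REQUIRED_DIMS if dim not in dims]


def present_optional_dims(dims):
    """Return the optional dimensions found in `dims` (informational only)."""
    return [dim for dim in OPTIONAL_DIMS if dim in dims]


# A single tracked point per individual (Aeon-like) needs no keypoints dimension.
single_point_dims = ("time", "individuals", "space")
print(missing_required_dims(single_point_dims))
print(present_optional_dims(single_point_dims))
```

With this split, a dataset is never rejected for lacking individuals, keypoints or views; only a missing time or space dimension counts as an error.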

vigji commented 3 weeks ago

Just read this thread. As I am trying things out locally to move #197 forward (struggling with the test structure rn), I'd like to make sure I do not end up finding solutions for the validator that are then superseded by a redefinition of those classes. What do you think, @niksirbi?

For what it's worth, I think it makes sense to start allowing optional dimensions, like the keypoints or the individuals ones, as early as possible. But I do not know the class structure in enough detail to really give an insightful opinion!

niksirbi commented 3 weeks ago

Hey @vigji, basically we have two design constraints right now, and both of them hinge on redefining the validators:

  1. accommodate data from bbox tracking as well as pose tracking experiments
  2. allow flexibility in number of dimensions (make many of them optional), which is what you brought up

I think it would be ideal if the re-designed validators solved both problems in one sweep, especially because they are somewhat inter-related. I agree with you that it's better to tackle such issues early rather than when the project is more mature. This means that the validators + io functions are about to undergo an unstable period until we settle on a new structure that works.

Regarding your experiments in #197, I'd say feel free to continue experimenting on point 2, but don't worry about getting any of the code "camera ready" just yet, because likely we'd have to alter it to match the ongoing changes.

Regarding the structure of tests, is there anything we can do to help? I'd be open to hopping on a quick Zoom call some time next week if that'll help clarify things.

vigji commented 3 weeks ago

> Regarding your experiments in https://github.com/neuroinformatics-unit/movement/pull/197, I'd say feel free to continue experimenting on point 2, but don't worry about getting any of the code "camera ready" just yet, because likely we'd have to alter it to match the ongoing changes.

Ok!

> Regarding the structure of tests, is there anything we can do to help? I'd be open to hopping on a quick Zoom call some time next week if that'll help clarify things.

I'll dm you on Zulip :)

niksirbi commented 2 weeks ago

@b-peri had a good idea that might help with this:

When we load data, we know what type it is (poses or bboxes), so we could add a dataset attribute (e.g. ds.tracking_type) that records it. Subsequent validation can then take the value of this attribute into account. For example:

The more general validation steps (e.g. existence of space and time dimensions) can run independently of the value of this attribute, while more specialised validation will depend on the tracking type.
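The two-stage idea described above (general checks first, then checks chosen by the tracking type) could look roughly like the sketch below. It is an illustration under stated assumptions, not movement's implementation: `validate_dataset`, `GENERAL_DIMS` and `EXTRA_DIMS_BY_TYPE` are hypothetical names, the per-type requirements are placeholders, and the attribute is read from `attrs["tracking_type"]`. The function only relies on `dims` and `attrs`, both of which `xarray.Dataset` exposes, so a `SimpleNamespace` stands in for a real dataset to keep the example dependency-free.

```python
from types import SimpleNamespace

# General requirements that apply to every movement dataset.
GENERAL_DIMS = ("time", "space")

# Hypothetical per-type requirements, placeholders for illustration only.
EXTRA_DIMS_BY_TYPE = {
    "poses": ("keypoints",),
    "bboxes": (),
}


def validate_dataset(ds):
    """Run general checks, then checks selected by ds.attrs['tracking_type'].

    Returns the tracking type on success and raises ValueError otherwise.
    """
    # Stage 1: general validation, independent of the tracking type.
    missing = [dim for dim in GENERAL_DIMS if dim not in ds.dims]
    if missing:
        raise ValueError(f"Missing required dimensions: {missing}")

    # Stage 2: specialised validation, driven by the stored attribute.
    tracking_type = ds.attrs.get("tracking_type")
    if tracking_type not in EXTRA_DIMS_BY_TYPE:
        raise ValueError(f"Unknown tracking_type: {tracking_type!r}")
    missing = [
        dim for dim in EXTRA_DIMS_BY_TYPE[tracking_type] if dim not in ds.dims
    ]
    if missing:
        raise ValueError(f"Missing dimensions for {tracking_type}: {missing}")
    return tracking_type


# Stand-ins for datasets as they might look right after loading.
poses_ds = SimpleNamespace(
    dims=("time", "individuals", "keypoints", "space"),
    attrs={"tracking_type": "poses"},
)
bboxes_ds = SimpleNamespace(
    dims=("time", "individuals", "space"),
    attrs={"tracking_type": "bboxes"},
)

print(validate_dataset(poses_ds))
print(validate_dataset(bboxes_ds))
```

Keeping the per-type requirements in one mapping means a new tracking type only needs a new entry. Whether "poses" should really require a keypoints dimension is exactly the open question raised earlier in the thread.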