neuroinformatics-unit / movement

Python tools for analysing body movements across space and time
http://movement.neuroinformatics.dev
BSD 3-Clause "New" or "Revised" License

Support for multiple views? #196

Open vigji opened 1 month ago

vigji commented 1 month ago

Hi people! Thanks for getting this going, a tool like this is truly needed!

**Is your feature request related to a problem? Please describe.**
I have a dataset with single animals detected from multiple cameras. As far as I can see, there is no option for a "views" dimension in the default data format. (I tried to look for discussions of this issue; sorry if I missed something!)

**Describe the solution you'd like**
A syntax within the movement framework to load multiple files referring to different cameras in the same experiment together (I don't know what the long-term plans are in terms of file structures to load, so I am not necessarily committed to my current organisation).

**Describe alternatives you've considered**
I am currently loading them and concatenating the xarrays after loading (which is totally reasonable, so no rush for this!)
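Roughly, my current workaround looks something like this (file names made up):

```python
import xarray as xr
from movement.io import load_poses

# One 2D prediction file per camera view (made-up paths).
view_files = {"side": "mouse_side_DLC.h5", "top": "mouse_top_DLC.h5"}

datasets = [
    load_poses.from_file(path, source_software="DeepLabCut", fps=30)
    for path in view_files.values()
]

# Stack along a new "views" dimension, labelled with the view names.
ds = xr.concat(datasets, dim="views")
ds = ds.assign_coords(views=list(view_files))
```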

**Additional context**
This kind of data structure would be needed by anyone in the process of extracting 3D poses. It is only needed as long as you do not yet have 3D coordinates, which in my case (but I assume in anyone's) are the end goal of having multiple views. So, if you want to support only the final, "clean" data where some triangulation of sorts has already happened, I would understand!

niksirbi commented 1 month ago

Hi @vigji, thanks for bringing this up! We should have this discussion. Supporting 3D poses has always been on our roadmap (see #124), but so far I've conceptualised it as importing the already-triangulated data in 3D (x, y, z) space.

What you are asking for is different: you want to be able to load multiple 2D camera views of the same animal and keypoints, right? I see why this is helpful. For example, that would enable us (or the users) to apply custom triangulation functions, or to employ clever outlier detection tools that rely on multi-view consistency (see #145). It would be useful to hear what sort of functionalities you have in mind for such data in movement. We've already had some related discussions with @neuroinformatics-unit/behaviour, so I would be keen to hear others' opinions on this.

On the technical level, it would be feasible. Probably the "cleanest" would be to add an optional views dimension to the existing xarray objects, as you suggested. This would be the most logically consistent with our data model, I think.

niksirbi commented 1 month ago

A question regarding the practicalities:

Are such multi-view predictions typically stored in a single file or as separate files? In the second case, the loader would have to accept a list of files as input.

An alternative would be to load one view the "usual" way, and then have an `add_view()` method that loads one additional file/view at a time. This method could also take care of creating the views dimension the first time it is called.
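A very rough sketch of that option (the names and the default label given to the first view are just placeholders, not a settled design):

```python
import xarray as xr
from movement.io import load_poses


def add_view(
    ds: xr.Dataset, file_path, view_name: str, source_software: str
) -> xr.Dataset:
    """Attach one more camera view to an existing dataset."""
    new_view = load_poses.from_file(file_path, source_software)
    new_view = new_view.expand_dims(views=[view_name])
    if "views" not in ds.dims:
        # First call: the existing dataset needs a views dimension too,
        # so a label has to be chosen for it (or asked from the user).
        ds = ds.expand_dims(views=["view_0"])
    return xr.concat([ds, new_view], dim="views")
```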

vigji commented 1 month ago

> It would be useful to hear what sort of functionalities you have in mind for such data in movement.

My idea would be to stick to the movement data representation for a pipeline doing coordinate triangulation and (later) filtering. The pipeline would roughly look like: DLC labelling (multiple camera views in, per-view 2D predictions out) -> triangulation (per-view 2D predictions in, 3D coords out).

> Probably the "cleanest" would be to add an optional views dimension to the existing xarray objects, as you suggested. This would be the most logically consistent with our data model, I think.

💯

> Are such multi-view predictions typically stored in a single file or as separate files? In the second case, the loader would have to accept a list of files as input.

Yes, they are stored as separate files! A loader accepting an iterable would be my favourite option, although a dict would also let me include the view names in the same variable. But again, I do not want to impose my own idiosyncratic constraints here, and I'd be happy to go along with any solution (also an `.add_view()` method, though that would have the probably annoying step of having to specify a view name for the first file, if by default it is created without a views dimension).

niksirbi commented 1 month ago

Hmm, I like the dict option for passing `view_name: filepath` pairs. It could look something like this:

```python
def from_multi_view(files_dict: dict[str, Path], source_software: str, fps: float):
```

Under the hood, this would call `load_poses.from_file()` for each dict entry, concatenate the loaded datasets along a new views axis, and return the concatenated dataset.
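Roughly (an untested sketch; the argument handling would need refining):

```python
from pathlib import Path

import pandas as pd
import xarray as xr

from movement.io import load_poses


def from_multi_view(
    files_dict: dict[str, Path], source_software: str, fps: float
) -> xr.Dataset:
    """Load one pose dataset per camera view and stack them along 'views'."""
    datasets = [
        load_poses.from_file(path, source_software, fps=fps)
        for path in files_dict.values()
    ]
    # Concatenating along a named pandas Index creates the new "views"
    # dimension, with the dict keys as its coordinate labels.
    views_index = pd.Index(list(files_dict.keys()), name="views")
    return xr.concat(datasets, dim=views_index)
```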

I like this option, because it means that using from_file() will keep working the same way, returning a dataset without a views dim. You only get the extra dim when you explicitly call from_multi_view(), so it won't be a breaking change.

Do you want to take a shot at implementing it, @vigji? You already seem to have relevant test data and some concatenation code that works. I'm also fine with you opening a draft PR with a quick-and-dirty implementation, and we can help you refine it (or we can do the refinement for you, if you wish).

With regards to the triangulation, I guess you'd also need to load camera calibration parameters for that to work. If you end up doing something like that with movement, and you think your solution can be sufficiently generalised, we may also be interested in incorporating that as a feature. I know that anipose and sleap-anipose already do this, so we may be duplicating effort, but it could be convenient for users to do everything within the movement data framework (as you seem to be willing).

niksirbi commented 1 month ago

> I know that anipose and sleap-anipose already do this, so we may be duplicating effort, but it could be convenient for users to do everything within the movement data framework (as you seem to be willing).

To clarify: if this can be achieved by movement depending on, say, anipose, that would be the preferred option.

vigji commented 1 month ago

> Do you want to take a shot at implementing it, @vigji? You already seem to have relevant test data and some concatenation code that works. I'm also fine with you opening a draft PR with a quick-and-dirty implementation, and we can help you refine it (or we can do the refinement for you, if you wish).

Yes, happy to try that! :)

> With regards to the triangulation, I guess you'd also need to load camera calibration parameters for that to work. If you end up doing something like that with movement, and you think your solution can be sufficiently generalised, we may also be interested in incorporating that as a feature. I know that anipose and sleap-anipose already do this, so we may be duplicating effort, but it could be convenient for users to do everything within the movement data framework (as you seem to be willing).

I am currently using some triangulation code that people have kindly shared with me. I'll bring the topic up with them and get back on this point, as I do not want to appropriate other people's pipeline! Will let you know.

niksirbi commented 1 month ago

> I am currently using some triangulation code that people have kindly shared with me. I'll bring the topic up with them and get back on this point, as I do not want to appropriate other people's pipeline! Will let you know.

Of course! There's also no absolute need or rush on this.

adamltyson commented 1 month ago

Is it worth reopening this? Others may want to chime in about this general concept.

vigji commented 1 month ago

Also, I know that this is not the right place for this question, so feel free to redirect me: would you consider a native way of dropping the individuals axis for people loading single-animal data? (Sorry if it exists and I've missed it.)

niksirbi commented 1 month ago

> Also, I know that this is not the right place for this question, so feel free to redirect me: would you consider a native way of dropping the individuals axis for people loading single-animal data? (Sorry if it exists and I've missed it.)

There are some xarray-native methods to do that: `xarray.Dataset.squeeze()` will drop all dimensions with length 1, or, if you want to be more explicit, you can use `xarray.Dataset.drop_dims()`. Maybe we should make that more prominent in the docs.
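For example (made-up file name):

```python
from movement.io import load_poses

# A single-animal prediction file (made-up name).
ds = load_poses.from_file("single_mouse_DLC.h5", source_software="DeepLabCut")

# Drop all length-1 dimensions, e.g. "individuals" when there's one animal.
ds_squeezed = ds.squeeze()

# Or name the dimension explicitly and also drop its scalar coordinate.
ds_squeezed = ds.squeeze("individuals", drop=True)
```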

niksirbi commented 1 month ago

Also, FYI, we have a Zulip channel where you are welcome to ask all such questions, or to discuss anything movement-related.