Open vigji opened 1 month ago
Hi @vigji, thanks for bringing this up! We should have this discussion. It was always on our roadmap to also support 3D poses (see #124), but so far I've always conceptualised it as importing the already triangulated data in 3D (x,y,z) space.
What you are asking for is different: you want to be able to load multiple 2D camera views of the same animal and keypoints, right? I see why this is helpful. For example, it would enable us (or users) to apply custom triangulation functionalities, or to employ clever outlier detection tools that rely on multi-view consistency (see #145). It would be useful to hear what sort of functionalities you have in mind for such data in `movement`.
We've already had some related discussions with @neuroinformatics-unit/behaviour, so would be keen to hear others' opinions on this.
On the technical level, it would be feasible. Probably the "cleanest" option would be to add an optional `views` dimension to the existing `xarray` objects, as you suggested. This would be the most logically consistent with our data model, I think.
A question regarding the practicalities: are such multi-view predictions typically stored in a single file or as separate files? In the second case, the loader would have to accept a list of files as input.
An alternative would be to load one view the "usual" way, and then have an `add_view()` method that loads one additional file/view at a time. This method could also take care of creating the `views` dimension the first time it is called.
> It would be useful to hear what sort of functionalities you have in mind for such data in `movement`.
My idea would be to stick to the `movement` data representation for a pipeline for coordinate triangulation and (later) filtering. The pipeline would roughly look like:
DLC labelling (from multiple camera views, to multiple camera views) -> triangulation (from multiple camera views, to 3D coords).
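For concreteness, the triangulation step for each keypoint and frame boils down to a linear least-squares (DLT) solve. A minimal sketch, not tied to any particular library (the function name and signature are mine, just for illustration):

```python
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """Triangulate one 3D point from two views via linear least squares (DLT).

    P1, P2 are 3x4 camera projection matrices; x1, x2 are the 2D pixel
    coordinates of the same keypoint in each view.
    """
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]  # de-homogenise
```

In practice one would vectorise this over time/keypoints and weight the constraints by prediction confidence, but the core solve is the same.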
> Probably the "cleanest" option would be to add an optional `views` dimension to the existing `xarray` objects, as you suggested. This would be the most logically consistent with our data model, I think.

💯
> Are such multi-view predictions typically stored in a single file or as separate files? In the second case, the loader would have to accept a list of files as input.
Yes, they are separate files! A loader accepting an iterable would be my favourite option, although a `dict` would allow me to include the `views` names in the same variable as well. But again, I do not want to set my own idiosyncratic constraints here & I'd be happy to comply with any solution (also an `.add_view()` method; that would have the probably annoying step of specifying the `view` name for the first file created, if by default it is created without a `views` dimension).
Hmm, I like the `dict` option for passing `view_name: filepath` pairs. It could look something like this:
```python
def from_multi_view(files_dict: dict[str, Path], source_software: str, fps: float):
```
Under the hood, this would call `load_poses.from_file()` for each dict entry, concatenate the loaded datasets along a new `views` axis, and return the concatenated dataset.
I like this option because it means that using `from_file()` will keep working the same way, returning a dataset without a `views` dim. You only get the extra dim when you explicitly call `from_multi_view()`, so it won't be a breaking change.
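A minimal sketch of what this could look like with plain xarray (the `loader` argument stands in for `load_poses.from_file()`, and the names here are assumptions for illustration, not a final API):

```python
from pathlib import Path

import xarray as xr


def from_multi_view(files_dict: dict[str, Path], loader) -> xr.Dataset:
    """Load one dataset per view, then concatenate along a new 'views' dim."""
    datasets = [
        # expand_dims adds a length-1 'views' dim labelled with the view name
        loader(path).expand_dims(views=[view_name])
        for view_name, path in files_dict.items()
    ]
    return xr.concat(datasets, dim="views")
```

Concatenation assumes all views share the same keypoints and time axis, which should hold for synchronised cameras tracking the same skeleton.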
Do you want to take a shot at implementing it @vigji ? You already seem to have relevant test data and some concatenation code that works. I'm also fine with you opening a draft PR with a quick-and-dirty implementation, and we can help you refine it (or we can do the refinement for you if you wish).
With regards to the triangulation, I guess you'd also need to load camera calibration parameters for that to work. If you end up doing something like that with `movement`, and you think your solution can be sufficiently generalised, we may also be interested in incorporating it as a feature. I know that `anipose` and `sleap-anipose` already do this, so we may be duplicating effort, but it could be convenient for users to do everything within the `movement` data framework (as you seem to be willing to).
> I know that `anipose` and `sleap-anipose` already do this, so we may be duplicating effort, but it could be convenient for users to do everything within the `movement` data framework (as you seem to be willing to).
To clarify, if this can be achieved by `movement` depending on, say, `anipose`, that would be the preferred option.
> Do you want to take a shot at implementing it @vigji? You already seem to have relevant test data and some concatenation code that works. I'm also fine with you opening a draft PR with a quick-and-dirty implementation, and we can help you refine it (or we can do the refinement for you if you wish).
Yes, happy to try that! :)
> With regards to the triangulation, I guess you'd also need to load camera calibration parameters for that to work. If you end up doing something like that with `movement`, and you think your solution can be sufficiently generalised, we may also be interested in incorporating it as a feature. I know that `anipose` and `sleap-anipose` already do this, so we may be duplicating effort, but it could be convenient for users to do everything within the `movement` data framework (as you seem to be willing to).
I am currently using some triangulation code that people have kindly shared with me. I'll bring the topic up with them and get back to you on this point, as I do not want to appropriate other people's pipeline! Will let you know.
> I am currently using some triangulation code that people have kindly shared with me. I'll bring the topic up with them and get back to you on this point, as I do not want to appropriate other people's pipeline! Will let you know.
Of course! There's also no absolute need or rush on this.
Is it worth reopening this? Others may want to chime in about this general concept.
Also, I know that this is not the right place for the question, feel free to redirect me: would you consider a native way of dropping the `individuals` axis for people loading single-animal data? (Sorry if it exists and I've missed it.)
> Also, I know that this is not the right place for the question, feel free to redirect me: would you consider a native way of dropping the `individuals` axis for people loading single-animal data? (Sorry if it exists and I've missed it.)
There are some xarray-native methods to do that: `xarray.Dataset.squeeze()` will drop all dimensions with length 1, or, if you want to be more explicit, use `xarray.Dataset.drop_dims()`. Maybe we should make that more prominent in the docs.
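For example, on a toy dataset mimicking a single-animal layout (note that `drop_dims()` removes every variable that uses the dimension, so `squeeze()` is usually what you want when you mean to keep the data):

```python
import numpy as np
import xarray as xr

# Toy dataset with a length-1 "individuals" dimension; the names are just
# illustrative, not movement's exact schema.
ds = xr.Dataset(
    {"position": (("time", "individuals", "space"), np.zeros((5, 1, 2)))},
    coords={"individuals": ["id_0"]},
)

squeezed = ds.squeeze("individuals")   # drops the length-1 dim, keeps the data
dropped = ds.drop_dims("individuals")  # drops every variable using that dim
```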
Also, FYI, we have a Zulip channel where you are welcome to ask all such questions, or to discuss anything `movement`-related.
Hi people! Thanks for getting this going, a tool like this is truly needed!

**Is your feature request related to a problem? Please describe.**
I have a dataset with single animals detected from multiple cameras. As far as I can see, there is no option for a "views" dimension in the default data format. (I tried to look for discussions of this issue, sorry if I missed something.)

**Describe the solution you'd like**
A syntax within the `movement` framework to load together multiple files referring to different cameras in the same experiment. (I don't know what the long-term plans are in terms of file structures to load, so I am not necessarily committed to my current organisation.)

**Describe alternatives you've considered**
I am currently loading them and concatenating the `xarray`s after loading (which is totally reasonable, so no rush for this!)

**Additional context**
This kind of data structure would be needed by anyone in the process of extracting 3D poses. It is needed only as long as you do not yet have 3D coordinates; in my case (but I assume in anyone's), having multiple views is not the end goal. So, if you want to support only the final, "clean" data where some triangulation of sorts has already happened, I would understand!