neuroinformatics-unit / movement

Python tools for analysing body movements across space and time
http://movement.neuroinformatics.dev
BSD 3-Clause "New" or "Revised" License
77 stars 7 forks source link

Refactored modules related to input-output #194

Closed niksirbi closed 3 weeks ago

niksirbi commented 1 month ago

Description

What is this PR

Why is this PR needed?

The public functions in the load_poses.py module currently assume that users are always loading data from a file (or from a DeepLabCut-style pandas dataframe). However, there are some use-cases where the data are already in Python, in the form of numpy arrays, perhaps imported with custom loaders (this is not hypothetical, a potential user has already asked for it). There is a way to convert such data into a properly-formatted movement dataset, but this way is not easy to find and is not documented.

What does this PR do?

Adds a from_numpy() function that explicitly accepts position (+ optional confidence) data in the form of numpy arrays and returns a movement dataset. Under the hood it calls the ValidPosesDataset validator and the existing _from_valid_data() utility.

The addition of this function enabled me to slightly refactor the load_poses.py module such that from_numpy() is the single point-of-entry into a movement dataset - i.e. every other loading function first reads data into numpy arrays before calling the new function. This was already de facto the case, but it's much more explicit now. Moreover, this refactoring also enabled me to get read of a redundant validation call for LightningPose data.

Here's the schematic of the updated load_poses.py module. The previous version can be found here.

updated_load_poses

How has this PR been tested?

I added a simple unit test for the new function. The underlying ValidPosesDataset is already extensively tested, and so are all file loaders.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

The API index has been updated accordingly. The new function's docstring also includes example usage.

Checklist:

EDIT 2024-05-31

Following @sfmig review, the scope of this PR expanded, resulting in a more thorough refactoring of IO-related modules: load_poses.py, save_poses.py, and validators.py. This mostly involved renaming functions and editing docstrings, to make the whole thing more logical and internally consistent.

These are the names of the updated public functions: Screenshot 2024-05-31 at 15 49 18

Note that we renamed from_dlc_df to from_dlc_style_df (and likewise for save), because LightningPose also uses "DeepLabCut-style" dataframes.

We also decided to rename private functions such that it's clear what is being converted to what, e.g.: _ds_from_sleap_labels_file() instead of _load_from_sleap_labels.file(). There is one remaining inconsistency, namely the fact that public functions start with from_ while private functions start with _ds_from_ or _df_from_. That's because the way public functions are actually invoked is the following:

from movement.io import load_poses

ds = load_poses.from_file(`file/path')

and load_poses.ds_from_file would be redundant. Perhaps there is scope for renaming load_poses to load_dataset (and save_poses to save_dataset accordingly), such that the syntax would be movement.io.load_dataset.from_file(). That could make more sense now, because "poses" is a bit ambiguous, while we've fully defined what a "dataset" is. I'll open an issue about that.

Here's the updated diagram for movement's I/O functionalites.

movement_io_updated

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.68%. Comparing base (426003c) to head (f99d7d8).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #194 +/- ## ========================================== - Coverage 99.68% 99.68% -0.01% ========================================== Files 11 11 Lines 638 634 -4 ========================================== - Hits 636 632 -4 Misses 2 2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

niksirbi commented 1 month ago

Thanks for the review @sfmig, I like all you suggestions and will implement them here. The scope of this PR will increase to "refactoring load_poses module" and I will edit the PR title and description accordingly.

sonarcloud[bot] commented 1 month ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

niksirbi commented 1 month ago

@sfmig I've updated this PR's description and title. I think there is no need to go line-by-line through the diff again, just let me know whether you agree with the changes as I've described them in the updated PR description.

sfmig commented 3 weeks ago

Looks fantastic @niksirbi 🚀 And also great recap of the renaming bits, thanks! Will merge now