neuroinformatics-unit / movement

Python tools for analysing body movements across space and time
http://movement.neuroinformatics.dev
BSD 3-Clause "New" or "Revised" License
77 stars 7 forks source link

Refactor `filtering` module to take DataArrays as input #209

Open lochhh opened 3 weeks ago

lochhh commented 3 weeks ago

Description

What is this PR

Why is this PR needed? This PR closes #191

What does this PR do?

  1. This PR rewrites the filtering module to accept DataArrays (instead of Datasets) as inputs.
  2. A major change from the previous Dataset median_filter and savgol_filter is the window_length (renamed as window to better align with xarray and pandas). Before this PR, the unit of the window is determined based on ds.time_unit. If the time_unit is in seconds, the window supplied will be treated as seconds and converted to frames using window = int(window * ds.fps). Now that we no longer have access to the ds.time_unit and ds.fps attributes, the definition of window is changed to "The size of the filter window, representing the fixed number of observations used for each window". This appears to be quite awkward, especially in the examples, where the windows are defined in "seconds". So there is the extra step to convert seconds to frames when supplying window to the filters.
  3. The max_gap argument of interpolate_over_time function, on the other hand, takes care of 'seconds' vs 'frames', as we are directly supplying the time dimension to xarray.interpolate_na. Whilst neat, this behaviour is inconsistent with the aforementioned filters, which only considers number of observations/frames.
  4. The filter_and_interpolate and smooth examples have been updated. (Note point 2.)
  5. Existing tests for the overwritten filters are also adapted for the new ones. These new unit tests use the simpler valid_poses_dataset and valid_poses_dataset_with_nan in place of the sample_dataset fixture (now removed).
  6. The helper functions to count NaNs in conftest.py are simplified to calculate NaNs in the entire data array, rather than for a single dimension of a single keypoint from a single individual.
  7. The report_nan_values function has also been adapted to report NaN stats for a single DataArray - note that this doesn't yet account for "single" individual or "single" keypoint cases (e.g. ds.position.sel(individuals="ind1"), ds.position.sel(keypoints="snout"). Hence, in the filter_and_interpolate example, ds.position.sel(individuals="ind1") is dropped, which also means we need to squeeze() before plotting.
  8. Add accessor methods for filters (see summary)
  9. To enable 8. new <module>_wrappers have been added to handle forwarding accessor method calls to the respective modules.

References

191

How has this PR been tested?

Tests were written (and then removed) to first compare equality of the outputs from the initial Dataset filters and the newly added DataArray filters. Existing tests for the overwritten filters are then adapted for the new ones.

Is this a breaking change?

Yes. See points 1 and 2 above.

Does this PR require an update to the documentation?

Affected examples have been updated. Accessor methods are also added to api.rst

TO-DO

Checklist:

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.69%. Comparing base (a1c91e6) to head (157684b).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #209 +/- ## ======================================= Coverage 99.68% 99.69% ======================================= Files 11 11 Lines 639 651 +12 ======================================= + Hits 637 649 +12 Misses 2 2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

sonarcloud[bot] commented 3 weeks ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud