neuroinformatics-unit / movement

Python tools for analysing body movements across space and time
http://movement.neuroinformatics.dev
BSD 3-Clause "New" or "Revised" License

Rethink splitting of individuals in `save_poses.to_dlc_file()` #314

Status: open. Opened by niksirbi 4 days ago.


Is your feature request related to a problem? Please describe.

The `save_poses.to_dlc_file()` function (as well as the underlying `save_poses.to_dlc_style_df()`) accepts a `split_individuals` argument, which does the following:

If split_individuals is True, each individual will be saved to a separate file, formatted as in a single-animal DeepLabCut project (without the “individuals” column level). The individual’s name will be appended to the file path, just before the file extension, e.g. “/path/to/filename_individual1.h5”. If False, all individuals will be saved to the same file, formatted as in a multi-animal DeepLabCut project (with the “individuals” column level). The file path will not be modified. If “auto”, the argument’s value is determined based on the number of individuals in the dataset: True if there is only one, False otherwise.
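The current behaviour described in that docstring can be sketched in plain Python. This is not movement's actual implementation: `resolve_split` and `split_file_paths` are hypothetical helper names that only mirror the quoted rules, namely that "auto" resolves to True when there is exactly one individual, and that each individual's name is appended to the file path just before the extension.

```python
from __future__ import annotations

from pathlib import Path


def resolve_split(split_individuals: bool | str, n_individuals: int) -> bool:
    """Hypothetical helper: resolve the "auto" option as the docstring describes."""
    if split_individuals == "auto":
        # True if there is only one individual, False otherwise
        return n_individuals == 1
    if not isinstance(split_individuals, bool):
        raise ValueError("split_individuals must be True, False or 'auto'")
    return split_individuals


def split_file_paths(file_path: str | Path, individuals: list[str]) -> dict[str, Path]:
    """Hypothetical helper: append each individual's name before the extension."""
    path = Path(file_path)
    return {
        ind: path.with_name(f"{path.stem}_{ind}{path.suffix}") for ind in individuals
    }
```

For example, `split_file_paths("/path/to/filename.h5", ["individual1"])` maps `"individual1"` to `/path/to/filename_individual1.h5`, matching the example path in the docstring.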

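This is a bit confusing, because this argument conflates two things:

1. whether the data is split into multiple files, one per individual;
2. which DeepLabCut dataframe format is used: the older single-animal format (without the "individuals" column level) or the newer multi-animal format (with it).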

As currently implemented, these two things are coupled: splitting always produces the older (single-animal) format, while not splitting always produces the newer (multi-animal) format.
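To make the difference between the two formats concrete, here is a minimal pandas sketch of the two column layouts. It assumes the usual DeepLabCut column levels (`scorer`, `individuals`, `bodyparts`, `coords`); the scorer, individual and body part names are made up for illustration. Taking one individual's slice and dropping the "individuals" level yields the older single-animal layout.

```python
import pandas as pd

scorer = "movement"  # hypothetical scorer name, for illustration only
individuals = ["individual1", "individual2"]
bodyparts = ["snout", "tail"]
coords = ["x", "y", "likelihood"]

# Newer multi-animal layout: four column levels, including "individuals"
multi_cols = pd.MultiIndex.from_product(
    [[scorer], individuals, bodyparts, coords],
    names=["scorer", "individuals", "bodyparts", "coords"],
)
multi_df = pd.DataFrame(0.0, index=range(3), columns=multi_cols)

# Older single-animal layout: one individual's data, with the
# "individuals" level dropped (xs drops the selected level by default)
single_df = multi_df.xs("individual1", level="individuals", axis=1)
```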

As @lauraschwarz discovered, there are use cases in which these two things should be decoupled. For example, a user might want to split the data into multiple files (one per individual) but still use the newer DLC format (e.g. for compatibility with other tools that assume the newer format).

Describe the solution you'd like

Have two independent arguments, one for each of these concerns:

from pathlib import Path
from typing import Literal

import xarray


def to_dlc_file(
    ds: xarray.Dataset,
    file_path: str | Path,
    split_individuals: bool = False,
    dlc_df_format: Literal["single-animal", "multi-animal"] = "multi-animal",
) -> None:
    ...
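One way the decoupled behaviour could work is sketched below. This is a hypothetical helper, not movement's API: `split_dlc_df` assumes the poses have already been converted to a DataFrame in the multi-animal layout (with an "individuals" column level) and returns a mapping from each output path to the DataFrame that would be written there, leaving the actual file writing out. The two arguments are handled independently, so splitting can be combined with either format.

```python
from __future__ import annotations

from pathlib import Path
from typing import Literal

import pandas as pd


def split_dlc_df(
    df: pd.DataFrame,
    file_path: str | Path,
    split_individuals: bool = False,
    dlc_df_format: Literal["single-animal", "multi-animal"] = "multi-animal",
) -> dict[Path, pd.DataFrame]:
    """Hypothetical helper: map each output file path to the DataFrame to write.

    Assumes `df` has an "individuals" column level (multi-animal layout).
    """
    if dlc_df_format not in ("single-animal", "multi-animal"):
        raise ValueError(f"Unknown dlc_df_format: {dlc_df_format!r}")
    path = Path(file_path)
    individuals = list(df.columns.unique(level="individuals"))

    if not split_individuals:
        if dlc_df_format == "multi-animal":
            # Everything goes into one file, keeping the "individuals" level
            return {path: df}
        # The single-animal layout has no way to distinguish individuals
        if len(individuals) != 1:
            raise ValueError(
                "The single-animal format needs exactly one individual per file; "
                "set split_individuals=True."
            )
        return {path: df.droplevel("individuals", axis=1)}

    # One file per individual, with the name appended before the extension;
    # the "individuals" level is kept only for the multi-animal format
    drop_level = dlc_df_format == "single-animal"
    out = {}
    for ind in individuals:
        sub = df.xs(ind, level="individuals", axis=1, drop_level=drop_level)
        out[path.with_name(f"{path.stem}_{ind}{path.suffix}")] = sub
    return out
```

The issue does not say what should happen when `split_individuals=False` is combined with the single-animal format and more than one individual; raising an error is just one possible choice made for this sketch.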

The default behaviour should be to not split and to use the newer "multi-animal" format (which is general enough to cover all cases).

Additional context

I'm not sure whether, since DLC 2.0, all DLC projects (single- or multi-animal) are saved in the newer format. Does DLC still use the single-animal dataframe format for single-animal projects?