talmolab / dreem

DREEM Relates Every Entities' Motion (DREEM). Global Tracking Transformers for biological multi-object tracking.
https://dreem.sleap.ai
BSD 3-Clause "New" or "Revised" License

Implement automatic train test splits #71

Open aaprasad opened 4 months ago

aaprasad commented 4 months ago

Right now we require users to specify the training and validation videos explicitly. It would be nice to specify just a pool of videos and have dreem-train automatically divide the chunks into training and validation sets.

talmo commented 3 months ago

Using sleap-io (docs):

import sleap_io as sio

# Load source labels.
labels = sio.load_file("labels.v001.slp")

# Make splits and export with embedded images.
labels.make_training_splits(n_train=0.8, n_val=0.1, n_test=0.1, save_dir="split1", seed=42)

# Splits will be saved as self-contained SLP package files with images and labels.
labels_train = sio.load_file("split1/train.pkg.slp")
labels_val = sio.load_file("split1/val.pkg.slp")
labels_test = sio.load_file("split1/test.pkg.slp")

Caveats:


One implementation for a higher-order data loader would be one that creates a set of contiguous sub-clips/segments (maybe with a tolerance for short gaps?).

Basically, we want to loop over all labeled frames within Labels and find connected components of frames that are consecutive in time (optionally with a tolerance for gaps of a few frames), belong to the same video, and have instances.
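
A rough sketch of that grouping step, in case it helps the discussion. It works on plain (video_id, frame_idx, n_instances) tuples rather than on a real Labels object, so the function name find_contiguous_segments, the tuple layout, and the max_gap parameter are all assumptions made for illustration, not existing dreem or sleap-io API:

```python
from collections import defaultdict


def find_contiguous_segments(frames, max_gap=0):
    """Group labeled frames into contiguous per-video segments.

    frames: iterable of (video_id, frame_idx, n_instances) tuples. This is a
        hypothetical stand-in for iterating over the labeled frames in Labels.
    max_gap: number of missing frames tolerated inside a single segment.
    Returns a list of (video_id, [frame_idx, ...]) tuples.
    """
    # Collect the frame indices that actually contain instances, per video.
    by_video = defaultdict(set)
    for video_id, frame_idx, n_instances in frames:
        if n_instances > 0:
            by_video[video_id].add(frame_idx)

    segments = []
    for video_id, idxs in by_video.items():
        current = []
        for idx in sorted(idxs):
            # Start a new segment when the jump exceeds the allowed gap.
            if current and idx - current[-1] > max_gap + 1:
                segments.append((video_id, current))
                current = []
            current.append(idx)
        if current:
            segments.append((video_id, current))
    return segments


frames = [
    ("a", 0, 2), ("a", 1, 2), ("a", 2, 0), ("a", 3, 1), ("a", 10, 1),
    ("b", 5, 1), ("b", 6, 1),
]
strict = find_contiguous_segments(frames)
tolerant = find_contiguous_segments(frames, max_gap=1)
```

With max_gap=0, the empty frame 2 in video "a" breaks the run; with max_gap=1 it is bridged, so frames 0, 1 and 3 end up in one segment.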

Then, the data loader could break up long clips into sub-samples, randomize across these, and natively handle both multi-video (#70) and train/val/test splitting.
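
The sub-sampling and splitting step could look roughly like the following. Again this is only a sketch under assumptions: split_clips, clip_len, and fractions are hypothetical names, segments are the (video_id, [frame_idx, ...]) tuples described above, and the split is done by clip count rather than by frame count:

```python
import random


def split_clips(segments, clip_len=32, fractions=(0.8, 0.1, 0.1), seed=42):
    """Chunk segments into clips, shuffle them, and split train/val/test.

    segments: list of (video_id, [frame_idx, ...]) tuples (hypothetical layout).
    clip_len: maximum number of frames per clip.
    fractions: (train, val) shares of the clips; test gets the remainder.
    """
    # Break each contiguous segment into clips of at most clip_len frames.
    clips = []
    for video_id, idxs in segments:
        for start in range(0, len(idxs), clip_len):
            clips.append((video_id, idxs[start:start + clip_len]))

    # Seeded shuffle so the same pool of videos always yields the same split.
    rng = random.Random(seed)
    rng.shuffle(clips)

    n_train = int(round(fractions[0] * len(clips)))
    n_val = int(round(fractions[1] * len(clips)))
    train = clips[:n_train]
    val = clips[n_train:n_train + n_val]
    test = clips[n_train + n_val:]  # whatever remains
    return train, val, test


segments = [("a", list(range(100))), ("b", list(range(100)))]
train, val, test = split_clips(segments, clip_len=10)
```

One thing to keep in mind with this approach: clips from the same video can land in different splits, so neighboring frames may appear in both train and val. Splitting at the segment or video level instead would avoid that, at the cost of less even split sizes.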