p2irc / deepplantphenomics

Deep learning for plant phenotyping.
GNU General Public License v2.0

Mask Checkpoint Settings and Documentation #44

Closed donovanlavoie closed 5 years ago

donovanlavoie commented 5 years ago

Spurred by the need to either document mask_ckpt.txt or hide it from the user, this adds settings to DPP for controlling the generation and use of the training/testing/validation masks it stores.

A new flag (force_split_partition) and a corresponding setter (force_split_shuffle) have been added to control whether a new mask is always generated instead of loading a previous one. The flag defaults to the previous behaviour: loading the split from an existing mask.
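To illustrate how the flag and its setter relate, here is a minimal sketch. The class `SplitSettingsSketch` is a hypothetical stand-in rather than DPP's actual model class, and both the attribute layout and the boolean type check are assumptions; only the names force_split_partition and force_split_shuffle come from this PR.

```python
class SplitSettingsSketch:
    """Hypothetical stand-in for a model object; not DPP's real class."""

    def __init__(self):
        # Default keeps the earlier behaviour: reuse a stored split mask.
        self.force_split_partition = False

    def force_split_shuffle(self, force_split):
        """Setter: when True, always build a new mask instead of loading one."""
        # The type check is an assumption made for this sketch.
        if not isinstance(force_split, bool):
            raise TypeError("force_split must be a bool")
        self.force_split_partition = force_split


settings = SplitSettingsSketch()
settings.force_split_shuffle(True)  # request a fresh train/test/validation split
```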

split_raw_data was changed to accommodate it. It gained a new parameter that takes the force_split_partition flag and uses it to decide whether to try to read a pre-existing mask. The code for mask reading and generation, meanwhile, was factored out into get_split_mask so that separate tasks (making a mask vs. using it to split data) live in separate functions.
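The division of labour described above could look roughly like the sketch below. It is not the code from this PR: the signatures, the `mask_path` and `seed` parameters, the 0/1/2 encoding for train/test/validation, and the plain-text mask format are all assumptions chosen for illustration. Only the function names, the flag name, and the mask_ckpt.txt file name come from the description.

```python
import os

import numpy as np

MASK_FILE = "mask_ckpt.txt"  # file name from the PR; its location is assumed


def get_split_mask(num_samples, test_split, validation_split,
                   force_split_partition=False, mask_path=MASK_FILE, seed=None):
    """Load a stored mask, or make (and store) a new one.

    Assumed encoding: 0 = train, 1 = test, 2 = validation.
    """
    # Reuse an existing mask unless a new partition is forced.
    if not force_split_partition and os.path.isfile(mask_path):
        mask = np.loadtxt(mask_path, dtype=int, ndmin=1)
        if mask.shape[0] == num_samples:
            return mask

    num_test = int(round(num_samples * test_split))
    num_val = int(round(num_samples * validation_split))

    mask = np.zeros(num_samples, dtype=int)
    mask[:num_test] = 1
    mask[num_test:num_test + num_val] = 2
    np.random.default_rng(seed).shuffle(mask)

    # Store the mask so later runs can repeat the same split.
    np.savetxt(mask_path, mask, fmt="%d")
    return mask


def split_raw_data(data, test_split, validation_split,
                   force_split_partition=False, mask_path=MASK_FILE):
    """Split data into (train, test, validation) using the mask."""
    data = np.asarray(data)
    mask = get_split_mask(len(data), test_split, validation_split,
                          force_split_partition, mask_path)
    return data[mask == 0], data[mask == 1], data[mask == 2]
```

Keeping mask creation separate from its use means the mask logic can be tested on its own, without constructing a dataset.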

Alongside these changes, tests were added not just for force_split_shuffle and get_split_mask, but also for the split-defining functions set_test_split and set_validation_split. Some documentation was also added to the leaf counting tutorial to briefly explain how DPP stores and reuses dataset splits for repeatable training.

The test suite, including the additions, passes, and training works for all of the current problem types with the changes to dataset splitting.

jubbens commented 5 years ago

Thanks! I like your approach, I just made a few tweaks to the documentation.