p2irc / deepplantphenomics

Deep learning for plant phenotyping.
GNU General Public License v2.0

Fix bug in loading custom datasets #19

Closed donovanlavoie closed 5 years ago

donovanlavoie commented 5 years ago

This fixes the bug described in Issue #14: when labels are loaded from a .csv file and the image directory contains more images than the file specifies, every image in the directory was loaded anyway, which crashed the model before training.

After downloading my own copy of the CVPPP dataset (CVPPP2017 specifically) and running leaf_count_regressor.py with modifications similar to those in the issue, I traced the error to 'split_raw_data', which calls TensorFlow's 'DynamicPartition'. Its arguments (a list of image names and a list of partition indices) are supposed to be the same length, but they weren't, because the first list contained every image name in the directory.
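To illustrate the length requirement, here is a minimal pure-Python stand-in for a dynamic partition over flat lists. It is not the TensorFlow op or the code in 'split_raw_data'; the function name `partition`, the file names, and the mask values are all hypothetical and only meant to show why an extra image name breaks the call.

```python
def partition(data, partitions, num_partitions):
    """Stand-in for a dynamic partition on 1-D inputs (hypothetical helper).

    Each item in `data` is routed to the bucket named by the entry at the
    same position in `partitions`, so the two lists must be the same length.
    """
    if len(data) != len(partitions):
        raise ValueError(
            "data has %d items but partitions has %d"
            % (len(data), len(partitions))
        )
    buckets = [[] for _ in range(num_partitions)]
    for item, index in zip(data, partitions):
        buckets[index].append(item)
    return buckets


# Hypothetical directory listing: four images, but only three have labels.
all_files = ["plant001.png", "plant002.png", "plant003.png", "plant004.png"]
labelled_files = all_files[:3]

# One partition index per labelled image: 0 = train, 1 = test.
mask = [0, 0, 1]

# Works: both lists have three entries.
train, test = partition(labelled_files, mask, 2)

# Fails: the full directory listing has one more entry than the mask.
try:
    partition(all_files, mask, 2)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The second call mirrors the situation described above: the mask is built from the labelled images, while the file list still contains everything in the directory.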

The fix is in load_images_with_ids_from_directory. It was already separating out the images that have loaded ids and labels, but DPPModel's raw_image_files field was being set (after preprocessing) to the full list of all image files from earlier rather than to that filtered list. With that small change, the example successfully moves on to training, which goes smoothly.
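The shape of the change can be sketched roughly as follows. This is not the actual DPPModel code: `ToyModel`, `select_labelled_files`, and the assumption that an id matches a file's base name without its extension are all hypothetical, chosen only to show the filtered list (rather than the full listing) being stored.

```python
import os


def select_labelled_files(all_files, labelled_ids):
    """Return the files whose base name (without extension) is a labelled id.

    Hypothetical helper; assumes every id maps to exactly one file.
    The result follows the order of `labelled_ids`.
    """
    by_id = {os.path.splitext(os.path.basename(p))[0]: p for p in all_files}
    return [by_id[i] for i in labelled_ids if i in by_id]


class ToyModel:
    """Hypothetical stand-in for a model that stores raw files and labels."""

    def __init__(self):
        self.raw_image_files = None
        self.raw_labels = None

    def load(self, all_files, ids, labels):
        selected = select_labelled_files(all_files, ids)
        # Before the fix, the equivalent of `all_files` was stored here.
        self.raw_image_files = selected
        self.raw_labels = labels


# Hypothetical data: four images on disk, three rows in the label file.
all_files = ["data/plant001.png", "data/plant002.png",
             "data/plant003.png", "data/plant004.png"]
ids = ["plant001", "plant002", "plant003"]
labels = [5, 7, 6]

model = ToyModel()
model.load(all_files, ids, labels)
```

With the filtered list stored, the file list and the label list have the same length, so any later step that pairs them one-to-one receives consistent inputs.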