Rak/random seeds everywhere

rak5216 commented 3 years ago

issues #76 and #80

@CarolinaFurtado, just waiting on @Josh-Joseph's approval, then I'll merge with master. I tested the main concepts locally (rng instances) and am trusting you'll verify each step on the VM; the change is simple enough that I didn't test on the VM myself.
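For reference, a minimal sketch of the "rng instance" idea mentioned above, using only the standard library. The function name `make_shuffled_order` and its parameters are hypothetical and not taken from this repo; the point is only that a dedicated generator instance gives the same order for the same seed regardless of the global random state.

```python
import random


def make_shuffled_order(n_items, shuffle_seed):
    """Return a shuffled index order drawn from a private RNG instance.

    Hypothetical helper: a dedicated random.Random instance is used so the
    result depends only on shuffle_seed, not on the global random state.
    """
    rng = random.Random(shuffle_seed)  # independent of random.seed(...)
    order = list(range(n_items))
    rng.shuffle(order)
    return order


if __name__ == '__main__':
    random.seed(0)
    first = make_shuffled_order(10, shuffle_seed=1234)
    random.seed(99)  # changing the global seed does not affect the instance
    second = make_shuffled_order(10, shuffle_seed=1234)
    print(first == second)
```

Using an instance also leaves the global generator untouched, so other code that relies on the global seed is not perturbed by the shuffle.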

rak5216 commented 3 years ago

only random cropping:
python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/config_sandbox/dataset-small-3class-randtest.yaml

 Prepare Dataset Metadata:
{'gcp_bucket': 'gs://necstlab-sandbox', 'python_random_global_seed': '1', 'number_of_images': {'validation': 563, 'test': 532, 'train': 572}, 'numpy_random_global_seed': '12', 'created_datetime': '20201021T164621Z', 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'original_config_filename': 'configs/config_sandbox/dataset-small-3class-randtest.yaml', 'elapsed_minutes': 6.2}

random dataset downsampling and cropping:
python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/config_sandbox/dataset-small-3class-randtest2.yaml

 Prepare Dataset Metadata:
{'number_of_images': {'train': 100, 'test': 100, 'validation': 100}, 'python_random_global_seed': '1', 'numpy_random_global_seed': '12', 'gcp_bucket': 'gs://necstlab-sandbox', 'original_config_filename': 'configs/config_sandbox/dataset-small-3class-randtest2.yaml', 'elapsed_minutes': 5.3, 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'created_datetime': '20201021T170049Z'}

random dataset downsampling and cropping (repeat of dataset-small-3class-randtest2.yaml):
python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/config_sandbox/dataset-small-3class-randtest3.yaml

 Prepare Dataset Metadata:
{'gcp_bucket': 'gs://necstlab-sandbox', 'original_config_filename': 'configs/config_sandbox/dataset-small-3class-randtest3.yaml', 'created_datetime': '20201021T171028Z', 'numpy_random_global_seed': '12', 'number_of_images': {'train': 100, 'test': 100, 'validation': 100}, 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'elapsed_minutes': 5.2, 'python_random_global_seed': '1'}

None seed for both random dataset downsampling and cropping:
python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/config_sandbox/dataset-small-3class-randtest4.yaml --python-random-global-seed None --numpy-random-global-seed None

 Prepare Dataset Metadata:
{'number_of_images': {'train': 100, 'test': 100, 'validation': 100}, 'gcp_bucket': 'gs://necstlab-sandbox', 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'python_random_global_seed': 'None', 'numpy_random_global_seed': 'None', 'elapsed_minutes': 5.3, 'created_datetime': '20201021T171754Z', 'original_config_filename': 'configs/config_sandbox/dataset-small-3class-randtest4.yaml'}

python3 train_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --config-file configs/config_sandbox/train-small-3class-randtest.yaml --python-random-global-seed 12345 --numpy-random-global-seed 123456 --tf-random-global-seed 1234567

 Train/Val Metadata:
{'elapsed_minutes': 3.7, 'numpy_random_global_seed': '123456', 'gcp_bucket': 'gs://necstlab-sandbox', 'original_config_filename': 'configs/config_sandbox/train-small-3class-randtest.yaml', 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'tf_random_global_seed': '1234567', 'global_threshold_for_metrics': 0.5, 'python_random_global_seed': '12345', 'num_classes': 3, 'created_datetime': '20201021T165207Z', 'dataset_config': {'class_annotation_mapping': {'class_1_annotation_GVs': [175], 'class_0_annotation_GVs': [100], 'class_2_annotation_GVs': [250]}, 'dataset_split': {'train': ['THIN_REF_S2_P1_L3_2496_1563_2159'], 'validation': ['THIN_CNT_S2_P1_L4_2334_1578_2159'], 'test': ['8bit_AS4_S2_P1_L6_2560_1750_2160']}, 'stack_downsampling': {'num_skip_beg_slices': 50, 'type': 'linear', 'number_of_images': 100, 'num_skip_end_slices': 50}, 'image_cropping': {'type': 'class', 'num_per_image': 1, 'min_num_class_pos_px': {'class_0_pos_px': 5, 'class_2_pos_px': 5, 'class_1_pos_px': 5}, 'num_pos_per_class': 1, 'num_neg_per_class': 1}, 'target_size': [512, 512]}, 'target_size': [512, 512]}

python3 train_segmentation_model_prediction_thresholds.py --gcp-bucket gs://necstlab-sandbox --dataset-directory dataset-small-3class-randtest/validation --model-id segmentation-model-small-3class-randtest_20201021T164922Z --batch-size 16 --optimizing-class-metric iou_score --dataset-downsample-factor 0.1 --python-random-global-seed None --numpy-random-global-seed None --tf-random-global-seed None

 Train Prediction Thresholds Metadata:
{'num_classes': 3, 'python_random_global_seed': 'None', 'train_config': {'epochs': 3, 'loss': 'cross_entropy', 'segmentation_model': {'model_parameters': {'backbone_name': 'vgg16', 'encoder_weights': None}, 'model_name': 'Unet'}, 'data_augmentation': {'random_90-degree_rotations': True}, 'dataset_id': 'dataset-small-3class-randtest', 'batch_size': 16, 'model_id_prefix': 'segmentation-model-small-3class-randtest', 'training_data_shuffle_seed': 1234, 'test_data_shuffle_seed': 123456, 'validation_data_shuffle_seed': 12345, 'optimizer': 'adam'}, 'thresholds_training_configuration': {'opt_bounds': [0, 1], 'opt_class_metric': 'iou_score', 'opt_dataset_downsample_factor': 0.1, 'opt_dataset_generator': 'tmp/datasets/dataset-small-3class-randtest/validation', 'opt_method': 'bounded', 'opt_tol': 0.01, 'opt_options': {'maxiter': 500, 'disp': 3}}, 'target_size': [512, 512], 'elapsed_minutes': 2.9, 'tf_random_global_seed': 'None', 'dataset_directory': 'dataset-small-3class-randtest/validation', 'numpy_random_global_seed': 'None', 'gcp_bucket': 'gs://necstlab-sandbox', 'batch_size': 16, 'thresholds_training_output': {'class0': {'fun': 1.0, 'success': True, 'message': 'Solution found.', 'x': 0.9952027292893504, 'status': 0, 'nfev': 11}, 'class2': {'fun': 1.0, 'success': True, 'message': 'Solution found.', 'x': 0.9952027292893504, 'status': 0, 'nfev': 11}, 'class1': {'fun': 0.9994177207699977, 'success': True, 'message': 'Solution found.', 'x': 0.12687915751060486, 'status': 0, 'nfev': 10}}, 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'created_datetime': '20201021T172616Z', 'model_id': 'segmentation-model-small-3class-randtest_20201021T164922Z', 'dataset_config': {'image_cropping': {'num_neg_per_class': 1, 'num_per_image': 1, 'min_num_class_pos_px': {'class_1_pos_px': 5, 'class_0_pos_px': 5, 'class_2_pos_px': 5}, 'type': 'class', 'num_pos_per_class': 1}, 'target_size': [512, 512], 'stack_downsampling': {'num_skip_beg_slices': 50, 'num_skip_end_slices': 50, 'type': 'linear', 'number_of_images': 100}, 'dataset_split': {'test': ['8bit_AS4_S2_P1_L6_2560_1750_2160'], 'validation': ['THIN_CNT_S2_P1_L4_2334_1578_2159'], 'train': ['THIN_REF_S2_P1_L3_2496_1563_2159']}, 'class_annotation_mapping': {'class_0_annotation_GVs': [100], 'class_2_annotation_GVs': [250], 'class_1_annotation_GVs': [175]}}, 'thresholds_training_history': {'class0': {'6_threshold_metric': [0.9655581462513669, 0.0], '7_threshold_metric': [0.9787137637477917, 0.0], '2_threshold_metric': [0.7639320225002102, 0.0], '3_threshold_metric': [0.8541019662496845, 0.0], '10_threshold_metric': [0.9952027292893504, 0.0], '1_threshold_metric': [0.6180339887498948, 0.0], '4_threshold_metric': [0.9098300562505257, 0.0], '8_threshold_metric': [0.9868443825035751, 0.0], '9_threshold_metric': [0.9918693812442166, 0.0], '5_threshold_metric': [0.9442719099991588, 0.0], '0_threshold_metric': [0.3819660112501051, 0.0]}, 'class2': {'6_threshold_metric': [0.9655581462513669, 0.0], '7_threshold_metric': [0.9787137637477917, 0.0], '2_threshold_metric': [0.7639320225002102, 0.0], '3_threshold_metric': [0.8541019662496845, 0.0], '10_threshold_metric': [0.9952027292893504, 0.0], '1_threshold_metric': [0.6180339887498948, 0.0], '4_threshold_metric': [0.9098300562505257, 0.0], '8_threshold_metric': [0.9868443825035751, 0.0], '9_threshold_metric': [0.9918693812442166, 0.0], '5_threshold_metric': [0.9442719099991588, 0.0], '0_threshold_metric': [0.3819660112501051, 0.0]}, 'class1': {'6_threshold_metric': [0.11843636484221466, 0.00024167141236830503], '7_threshold_metric': [0.1320970903396384, 0.0005146731855347753], '2_threshold_metric': [0.2360679774997897, 0.0002393372997175902], '3_threshold_metric': [0.14589803375031546, 0.00029822776559740305], '1_threshold_metric': [0.6180339887498948, 0.0], '4_threshold_metric': [0.09016994374947425, 0.0002819890505634248], '8_threshold_metric': [0.12687915751060486, 0.0005822792300023139], '9_threshold_metric': [0.1235458222953495, 0.0004798975423909724], '5_threshold_metric': [0.14053988300802858, 0.00039439054671674967], '0_threshold_metric': [0.3819660112501051, 8.005763811524957e-05]}}}

python3 test_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --dataset-id dataset-small-3class-randtest --model-id segmentation-model-small-3class-randtest_20201021T164922Z --batch-size 16 --trained-thresholds-id model_thresholds_20201021T172324Z.yaml

 Test Metadata:
{'elapsed_minutes': 1.4, 'gcp_bucket': 'gs://necstlab-sandbox', 'default_global_threshold_for_reference': 0.5, 'numpy_random_global_seed': '12', 'tf_random_global_seed': '123', 'python_random_global_seed': '1', 'dataset_config': {'dataset_split': {'train': ['THIN_REF_S2_P1_L3_2496_1563_2159'], 'validation': ['THIN_CNT_S2_P1_L4_2334_1578_2159'], 'test': ['8bit_AS4_S2_P1_L6_2560_1750_2160']}, 'target_size': [512, 512], 'image_cropping': {'num_neg_per_class': 1, 'min_num_class_pos_px': {'class_0_pos_px': 5, 'class_1_pos_px': 5, 'class_2_pos_px': 5}, 'num_per_image': 1, 'num_pos_per_class': 1, 'type': 'class'}, 'stack_downsampling': {'num_skip_beg_slices': 50, 'num_skip_end_slices': 50, 'number_of_images': 100, 'type': 'linear'}, 'class_annotation_mapping': {'class_2_annotation_GVs': [250], 'class_0_annotation_GVs': [100], 'class_1_annotation_GVs': [175]}}, 'trained_thresholds_id': 'model_thresholds_20201021T172324Z.yaml', 'batch_size': 16, 'dataset_id': 'dataset-small-3class-randtest', 'trained_class_thresholds_loaded': {'class0': 0.9952027292893504, 'class1': 0.12687915751060486, 'class2': 0.9952027292893504}, 'train_config': {'optimizer': 'adam', 'training_data_shuffle_seed': 1234, 'loss': 'cross_entropy', 'batch_size': 16, 'dataset_id': 'dataset-small-3class-randtest', 'epochs': 3, 'test_data_shuffle_seed': 123456, 'validation_data_shuffle_seed': 12345, 'segmentation_model': {'model_name': 'Unet', 'model_parameters': {'encoder_weights': None, 'backbone_name': 'vgg16'}}, 'data_augmentation': {'random_90-degree_rotations': True}, 'model_id_prefix': 'segmentation-model-small-3class-randtest'}, 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'created_datetime': '20201021T172900Z', 'model_id': 'segmentation-model-small-3class-randtest_20201021T164922Z'}

python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --stack-id 8bit_AS4_S2_P1_L6_2560_1750_2160 --model-id segmentation-model-small-3class-randtest_20201021T164922Z --image-ids 8bit_AS4_S2_P1_L6_2560_1750_2160-2089.tif --labels-output False --pad-output False --trained-thresholds-id model_thresholds_20201021T172324Z.yaml  --python-random-global-seed 12345 --numpy-random-global-seed 123456 --tf-random-global-seed 1234567

Infer Metadata:
{'default_global_threshold_for_reference': 0.5, 'model_id': 'segmentation-model-small-3class-randtest_20201021T164922Z', 'created_datetime': '20201021T173140Z', 'prediction_thresholds_used': [0.9952027292893504, 0.12687915751060486, 0.9952027292893504], 'tf_random_global_seed': '1234567', 'stack_id': '8bit_AS4_S2_P1_L6_2560_1750_2160', 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'pad_output': False, 'numpy_random_global_seed': '123456', 'background_class_index': None, 'elapsed_minutes': 1.9, 'labels_output': False, 'user_specified_prediction_thresholds': None, 'image_ids': '8bit_AS4_S2_P1_L6_2560_1750_2160-2089.tif', 'python_random_global_seed': '12345', 'trained_thresholds_id': 'model_thresholds_20201021T172324Z.yaml', 'gcp_bucket': 'gs://necstlab-sandbox', 'trained_class_thresholds_loaded': {'class2': 0.9952027292893504, 'class0': 0.9952027292893504, 'class1': 0.12687915751060486}}

rak5216 commented 3 years ago

@Josh-Joseph

global random seed functionality is now all on the command line (each seed is passed as a string that must be convertible to int, or be exactly None)
dataset generator seeds (rng instances) are left in the train config
reproducibility verified for dataset prep; the same seeding syntax is repeated and verified in the other workflows
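
A rough sketch of the "int or exactly None" command-line handling described above, in case it helps with review. The flag names match the commands in this thread; the helper names (`seed_arg`, `build_parser`, `apply_global_seeds`) and the `None` defaults are assumptions for illustration, not the code in this PR.

```python
import argparse
import random


def seed_arg(value):
    """Convert a CLI seed string to an int, or to None for the literal 'None'."""
    if value == 'None':
        return None
    try:
        return int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"seed must be convertible to int or be exactly 'None', got {value!r}"
        ) from None


def build_parser():
    # Hypothetical parser: the defaults here are assumed, not read from the repo.
    parser = argparse.ArgumentParser()
    parser.add_argument('--python-random-global-seed', type=seed_arg, default=None)
    parser.add_argument('--numpy-random-global-seed', type=seed_arg, default=None)
    return parser


def apply_global_seeds(args):
    # random.seed(None) seeds from an OS entropy source (or the clock),
    # so passing None gives a non-reproducible run.
    random.seed(args.python_random_global_seed)
    # The NumPy and TensorFlow global seeds would be set at this same point
    # (np.random.seed and tf.random.set_seed); omitted to keep this stdlib-only.


if __name__ == '__main__':
    args = build_parser().parse_args(['--python-random-global-seed', '1',
                                      '--numpy-random-global-seed', 'None'])
    apply_global_seeds(args)
    print(args.python_random_global_seed, args.numpy_random_global_seed)
```

Doing the conversion in the argparse `type` callback means a bad value such as `--python-random-global-seed abc` is rejected at parse time rather than partway through a run.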

rak5216 commented 3 years ago

@Josh-Joseph revisions finished

rak5216 commented 3 years ago

None seed for both random dataset downsampling and cropping:

python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/config_sandbox/dataset-small-3class-randtest5.yaml

 Prepare Dataset Metadata:
{'number_of_images': {'train': 100, 'validation': 100, 'test': 100}, 'original_config_filename': 'configs/config_sandbox/dataset-small-3class-randtest5.yaml', 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'numpy_random_global_seed': None, 'random-module-global-seed': None, 'created_datetime': '20201022T153538Z', 'gcp_bucket': 'gs://necstlab-sandbox', 'elapsed_minutes': 5.2}

random dataset downsampling and cropping (repeat of dataset-small-3class-randtest2.yaml):

python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/config_sandbox/dataset-small-3class-randtest6.yaml --random-module-global-seed 1 --numpy-random-global-seed 12

 Prepare Dataset Metadata:
{'random-module-global-seed': 1, 'numpy_random_global_seed': 12, 'created_datetime': '20201022T154129Z', 'elapsed_minutes': 5.2, 'number_of_images': {'train': 100, 'test': 100, 'validation': 100}, 'git_hash': '9c202cf56e34784adfa323aa34d32319a955fa57', 'original_config_filename': 'configs/config_sandbox/dataset-small-3class-randtest6.yaml', 'gcp_bucket': 'gs://necstlab-sandbox'}
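
One possible way to check that two seeded dataset-prep runs produced identical output is to hash both output directories and compare the digests. This is only a sketch under assumptions: `directory_digest` and its `exclude_names` parameter are hypothetical, and it is not necessarily how the reproducibility check in this PR was done. Files that legitimately differ between runs (for example a metadata file holding `created_datetime`) would need to be excluded by name.

```python
import hashlib
from pathlib import Path


def directory_digest(root, exclude_names=()):
    """Return one SHA-256 hex digest over all files under root.

    Files are visited in sorted order, and both the relative path and the
    contents are hashed, so renamed or modified files change the digest.
    File names listed in exclude_names are skipped.
    """
    root = Path(root)
    digest = hashlib.sha256()
    files = sorted(p for p in root.rglob('*')
                   if p.is_file() and p.name not in exclude_names)
    for path in files:
        digest.update(path.relative_to(root).as_posix().encode('utf-8'))
        digest.update(b'\0')  # separator between path and contents
        digest.update(path.read_bytes())
    return digest.hexdigest()


if __name__ == '__main__':
    import sys
    # Usage: python3 compare_dirs.py <dir_a> <dir_b>
    dir_a, dir_b = sys.argv[1], sys.argv[2]
    print(directory_digest(dir_a) == directory_digest(dir_b))
```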

mit-quest / necstlab-damage-segmentation

Rak/random seeds everywhere #81