talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Test size error for multi-class topdown model #1282

Closed: jverpeut closed this issue 1 year ago

jverpeut commented 1 year ago

Bug description

ValueError: test_size=1 should be either positive and smaller than the number of samples 0 or a float in the (0, 1) range

Once this error occurs, I have to close SLEAP entirely. We are attempting to track 3 mice and have 22 labeled frames in total. My first thought is that not enough frames are labeled, but I would like to understand the reason for this error in more detail.

Expected behaviour

Training would finish.

Actual behaviour

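Training of the multi-class top-down model stops with the error message shown in the bug description (full logs are included under the set-up section).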

Your personal set up

SLEAP 1.3.0 on a personal computer with a GPU

Start-up
```
(base) C:\WINDOWS\system32>conda activate sleap1.3
(sleap1.3) C:\WINDOWS\system32>sleap-label
Saving config: C:\Users\jverpeut/.sleap/1.3.0/preferences.yaml
Restoring GUI state...

Software versions:
SLEAP: 1.3.0
TensorFlow: 2.6.3
Numpy: 1.19.5
Python: 3.7.12
OS: Windows-10-10.0.19041-SP0

Happy SLEAPing! :)
```
Unrelated traceback (`DeleteSelectedInstance`)
```
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\gui\commands.py", line 553, in deleteSelectedInstance
    self.execute(DeleteSelectedInstance)
  File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\gui\commands.py", line 242, in execute
    command().execute(context=self, params=kwargs)
  File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\gui\commands.py", line 139, in execute
    self.do_with_signal(context, params)
  File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\gui\commands.py", line 163, in do_with_signal
    cls.do_action(context, params)
  File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\gui\commands.py", line 2474, in do_action
    context.labels.remove_instance(context.state["labeled_frame"], selected_inst)
  File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\io\dataset.py", line 1323, in remove_instance
    frame.instances.remove(instance)
ValueError: list.remove(x): x not in list
```
Successful top-down (centroid) training ``` Resetting monitor window. Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png Start training centroid... ['sleap-train', 'C:\\Users\\jverpeut\\AppData\\Local\\Temp\\tmpr37lzynp\\230419_132746_training_job.json', 'C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.0 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Training profile: C:\Users\jverpeut\AppData\Local\Temp\tmpr37lzynp\230419_132746_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\jverpeut\\AppData\\Local\\Temp\\tmpr37lzynp\\230419_132746_training_job.json", "labels_path": "C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": "C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp", "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": [ 8, 11, 1, 0, 14, 10, 5, 4, 13, 7, 2, 19, 9, 3, 6, 15, 21, 16, 18, 12 ], "validation_inds": [ 17, 20 ], "test_inds": null, "search_path_hints": [ "", "" ], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.5, "pad_to_stride": 16, "resize_and_pad_to_target": true, "target_height": 1088, "target_width": 1456 }, 
"instance_cropping": { "center_on_part": null, "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 16, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": { "anchor_part": null, "sigma": 2.5, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": false, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": 200, "min_batches_per_epoch": 200, "val_batches_per_epoch": 10, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { 
"stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 20 } }, "outputs": { "save_outputs": true, "run_name": "230419_132745.centroid.n=31", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "C:/Users/jverpeut/Desktop\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.0", "filename": "C:\\Users\\jverpeut\\AppData\\Local\\Temp\\tmpr37lzynp\\230419_132746_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 2567 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 28 / Validation = 3. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 
2023-04-19 13:27:57.206226: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-04-19 13:27:57.937245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3619 MB memory: -> device: 0, name: Quadro P2200, pci bus id: 0000:b3:00.0, compute capability: 6.1 2023-04-19 13:27:58.814210: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2) INFO:sleap.nn.training:Loaded test example. [3.430s] INFO:sleap.nn.training: Input shape: (544, 736, 1) INFO:sleap.nn.training:Created Keras model. INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,105 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CentroidConfmapsHead(anchor_part=None, sigma=2.5, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 272, 368, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 28 INFO:sleap.nn.training:Validation set: n = 3 INFO:sleap.nn.training:Setting up optimization... 
INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20) INFO:sleap.nn.training:Setting up outputs... INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31 INFO:sleap.nn.training:Setting up visualization... 2023-04-19 13:28:03.893514: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "Quadro P2200" frequency: 1493 num_cores: 10 environment { key: "architecture" value: "6.1" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1310720 shared_memory_size_per_multiprocessor: 98304 memory_size: 3795648512 bandwidth: 200200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } } 2023-04-19 13:28:05.486650: W 
tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "Quadro P2200" frequency: 1493 num_cores: 10 environment { key: "architecture" value: "6.1" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1310720 shared_memory_size_per_multiprocessor: 98304 memory_size: 3795648512 bandwidth: 200200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } } INFO:sleap.nn.training:Finished trainer set up. [8.6s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... INFO:sleap.nn.training:Finished creating training datasets. [5.4s] INFO:sleap.nn.training:Starting training loop... Epoch 1/200 2023-04-19 13:28:13.216876: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201 2023-04-19 13:28:16.697417: W tensorflow/core/common_runtime/bfc_allocator.cc:338] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. 
200/200 - 70s - loss: 4.5381e-04 - val_loss: 4.5703e-04 2023-04-19 13:29:23.967433: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:29:24.357389: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:29:25.109251: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:29:29.028107: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:29:29.338813: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:29:33.900338: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 
2023-04-19 13:29:34.788627: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:29:34.805575: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:29:34.806467: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:29:34.809179: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-04-19 13:31:34.100101: W tensorflow/core/kernels/gpu_utils.cc:49] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once. 
Epoch 2/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 4.5045e-04 - val_loss: 4.5560e-04 Epoch 3/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 4.4688e-04 - val_loss: 4.4625e-04 Epoch 4/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 4.4839e-04 - val_loss: 4.5567e-04 Epoch 5/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 57s - loss: 4.4493e-04 - val_loss: 4.5874e-04 Epoch 6/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 4.4122e-04 - val_loss: 4.5623e-04 Epoch 7/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 4.3153e-04 - val_loss: 4.5450e-04 Epoch 8/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 4.2344e-04 - val_loss: 4.4870e-04 Epoch 00008: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05. 
Epoch 9/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 4.0219e-04 - val_loss: 4.5203e-04 Epoch 10/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 3.9436e-04 - val_loss: 4.5180e-04 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png Epoch 11/200 200/200 - 58s - loss: 3.7917e-04 - val_loss: 4.2054e-04 Epoch 12/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 3.6662e-04 - val_loss: 4.2748e-04 Epoch 13/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 3.4939e-04 - val_loss: 3.9647e-04 Epoch 14/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 3.3518e-04 - val_loss: 4.2778e-04 Epoch 15/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 3.1607e-04 - val_loss: 4.1903e-04 Epoch 16/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 3.0817e-04 - val_loss: 4.0571e-04 Epoch 17/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 2.8899e-04 - val_loss: 4.3344e-04 Epoch 18/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 2.7890e-04 - val_loss: 4.2888e-04 Epoch 00018: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05. 
Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png Epoch 19/200 200/200 - 58s - loss: 2.4180e-04 - val_loss: 4.3364e-04 Epoch 20/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 2.2873e-04 - val_loss: 4.0236e-04 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png Epoch 21/200 200/200 - 58s - loss: 2.1771e-04 - val_loss: 4.2498e-04 Epoch 22/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 2.1696e-04 - val_loss: 4.2185e-04 Epoch 23/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 2.0783e-04 - val_loss: 4.4158e-04 Epoch 24/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 1.9939e-04 - val_loss: 4.1644e-04 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png Epoch 25/200 200/200 - 58s - loss: 1.9314e-04 - val_loss: 4.4691e-04 Epoch 00025: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05. 
Epoch 26/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 58s - loss: 1.7297e-04 - val_loss: 4.3931e-04 Epoch 27/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 1.6557e-04 - val_loss: 4.3326e-04 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png Epoch 28/200 200/200 - 59s - loss: 1.6605e-04 - val_loss: 4.3827e-04 Epoch 29/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 1.6130e-04 - val_loss: 4.3297e-04 Epoch 30/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 1.5728e-04 - val_loss: 4.3038e-04 Epoch 31/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 1.5561e-04 - val_loss: 4.3719e-04 Epoch 32/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 1.5410e-04 - val_loss: 4.3233e-04 Epoch 00032: ReduceLROnPlateau reducing learning rate to 6.24999984211172e-06. Epoch 33/200 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png 200/200 - 59s - loss: 1.4313e-04 - val_loss: 4.4609e-04 Polling: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz\validation.*.png Epoch 00033: early stopping INFO:sleap.nn.training:Finished training loop. [35.1 min] INFO:sleap.nn.training:Deleting visualization directory: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\viz INFO:sleap.nn.training:Saving evaluation metrics to model folder... Predicting... 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-04-19 14:03:20.481461: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "Quadro P2200" frequency: 1493 num_cores: 10 environment { key: "architecture" value: "6.1" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1310720 shared_memory_size_per_multiprocessor: 98304 memory_size: 3795648512 bandwidth: 200200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-04-19 14:03:20.483273: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2095 num_cores: 40 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 28835840 memory_size: 268435456 } outputs { dtype: 
DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 22.9 FPS INFO:sleap.nn.evals:Saved predictions: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\labels_pr.train.slp INFO:sleap.nn.evals:Saved metrics: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\metrics.train.npz INFO:sleap.nn.evals:OKS mAP: 0.486954 Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-04-19 14:03:24.584732: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "Quadro P2200" frequency: 1493 num_cores: 10 environment { key: "architecture" value: "6.1" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1310720 shared_memory_size_per_multiprocessor: 98304 memory_size: 3795648512 bandwidth: 200200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-04-19 14:03:24.586885: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT 
shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2095 num_cores: 40 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 28835840 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 ? C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\evals.py:506: RuntimeWarning: Mean of empty slice "dist.avg": np.nanmean(dists), C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\evals.py:539: RuntimeWarning: Mean of empty slice. mPCK = mPCK_parts.mean() C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\numpy\core\_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\evals.py:633: RuntimeWarning: Mean of empty slice. pair_pck = metrics["pck.pcks"].mean(axis=-1).mean(axis=-1) C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\numpy\core\_methods.py:163: RuntimeWarning: invalid value encountered in true_divide ret, rcount, out=ret, casting='unsafe', subok=False) C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\evals.py:635: RuntimeWarning: Mean of empty slice. metrics["oks.mOKS"] = pair_oks.mean() WARNING:sleap.nn.evals:Failed to compute metrics. INFO:sleap.nn.evals:Saved predictions: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31\labels_pr.val.slp INFO:sleap.nn.callbacks:Closing the reporter controller/context. INFO:sleap.nn.callbacks:Closing the training controller socket/context. 
Run Path: C:/Users/jverpeut/Desktop\models\230419_132745.centroid.n=31 Finished training centroid. ```
Unsuccessful multiclass top-down (centered-instance) ``` Resetting monitor window. Polling: C:/Users/jverpeut/Desktop\models\230419_140330.multi_class_topdown.n=31\viz\validation.*.png Start training multi_class_topdown... ['sleap-train', 'C:\\Users\\jverpeut\\AppData\\Local\\Temp\\tmpomgh3qmu\\230419_140330_training_job.json', 'C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.0 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Training profile: C:\Users\jverpeut\AppData\Local\Temp\tmpomgh3qmu\230419_140330_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\jverpeut\\AppData\\Local\\Temp\\tmpomgh3qmu\\230419_140330_training_job.json", "labels_path": "C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 1.0, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": null, "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { 
"backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 64, "output_stride": 2, "filters": 64, "filters_rate": 2.0, "middle_block": true, "up_interpolate": false, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": null, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": { "confmaps": { "anchor_part": null, "part_names": null, "sigma": 2.5, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "class_vectors": { "classes": null, "num_fc_layers": 3, "num_fc_units": 64, "global_pool": true, "output_stride": 1, "loss_weight": 1.0 } } }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 8, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 100, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, 
"max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-06, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "230419_140330.multi_class_topdown.n=31", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "C:/Users/jverpeut/Desktop\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.0", "filename": "C:\\Users\\jverpeut\\AppData\\Local\\Temp\\tmpomgh3qmu\\230419_140330_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 2885 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... 
INFO:sleap.nn.training:Loading training labels from: C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 Traceback (most recent call last): File "C:\ProgramData\Anaconda3\envs\sleap1.3\Scripts\sleap-train-script.py", line 33, in sys.exit(load_entry_point('sleap==1.3.0', 'console_scripts', 'sleap-train')()) File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\training.py", line 2013, in main trainer = create_trainer_using_cli(args=args) File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\training.py", line 2005, in create_trainer_using_cli video_search_paths=args.video_paths, File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\training.py", line 675, in from_config with_track_only=is_id_model, File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\training.py", line 152, in from_config with_track_only=with_track_only, File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\training.py", line 220, in from_labels validation, File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sleap\nn\data\training.py", line 49, in split_labels_train_val idx_train, idx_val = train_test_split(list(range(len(labels))), test_size=n_val) File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sklearn\model_selection\_split.py", line 2421, in train_test_split n_samples, test_size, train_size, default_test_size=0.25 File "C:\ProgramData\Anaconda3\envs\sleap1.3\lib\site-packages\sklearn\model_selection\_split.py", line 2046, in _validate_shuffle_split "(0, 1) range".format(test_size, n_samples) ValueError: test_size=1 should be either positive and smaller than the number of samples 0 or a float in the (0, 1) range Run Path: C:/Users/jverpeut/Desktop\models\230419_140330.multi_class_topdown.n=31 ```
roomrys commented 1 year ago

Hi @jverpeut,

For some reason, by the time we try to set aside a few frames for the validation split, the program thinks that len(labels) == 0 (the traceback splits list(range(len(labels))), and scikit-learn reports 0 samples).
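A minimal sketch of how a validation count of 1 can arise from an empty set of labels. It assumes the count is rounded from `validation_fraction` and then clamped to at least 1, which is what `test_size=1` in the error suggests; `n_validation` and `split_indices` are hypothetical helpers written for illustration, not SLEAP functions.

```python
import random


def n_validation(n_samples: int, validation_fraction: float) -> int:
    """Validation count, clamped to [1, n_samples - 1] (assumed behavior)."""
    n_val = round(n_samples * validation_fraction)
    return max(min(n_val, n_samples - 1), 1)


def split_indices(n_samples: int, validation_fraction: float, seed: int = 0):
    """Shuffle the sample indices and split them into (train, val) lists."""
    n_val = n_validation(n_samples, validation_fraction)
    if n_val >= n_samples:
        # Same kind of complaint as the scikit-learn error in the traceback.
        raise ValueError(
            f"test_size={n_val} should be smaller than "
            f"the number of samples {n_samples}"
        )
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[n_val:], idx[:n_val]


train, val = split_indices(260, 0.1)
print(len(train), len(val))  # 234 26

try:
    split_indices(0, 0.1)  # empty label set: clamp still yields n_val = 1
except ValueError as err:
    print(err)
```

With 0 samples the clamp still produces 1, so the split fails regardless of the validation fraction that was requested.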

We have seen this before when the user had no labeled frames in their project, but that doesn't seem to be the case for you as there was a successful training of the centroid model before the multiclass top-down failed.

Is the problem limited to just the multiclass models (i.e. have you been able to successfully train multiclass)? Also, do you have tracks assigned to each instance for training multiclass? This is a requirement, since it is how the classes/tracks are learned.
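One way to reason about the track requirement is to count the labeled frames that contain at least one instance with a track. The traceback passes `with_track_only=is_id_model`, which suggests (an assumption, not confirmed in this thread) that frames without tracked instances are dropped before the split. The classes below are simplified stand-ins for illustration, not SLEAP's own data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    track: Optional[str] = None  # None means no track has been assigned


@dataclass
class LabeledFrame:
    instances: List[Instance] = field(default_factory=list)


def frames_with_tracks(frames: List[LabeledFrame]) -> List[LabeledFrame]:
    """Keep only frames that have at least one instance with a track."""
    return [f for f in frames if any(i.track is not None for i in f.instances)]


# 31 labeled frames with 3 instances each, none of which has a track.
untracked = [LabeledFrame([Instance(), Instance(), Instance()]) for _ in range(31)]
print(len(untracked), len(frames_with_tracks(untracked)))  # 31 0
```

If the filtered count were 0 while the project reports n=31, the split would see zero samples, which would be consistent with the error above.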

Thanks, Liezl

jverpeut commented 1 year ago

Liezl,

I believe we do have tracks assigned to each instance, but I am currently having more frames labeled to see if that solves the problem. I cannot seem to use other models based on the way my skeleton and nodes are constructed. Do you have examples of best ways to label nodes other than the ones provided as examples in the update? Those examples do not have enough joints labeled for my application.

Jess

On Thu, Apr 20, 2023 at 10:57 AM Liezl Maree @.***> wrote:

Hi @jverpeut https://github.com/jverpeut,

For some reason, by the time we try to set aside a few frames for the validation split, the program thinks that len(labels) == 0 .

  • The validation split is guaranteed to be at least 1, as we see in test_size=1.
  • It looks like you are invoking the trainer through the GUI and labels are being loaded from C:/Users/jverpeut/Desktop/labels_2_21_DominanceOpenField.v001(1).slp, so the command for running training should be correct.
  • In the naming of the model, SLEAP thinks you have n=31 labeled frames (Run Path: C:/Users/jverpeut/Desktop\models\230419_140330.multi_class_topdown.n=31 ).

We have seen this before when the user had no labeled frames in their project, but that doesn't seem to be the case for you as there was a successful training of a regular top-down model before the multiclass top-down failed.

Is the problem limited to just the multiclass models (i.e. have you been able to successfully train multiclass)? Also, do you have tracks assigned to each instance for training multiclass? This is a requirement, since it is how the classes/tracks are learned.

Thanks, Liezl


roomrys commented 1 year ago

Hi @jverpeut,

I don't think labeling more frames will help... The structure of the skeleton will determine if you can run a bottom-up model. You should be able to run just a normal top-down model (no multiclass). Do you mind giving this a try and letting me know if you have the same error?

> Do you have examples of best ways to label nodes other than the ones provided as examples in the update? Those examples do not have enough joints labeled for my application.

When you say "best way to label nodes", do you mean the skeleton construction (connecting nodes via edges)?

If so, the nodes should be any body parts you are interested in tracking. The edges connect "source" nodes to "destination" nodes. If you would like to use the bottom-up model, then you will need to construct your skeleton such that each destination node has only one source node (think of this like a tree where the trunk/source splits into branches/destinations).

Often, as the trunk of this tree, we choose a point that is easy to find throughout the video (such as a central node on the body [e.g. torso]). This is because SLEAP uses the source nodes to help find destination nodes in bottom-up via Part Affinity Fields (PAFs) and we want to choose our top source node as something that is easy to find.

It is better to create a short, stubby tree from a few easily found source nodes than a tall tree with many source nodes, because losing (i.e. being unable to locate) one node in the chain could domino into losing the rest of the chain.
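The two skeleton guidelines above can be sketched as a quick check on an edge list. This is an illustrative sketch only: the node names, the `(source, destination)` tuple format, and the helpers `multi_source_nodes` and `max_depth` are assumptions made for the example, not part of SLEAP.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source, destination)


def multi_source_nodes(edges: List[Edge]) -> Dict[str, List[str]]:
    """Return destination nodes that have more than one source node."""
    sources = defaultdict(list)
    for src, dst in edges:
        sources[dst].append(src)
    return {dst: srcs for dst, srcs in sources.items() if len(srcs) > 1}


def max_depth(edges: List[Edge], root: str) -> int:
    """Number of edges on the longest chain from root (assumes a tree)."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    def depth(node: str) -> int:
        return max((1 + depth(c) for c in children.get(node, [])), default=0)

    return depth(root)


# Short, stubby tree rooted at an easy-to-find node.
stubby = [("torso", "head"), ("torso", "tail base"), ("head", "nose")]
print(multi_source_nodes(stubby))  # {}
print(max_depth(stubby, "torso"))  # 2

# Adding a second source for "nose" breaks the one-source-per-destination rule.
bad = stubby + [("tail base", "nose")]
print(multi_source_nodes(bad))  # {'nose': ['head', 'tail base']}
```

An empty result from `multi_source_nodes` means every destination has a single source, and a small `max_depth` corresponds to the short chains recommended above.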

Thanks, Liezl

jverpeut commented 1 year ago

Liezl,

We went back and added 260 labels. Now, we still received an error. I have the output attached.

Jess

roomrys commented 1 year ago

Hi @jverpeut,

Yeah... adding more labels does not seem like it will help in this case, but now we know for sure!

The output was not sent to the post on GitHub (attachments on email replies do not come through). Do you mind attaching the output directly to your comment on GitHub: https://github.com/talmolab/sleap/issues/1282#issuecomment-1570831728?

You should be able to run just a normal top-down model (no multiclass). Do you mind giving this a try and letting me know if you have the same error?

Thanks, Liezl

jverpeut commented 1 year ago

Still receiving an error with more labels:

Traceback: ``` (base) C:\Users\verpeutlab>conda activate sleap (sleap) C:\Users\verpeutlab>sleap-label Saving config: C:\Users\verpeutlab/.sleap/1.3.0a0/preferences.yaml Restoring GUI state... Software versions: SLEAP: 1.3.0a0 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 Happy SLEAPing! :) Resetting monitor window. Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Start training centroid... ['sleap-train', 'C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmp6v6mfcon\\230531_111346_training_job.json', 'C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.0a0 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Training profile: C:\Users\VERPEU~1\AppData\Local\Temp\tmp6v6mfcon\230531_111346_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmp6v6mfcon\\230531_111346_training_job.json", "labels_path": "C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": "C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001.slp", "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": [ 4, 15, 7, 13, 14, 12, 3, 11, 18, 19, 9, 20, 17, 5, 1, 10, 0, 8, 6 ], 
"validation_inds": [ 2, 16 ], "test_inds": null, "search_path_hints": [ "" ], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.4, "pad_to_stride": 16, "resize_and_pad_to_target": true, "target_height": 1088, "target_width": 1456 }, "instance_cropping": { "center_on_part": "tail base", "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 16, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": { "anchor_part": "tail base", "sigma": 5.0, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": false, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": 200, "min_batches_per_epoch": 200, "val_batches_per_epoch": 10, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { 
"reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 20 } }, "outputs": { "save_outputs": true, "run_name": "230531_111346.centroid.n=260", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "C:/Users/verpeutlab/Desktop/Jaime\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.0a0", "filename": "C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmp6v6mfcon\\230531_111346_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 7878 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... 
INFO:sleap.nn.training:Loading training labels from: C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 234 / Validation = 26. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2023-05-31 11:13:51.904622: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-05-31 11:13:52.905300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6059 MB memory: -> device: 0, name: NVIDIA T1000 8GB, pci bus id: 0000:01:00.0, compute capability: 7.5 2023-05-31 11:13:53.360904: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2) INFO:sleap.nn.training:Loaded test example. [2.403s] INFO:sleap.nn.training: Input shape: (448, 592, 1) INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,105 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CentroidConfmapsHead(anchor_part='tail base', sigma=5.0, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 224, 296, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 234 INFO:sleap.nn.training:Validation set: n = 26 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20) INFO:sleap.nn.training:Setting up outputs... INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260 INFO:sleap.nn.training:Setting up visualization... 
2023-05-31 11:13:55.750562: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } } 2023-05-31 11:13:56.464500: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 
6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } } INFO:sleap.nn.training:Finished trainer set up. [4.6s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... INFO:sleap.nn.training:Finished creating training datasets. [12.9s] INFO:sleap.nn.training:Starting training loop... Epoch 1/200 2023-05-31 11:14:11.218893: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201 WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0471s vs `on_train_batch_end` time: 0.1095s). Check your callbacks. 200/200 - 50s - loss: 0.0027 - val_loss: 0.0024 2023-05-31 11:15:02.341406: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-05-31 11:15:05.558481: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 
Epoch 2/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 0.0019 - val_loss: 0.0015 Epoch 3/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 0.0013 - val_loss: 0.0010 Epoch 4/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 0.0010 - val_loss: 9.9984e-04 Epoch 5/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 9.0947e-04 - val_loss: 8.3392e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 6/200 200/200 - 39s - loss: 8.1550e-04 - val_loss: 8.8210e-04 Epoch 7/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 7.5265e-04 - val_loss: 6.0891e-04 Epoch 8/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 39s - loss: 6.8776e-04 - val_loss: 7.4111e-04 Epoch 9/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 6.4899e-04 - val_loss: 6.2401e-04 Epoch 10/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 6.2838e-04 - val_loss: 6.5751e-04 Epoch 11/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 5.6508e-04 - val_loss: 6.1592e-04 Epoch 12/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 5.4734e-04 - val_loss: 6.1357e-04 Epoch 00012: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05. 
Epoch 13/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 5.1443e-04 - val_loss: 4.9206e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 14/200 200/200 - 40s - loss: 4.9364e-04 - val_loss: 4.0061e-04 Epoch 15/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 4.5666e-04 - val_loss: 4.8143e-04 Epoch 16/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 4.5487e-04 - val_loss: 4.8255e-04 Epoch 17/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 4.5438e-04 - val_loss: 4.6764e-04 Epoch 18/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 4.4009e-04 - val_loss: 4.4030e-04 Epoch 19/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 4.2192e-04 - val_loss: 5.7306e-04 Epoch 00019: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05. 
Epoch 20/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 4.0998e-04 - val_loss: 4.3399e-04 Epoch 21/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.9489e-04 - val_loss: 5.0619e-04 Epoch 22/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.8638e-04 - val_loss: 3.7483e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 23/200 200/200 - 39s - loss: 3.7188e-04 - val_loss: 4.5849e-04 Epoch 24/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.7672e-04 - val_loss: 3.8249e-04 Epoch 25/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.8465e-04 - val_loss: 3.4616e-04 Epoch 26/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.5204e-04 - val_loss: 4.3523e-04 Epoch 27/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.6266e-04 - val_loss: 4.2876e-04 Epoch 28/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.5119e-04 - val_loss: 4.7368e-04 Epoch 29/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.3528e-04 - val_loss: 4.0810e-04 Epoch 30/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.3721e-04 - val_loss: 3.5008e-04 Epoch 00030: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05. 
Epoch 31/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.2924e-04 - val_loss: 4.3588e-04 Epoch 32/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.1115e-04 - val_loss: 3.3770e-04 Epoch 33/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.3460e-04 - val_loss: 3.8745e-04 Epoch 34/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.0190e-04 - val_loss: 4.2955e-04 Epoch 35/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.1412e-04 - val_loss: 3.8875e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 36/200 200/200 - 40s - loss: 3.0513e-04 - val_loss: 4.4527e-04 Epoch 37/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.9335e-04 - val_loss: 3.9464e-04 Epoch 00037: ReduceLROnPlateau reducing learning rate to 6.24999984211172e-06. 
Epoch 38/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.9895e-04 - val_loss: 4.4243e-04 Epoch 39/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.7628e-04 - val_loss: 3.9636e-04 Epoch 40/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 3.0067e-04 - val_loss: 4.2368e-04 Epoch 41/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.8123e-04 - val_loss: 3.9462e-04 Epoch 42/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.8320e-04 - val_loss: 3.8362e-04 Epoch 43/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.8277e-04 - val_loss: 4.2366e-04 Epoch 44/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.7038e-04 - val_loss: 3.5357e-04 Epoch 00044: ReduceLROnPlateau reducing learning rate to 3.12499992105586e-06. 
Epoch 45/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.7927e-04 - val_loss: 3.4094e-04 Epoch 46/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6728e-04 - val_loss: 3.4959e-04 Epoch 47/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.7730e-04 - val_loss: 3.8007e-04 Epoch 48/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6529e-04 - val_loss: 3.4380e-04 Epoch 49/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6883e-04 - val_loss: 3.7300e-04 Epoch 50/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6422e-04 - val_loss: 3.4992e-04 Epoch 51/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.7132e-04 - val_loss: 3.2378e-04 Epoch 52/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 39s - loss: 2.7212e-04 - val_loss: 4.0469e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 53/200 200/200 - 40s - loss: 2.7185e-04 - val_loss: 3.7823e-04 Epoch 54/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6374e-04 - val_loss: 3.9615e-04 Epoch 55/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.7913e-04 - val_loss: 3.8285e-04 Epoch 56/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 
2.5519e-04 - val_loss: 4.0030e-04 Epoch 00056: ReduceLROnPlateau reducing learning rate to 1.56249996052793e-06. Epoch 57/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5868e-04 - val_loss: 4.2701e-04 Epoch 58/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6171e-04 - val_loss: 3.1568e-04 Epoch 59/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6123e-04 - val_loss: 3.0413e-04 Epoch 60/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6580e-04 - val_loss: 3.8219e-04 Epoch 61/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5667e-04 - val_loss: 3.3772e-04 Epoch 62/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6594e-04 - val_loss: 3.9083e-04 Epoch 63/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5205e-04 - val_loss: 3.4997e-04 Epoch 64/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6474e-04 - val_loss: 4.0039e-04 Epoch 00064: ReduceLROnPlateau reducing learning rate to 7.81249980263965e-07. 
Epoch 65/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5761e-04 - val_loss: 3.4861e-04 Epoch 66/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5804e-04 - val_loss: 3.3884e-04 Epoch 67/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5032e-04 - val_loss: 4.4457e-04 Epoch 68/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5731e-04 - val_loss: 3.2996e-04 Epoch 69/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5961e-04 - val_loss: 3.5084e-04 Epoch 70/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4844e-04 - val_loss: 3.7706e-04 Epoch 71/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4860e-04 - val_loss: 3.4444e-04 Epoch 00071: ReduceLROnPlateau reducing learning rate to 3.906249901319825e-07. 
Epoch 72/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.6238e-04 - val_loss: 2.7074e-04 Epoch 73/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5187e-04 - val_loss: 4.1309e-04 Epoch 74/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4763e-04 - val_loss: 3.6576e-04 Epoch 75/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4187e-04 - val_loss: 3.4248e-04 Epoch 76/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5130e-04 - val_loss: 4.0028e-04 Epoch 77/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4164e-04 - val_loss: 3.2621e-04 Epoch 78/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4484e-04 - val_loss: 3.0627e-04 Epoch 00078: ReduceLROnPlateau reducing learning rate to 1.9531249506599124e-07. 
Epoch 79/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5209e-04 - val_loss: 4.0828e-04 Epoch 80/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5570e-04 - val_loss: 3.0577e-04 Epoch 81/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5462e-04 - val_loss: 3.9854e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 82/200 200/200 - 40s - loss: 2.4542e-04 - val_loss: 3.9373e-04 Epoch 83/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4696e-04 - val_loss: 3.8527e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 84/200 200/200 - 40s - loss: 2.4596e-04 - val_loss: 3.5791e-04 Epoch 85/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4950e-04 - val_loss: 4.0792e-04 Epoch 00085: ReduceLROnPlateau reducing learning rate to 9.765624753299562e-08. 
Epoch 86/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4669e-04 - val_loss: 4.4419e-04 Epoch 87/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5737e-04 - val_loss: 3.2184e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 88/200 200/200 - 40s - loss: 2.5442e-04 - val_loss: 3.9112e-04 Epoch 89/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5875e-04 - val_loss: 3.5122e-04 Epoch 90/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5265e-04 - val_loss: 4.2545e-04 Epoch 91/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.5793e-04 - val_loss: 3.7859e-04 Epoch 92/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png 200/200 - 40s - loss: 2.4154e-04 - val_loss: 4.1934e-04 Epoch 00092: ReduceLROnPlateau reducing learning rate to 4.882812376649781e-08. Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz\validation.*.png Epoch 00092: early stopping INFO:sleap.nn.training:Finished training loop. [62.9 min] INFO:sleap.nn.training:Deleting visualization directory: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\viz INFO:sleap.nn.training:Saving evaluation metrics to model folder... Predicting... 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-05-31 12:17:05.127650: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-05-31 12:17:05.128536: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { 
dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 99% ETA: 0:00:01 19.7 FPS2023-05-31 12:17:16.054188: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-05-31 12:17:16.054872: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: 
"3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 7.7 FPS INFO:sleap.nn.evals:Saved predictions: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\labels_pr.train.slp INFO:sleap.nn.evals:Saved metrics: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\metrics.train.npz INFO:sleap.nn.evals:OKS mAP: 0.340353 Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-05-31 12:17:20.182903: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-05-31 12:17:20.183569: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: 
"bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━ 92% ETA: 0:00:01 53.3 FPS2023-05-31 12:17:21.486435: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-05-31 12:17:21.487063: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: 
"T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 17.4 FPS
INFO:sleap.nn.evals:Saved predictions: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260\metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.351340
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Run Path: C:/Users/verpeutlab/Desktop/Jaime\models\230531_111346.centroid.n=260
Resetting monitor window.
Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230531_121737.centered_instance.n=260\viz\validation.*.png
Start training centered_instance...
['sleap-train', 'C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmpumuks5l2\\230531_121737_training_job.json', 'C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.0a0 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Training profile: C:\Users\VERPEU~1\AppData\Local\Temp\tmpumuks5l2\230531_121737_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmpumuks5l2\\230531_121737_training_job.json", "labels_path": "C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.4, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": "tail base", "crop_size": 384, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 4, "filters": 24, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, 
"hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": null, "centered_instance": { "anchor_part": "tail base", "part_names": null, "sigma": 5.0, "output_stride": 4, "loss_weight": 1.0, "offset_refinement": false }, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "230531_121737.centered_instance.n=260", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "C:/Users/verpeutlab/Desktop/Jaime\\models", 
"tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.0a0", "filename": "C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmpumuks5l2\\230531_121737_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 7857 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 234 / Validation = 26. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2023-05-31 12:17:42.745600: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2023-05-31 12:17:43.147962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6059 MB memory: -> device: 0, name: NVIDIA T1000 8GB, pci bus id: 0000:01:00.0, compute capability: 7.5 2023-05-31 12:17:43.503583: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2) 2023-05-31 12:17:44.725948: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } INFO:sleap.nn.training:Loaded test example. [2.124s] INFO:sleap.nn.training: Input shape: (384, 384, 1) INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=24, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=2, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 4,311,057 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CenteredInstanceConfmapsHead(part_names=['nose', 'R front paw', 'L front paw', 'centroid', 'R rear paw', 'L rear paw', 'tail base', 'tail mid', 'tail tip'], anchor_part='tail base', sigma=5.0, output_stride=4, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 96, 96, 9), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 234 INFO:sleap.nn.training:Validation set: n = 26 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... 
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: C:/Users/verpeutlab/Desktop/Jaime\models\230531_121737.centered_instance.n=260 INFO:sleap.nn.training:Setting up visualization... 2023-05-31 12:17:45.614700: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } 2023-05-31 12:17:46.370249: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs 
{ dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } INFO:sleap.nn.training:Finished trainer set up. [3.7s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... 2023-05-31 12:17:57.127668: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } 2023-05-31 12:17:59.377459: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 
} } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } INFO:sleap.nn.training:Finished creating training datasets. [13.2s] INFO:sleap.nn.training:Starting training loop... 2023-05-31 12:17:59.776614: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } Epoch 1/200 2023-05-31 12:18:01.032671: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201 2023-05-31 12:18:04.118263: W tensorflow/core/common_runtime/bfc_allocator.cc:338] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. 
If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0018s vs `on_train_batch_end` time: 0.1156s). Check your callbacks.
2023-05-31 12:18:36.599396: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } }
200/200 - 38s - loss: 0.0062 - nose: 0.0063 - R_front_paw: 0.0054 - L_front_paw: 0.0054 - centroid: 0.0060 - R_rear_paw: 0.0059 - L_rear_paw: 0.0060 - tail_base: 0.0064 - tail_mid: 0.0072 - tail_tip: 0.0075 - val_loss: 0.0058 - val_nose: 0.0063 - val_R_front_paw: 0.0052 - val_L_front_paw: 0.0047 - val_centroid: 0.0055 - val_R_rear_paw: 0.0060 - val_L_rear_paw: 0.0053 - val_tail_base: 0.0060 - val_tail_mid: 0.0067 - val_tail_tip: 0.0068
Traceback (most recent call last):
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.0a0', 'console_scripts', 'sleap-train')())
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 2007, in main
    trainer.train()
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 943, in train
    verbose=2,
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 1230, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\callbacks.py", line 413, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1348, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1328, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 2052, in call
    out = self.keras_model(crops)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1020, in __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\engine\input_spec.py", line 269, in assert_input_compatibility
    ', found shape=' + display_shape(x.shape))
ValueError: Input 0 is incompatible with layer model: expected shape=(None, 384, 384, 1), found shape=(1, 153, 153, 1)
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
```
talmo commented 1 year ago

Hi @jverpeut,

It looks like you might be on 1.3.0a0 in the latest logs (the traceback shows `sleap==1.3.0a0`). I think the shape-mismatch error you're getting at the end should be fixed in the 1.3.0 release, if you want to give that a go!
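A quick way to tell an alpha build from a release is to look at the version string itself. The sketch below is only illustrative: `is_prerelease` is a hypothetical helper, not part of SLEAP, and it assumes the version string follows the usual `a`/`b`/`rc` suffix convention. The string can be copied from the "Software versions" banner printed at start-up.

```python
import re


def is_prerelease(version: str) -> bool:
    """Return True if the version string ends in an a/b/rc pre-release suffix.

    Hypothetical helper for illustration only; not a SLEAP function.
    """
    return re.fullmatch(r"\d+(\.\d+)*(a|b|rc)\d+", version) is not None


# Version strings taken from the logs in this thread.
print(is_prerelease("1.3.0a0"))  # True
print(is_prerelease("1.3.0"))    # False
```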

Talmo

jverpeut commented 1 year ago

Thank you. I will update the software and try again.

jverpeut commented 1 year ago

This time training was able to start, but it failed at the centered-instance model:

Start SLEAP ``` (base) C:\Users\verpeutlab>conda activate sleap (sleap) C:\Users\verpeutlab>sleap-label Saving config: C:\Users\verpeutlab/.sleap/1.3.0/preferences.yaml Restoring GUI state... Software versions: SLEAP: 1.3.0 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 Happy SLEAPing! :) ```
Set-up training ``` Resetting monitor window. Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png Start training centroid... ['sleap-train', 'C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmp4nhfizay\\230607_112114_training_job.json', 'C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.0 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Training profile: C:\Users\VERPEU~1\AppData\Local\Temp\tmp4nhfizay\230607_112114_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmp4nhfizay\\230607_112114_training_job.json", "labels_path": "C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": "C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp", "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": [ 248, 72, 192, 94, 243, 143, 254, 156, 168, 148, 158, 102, 77, 182, 229, 241, 189, 57, 232, 230, 153, 249, 27, 43, 110, 174, 95, 44, 38, 185, 172, 84, 147, 10, 235, 15, 151, 210, 103, 184, 247, 258, 137, 223, 145, 141, 78, 6, 3, 4, 154, 134, 201, 131, 224, 62, 138, 144, 59, 121, 2, 150, 176, 26, 24, 53, 96, 163, 200, 118, 56, 142, 20, 117, 130, 157, 33, 140, 7, 98, 228, 159, 
233, 225, 71, 123, 16, 179, 167, 177, 47, 190, 127, 207, 194, 8, 170, 69, 97, 29, 209, 105, 246, 196, 61, 238, 12, 128, 42, 14, 83, 48, 99, 125, 203, 221, 70, 208, 206, 17, 116, 63, 211, 30, 136, 1, 114, 46, 215, 149, 19, 183, 79, 88, 85, 86, 171, 197, 161, 68, 257, 106, 126, 152, 252, 73, 251, 227, 218, 40, 9, 108, 5, 93, 164, 199, 253, 191, 35, 74, 92, 80, 25, 160, 186, 115, 21, 113, 139, 173, 133, 188, 181, 169, 124, 11, 66, 45, 76, 54, 18, 155, 51, 231, 107, 13, 202, 89, 55, 244, 220, 31, 146, 104, 239, 198, 37, 101, 242, 100, 49, 60, 180, 122, 165, 237, 226, 34, 256, 75, 204, 50, 28, 39, 213, 135, 259, 52, 0, 65, 120, 119, 236, 245, 87, 240, 212, 193, 132, 23, 175, 195, 205, 64 ], "validation_inds": [ 58, 214, 112, 219, 178, 217, 111, 32, 67, 255, 187, 81, 162, 216, 109, 41, 222, 91, 36, 82, 129, 250, 22, 166, 234, 90 ], "test_inds": null, "search_path_hints": [ "", "", "" ], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.4, "pad_to_stride": 16, "resize_and_pad_to_target": true, "target_height": 1088, "target_width": 1456 }, "instance_cropping": { "center_on_part": "tail base", "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 16, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": { "anchor_part": "tail base", "sigma": 5.0, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, 
"translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": true, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": true, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": false, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": 200, "min_batches_per_epoch": 200, "val_batches_per_epoch": 10, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 20 } }, "outputs": { "save_outputs": true, "run_name": "230607_112114.centroid.n=260", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "C:/Users/verpeutlab/Desktop/Jaime\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } 
}, "name": "", "description": "", "sleap_version": "1.3.0", "filename": "C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmp4nhfizay\\230607_112114_training_job.json" } ```
Set-up centroid ``` INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 7414 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 234 / Validation = 26. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2023-06-07 11:21:19.800071: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-06-07 11:21:20.180041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6059 MB memory: -> device: 0, name: NVIDIA T1000 8GB, pci bus id: 0000:01:00.0, compute capability: 7.5 2023-06-07 11:21:20.538250: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2) INFO:sleap.nn.training:Loaded test example. [1.670s] INFO:sleap.nn.training: Input shape: (448, 592, 1) INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,105 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CentroidConfmapsHead(anchor_part='tail base', sigma=5.0, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 224, 296, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 234 INFO:sleap.nn.training:Validation set: n = 26 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20) INFO:sleap.nn.training:Setting up outputs... INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260 INFO:sleap.nn.training:Setting up visualization... 
2023-06-07 11:21:23.023111: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } } 2023-06-07 11:21:23.817999: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 
6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } } INFO:sleap.nn.training:Finished trainer set up. [4.1s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... INFO:sleap.nn.training:Finished creating training datasets. [12.2s] ```
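The split reported in the log above (Training = 234 / Validation = 26 from 260 labeled frames with `validation_fraction` 0.1) can be reproduced with simple arithmetic. The sketch below is an assumption about how the counts are derived, not SLEAP's actual code: `split_counts` is a hypothetical helper, and the "at least one validation frame" rule is a guess. Under that guess, a labels set with zero usable frames would ask for one validation sample out of zero, which would be consistent with the `test_size=1 ... number of samples 0` error reported at the top of this issue.

```python
def split_counts(n_labeled: int, validation_fraction: float) -> tuple:
    """Return (n_train, n_val) for a fractional validation split.

    Hypothetical helper for illustration; not taken from SLEAP.
    """
    # Assumption: at least one frame is always reserved for validation.
    n_val = max(1, round(n_labeled * validation_fraction))
    if n_val >= n_labeled:
        raise ValueError(
            f"validation size {n_val} must be smaller than "
            f"the number of samples {n_labeled}"
        )
    return n_labeled - n_val, n_val


# Numbers from the training job above: 260 frames, validation_fraction = 0.1.
print(split_counts(260, 0.1))  # (234, 26)
```

Calling `split_counts(0, 0.1)` raises a `ValueError` in this sketch, since no frames remain for training.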
Training centroid ``` INFO:sleap.nn.training:Starting training loop... Epoch 1/200 2023-06-07 11:21:37.283716: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201 WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0466s vs `on_train_batch_end` time: 0.1076s). Check your callbacks. 200/200 - 49s - loss: 0.0029 - val_loss: 0.0030 Epoch 2/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\train.*.png 200/200 - 41s - loss: 0.0026 - val_loss: 0.0023 Epoch 3/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\train.*.png Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 0.0019 - val_loss: 0.0017 Epoch 4/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 0.0014 - val_loss: 0.0011 Epoch 5/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 0.0012 - val_loss: 0.0010 Epoch 6/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 0.0010 - val_loss: 9.3160e-04 Epoch 7/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 9.4507e-04 - val_loss: 8.9552e-04 Epoch 8/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 8.4740e-04 - val_loss: 7.6910e-04 Epoch 9/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 8.1789e-04 - val_loss: 6.7994e-04 Epoch 10/200 Polling: 
C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 7.7963e-04 - val_loss: 8.0271e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png Epoch 11/200 200/200 - 41s - loss: 7.2489e-04 - val_loss: 7.1172e-04 Epoch 12/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 7.2477e-04 - val_loss: 5.6106e-04 Epoch 13/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 6.7498e-04 - val_loss: 6.3619e-04 Epoch 14/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 6.3793e-04 - val_loss: 4.9243e-04 Epoch 15/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 6.4236e-04 - val_loss: 5.3727e-04 Epoch 16/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 6.1951e-04 - val_loss: 5.6750e-04 Epoch 17/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 5.9040e-04 - val_loss: 5.8951e-04 Epoch 18/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 5.9239e-04 - val_loss: 5.4917e-04 Epoch 19/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 5.6752e-04 - val_loss: 4.6862e-04 Epoch 20/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 5.8489e-04 - val_loss: 5.2910e-04 Epoch 21/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 5.6142e-04 - val_loss: 
5.0600e-04 Epoch 22/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 5.4489e-04 - val_loss: 5.5623e-04 Epoch 23/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 5.4348e-04 - val_loss: 4.5428e-04 Epoch 24/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 5.0491e-04 - val_loss: 3.9201e-04 Epoch 25/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.8160e-04 - val_loss: 5.1183e-04 Epoch 26/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.8964e-04 - val_loss: 4.4636e-04 Epoch 27/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.8693e-04 - val_loss: 3.4735e-04 Epoch 28/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.8617e-04 - val_loss: 4.5551e-04 Epoch 29/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.4201e-04 - val_loss: 4.4345e-04 Epoch 30/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.5606e-04 - val_loss: 4.7202e-04 Epoch 31/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.5639e-04 - val_loss: 4.7744e-04 Epoch 32/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.5492e-04 - val_loss: 4.4995e-04 Epoch 00032: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05. 
Epoch 33/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.0668e-04 - val_loss: 3.8437e-04 Epoch 34/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.0031e-04 - val_loss: 2.7645e-04 Epoch 35/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.9048e-04 - val_loss: 3.8679e-04 Epoch 36/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 4.0217e-04 - val_loss: 4.2883e-04 Epoch 37/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.7025e-04 - val_loss: 3.9949e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png Epoch 38/200 200/200 - 41s - loss: 3.8837e-04 - val_loss: 3.7571e-04 Epoch 39/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.8293e-04 - val_loss: 3.5809e-04 Epoch 00039: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05. 
Epoch 40/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.6326e-04 - val_loss: 4.3050e-04 Epoch 41/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.4505e-04 - val_loss: 3.8829e-04 Epoch 42/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.5185e-04 - val_loss: 3.7498e-04 Epoch 43/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.5104e-04 - val_loss: 3.1184e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png Epoch 44/200 200/200 - 41s - loss: 3.5925e-04 - val_loss: 3.6577e-04 Epoch 45/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.4305e-04 - val_loss: 4.1084e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png Epoch 46/200 200/200 - 42s - loss: 3.4130e-04 - val_loss: 3.1685e-04 Epoch 00046: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05. 
Epoch 47/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.3507e-04 - val_loss: 3.0022e-04 Epoch 48/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.4789e-04 - val_loss: 2.9688e-04 Epoch 49/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.3346e-04 - val_loss: 3.9300e-04 Epoch 50/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.2682e-04 - val_loss: 2.6926e-04 Epoch 51/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1832e-04 - val_loss: 2.7934e-04 Epoch 52/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1841e-04 - val_loss: 3.2562e-04 Epoch 53/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.3640e-04 - val_loss: 3.6434e-04 Epoch 54/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.2489e-04 - val_loss: 3.3068e-04 Epoch 55/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1410e-04 - val_loss: 3.6348e-04 Epoch 00055: ReduceLROnPlateau reducing learning rate to 6.24999984211172e-06. 
Epoch 56/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0430e-04 - val_loss: 2.9263e-04 Epoch 57/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1340e-04 - val_loss: 3.0825e-04 Epoch 58/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1611e-04 - val_loss: 2.4571e-04 Epoch 59/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0585e-04 - val_loss: 2.6018e-04 Epoch 60/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1720e-04 - val_loss: 2.0955e-04 Epoch 61/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 2.9391e-04 - val_loss: 3.5530e-04 Epoch 62/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0816e-04 - val_loss: 3.2332e-04 Epoch 63/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0417e-04 - val_loss: 2.9338e-04 Epoch 64/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0452e-04 - val_loss: 3.1795e-04 Epoch 65/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1501e-04 - val_loss: 2.5480e-04 Epoch 00065: ReduceLROnPlateau reducing learning rate to 3.12499992105586e-06. 
Epoch 66/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 2.9867e-04 - val_loss: 2.7039e-04 Epoch 67/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1124e-04 - val_loss: 3.5273e-04 Epoch 68/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0445e-04 - val_loss: 2.8463e-04 Epoch 69/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0281e-04 - val_loss: 2.3984e-04 Epoch 70/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1661e-04 - val_loss: 3.1734e-04 Epoch 71/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 2.9070e-04 - val_loss: 3.8734e-04 Epoch 72/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0281e-04 - val_loss: 2.5271e-04 Epoch 00072: ReduceLROnPlateau reducing learning rate to 1.56249996052793e-06. 
Epoch 73/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 42s - loss: 3.0964e-04 - val_loss: 3.5069e-04 Epoch 74/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 2.9782e-04 - val_loss: 2.5545e-04 Epoch 75/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0102e-04 - val_loss: 3.1866e-04 Epoch 76/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 2.8965e-04 - val_loss: 3.5894e-04 Epoch 77/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.0548e-04 - val_loss: 2.6746e-04 Epoch 78/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 2.9073e-04 - val_loss: 3.4268e-04 Epoch 79/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 3.1043e-04 - val_loss: 3.2674e-04 Epoch 00079: ReduceLROnPlateau reducing learning rate to 7.81249980263965e-07. Epoch 80/200 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png 200/200 - 41s - loss: 2.9169e-04 - val_loss: 2.6365e-04 Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz\validation.*.png Epoch 00080: early stopping INFO:sleap.nn.training:Finished training loop. [56.8 min] INFO:sleap.nn.training:Deleting visualization directory: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\viz INFO:sleap.nn.training:Saving evaluation metrics to model folder... Predicting... 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-06-07 12:18:25.091254: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-06-07 12:18:25.091957: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { 
dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 99% ETA: 0:00:01 19.7 FPS2023-06-07 12:18:36.113344: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-06-07 12:18:36.113997: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: 
"3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 7.7 FPS INFO:sleap.nn.evals:Saved predictions: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\labels_pr.train.slp INFO:sleap.nn.evals:Saved metrics: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\metrics.train.npz INFO:sleap.nn.evals:OKS mAP: 0.352102 Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-06-07 12:18:40.402736: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-06-07 12:18:40.403534: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: 
"bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━ 92% ETA: 0:00:01 55.6 FPS2023-06-07 12:18:41.694246: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA T1000 8GB" frequency: 1395 num_cores: 14 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 65536 memory_size: 6354108416 bandwidth: 160032000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } } 2023-06-07 12:18:41.694949: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: 
"T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -80 } dim { size: -81 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 17.8 FPS INFO:sleap.nn.evals:Saved predictions: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\labels_pr.val.slp INFO:sleap.nn.evals:Saved metrics: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260\metrics.val.npzINFO:sleap.nn.evals:OKS mAP: 0.270398 INFO:sleap.nn.callbacks:Closing the reporter controller/context. INFO:sleap.nn.callbacks:Closing the training controller socket/context. Run Path: C:/Users/verpeutlab/Desktop/Jaime\models\230607_112114.centroid.n=260 Finished training centroid. ```
Set-up centered instance ``` Resetting monitor window. Polling: C:/Users/verpeutlab/Desktop/Jaime\models\230607_121843.centered_instance.n=260\viz\validation.*.png Start training centered_instance... ['sleap-train', 'C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmpukbd3tzq\\230607_121844_training_job.json', 'C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.0 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Training profile: C:\Users\VERPEU~1\AppData\Local\Temp\tmpukbd3tzq\230607_121844_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmpukbd3tzq\\230607_121844_training_job.json", "labels_path": "C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.4, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": "tail base", "crop_size": 384, "crop_size_detection_padding": 16 } }, 
"model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 4, "filters": 24, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": null, "centered_instance": { "anchor_part": "tail base", "part_names": null, "sigma": 2.5, "output_stride": 4, "loss_weight": 1.0, "offset_refinement": false }, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": true, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": true, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 
10 } }, "outputs": { "save_outputs": true, "run_name": "230607_121843.centered_instance.n=260", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "C:/Users/verpeutlab/Desktop/Jaime\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.0", "filename": "C:\\Users\\VERPEU~1\\AppData\\Local\\Temp\\tmpukbd3tzq\\230607_121844_training_job.json" } ```
Centered-Instance Training ``` INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 7398 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: C:/Users/verpeutlab/Desktop/Jaime/labels_2_21_DominanceOpenField.v001(1).slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 234 / Validation = 26. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2023-06-07 12:18:49.183114: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2023-06-07 12:18:49.574727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6059 MB memory: -> device: 0, name: NVIDIA T1000 8GB, pci bus id: 0000:01:00.0, compute capability: 7.5 2023-06-07 12:18:49.923094: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2) 2023-06-07 12:18:51.144545: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } INFO:sleap.nn.training:Loaded test example. [2.135s] INFO:sleap.nn.training: Input shape: (384, 384, 1) INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=24, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=2, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 4,311,057 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CenteredInstanceConfmapsHead(part_names=['nose', 'R front paw', 'L front paw', 'centroid', 'R rear paw', 'L rear paw', 'tail base', 'tail mid', 'tail tip'], anchor_part='tail base', sigma=2.5, output_stride=4, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 96, 96, 9), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 234 INFO:sleap.nn.training:Validation set: n = 26 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... 
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: C:/Users/verpeutlab/Desktop/Jaime\models\230607_121843.centered_instance.n=260 INFO:sleap.nn.training:Setting up visualization... 2023-06-07 12:18:52.105196: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } 2023-06-07 12:18:52.847312: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1088 } dim { size: 1456 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs 
{ dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } INFO:sleap.nn.training:Finished trainer set up. [3.7s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... 2023-06-07 12:19:03.584500: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } 2023-06-07 12:19:05.781606: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 
} } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } INFO:sleap.nn.training:Finished creating training datasets. [13.1s] INFO:sleap.nn.training:Starting training loop... 2023-06-07 12:19:06.191977: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } Epoch 1/200 2023-06-07 12:19:07.424168: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201 2023-06-07 12:19:10.417550: W tensorflow/core/common_runtime/bfc_allocator.cc:338] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. 
If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0000s vs `on_train_batch_end` time: 0.1159s). Check your callbacks. 2023-06-07 12:19:43.020432: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 435 } dim { size: 582 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 2995 num_cores: 12 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 49152 l2_cache_size: 1310720 l3_cache_size: 18874368 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 384 } dim { size: 384 } dim { size: 1 } } } 200/200 - 38s - loss: 0.0017 - nose: 0.0017 - R_front_paw: 0.0014 - L_front_paw: 0.0015 - centroid: 0.0017 - R_rear_paw: 0.0017 - L_rear_paw: 0.0016 - tail_base: 0.0018 - tail_mid: 0.0019 - tail_tip: 0.0019 - val_loss: 0.0016 - val_nose: 0.0015 - val_R_front_paw: 0.0013 - val_L_front_paw: 0.0013 - val_centroid: 0.0017 - val_R_rear_paw: 0.0016 - val_L_rear_paw: 0.0017 - val_tail_base: 0.0018 - val_tail_mid: 0.0019 - val_tail_tip: 0.0020 ```

Traceback

Traceback (most recent call last):
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.0', 'console_scripts', 'sleap-train')())
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 943, in train
    verbose=2,
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 1230, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\callbacks.py", line 413, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1348, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1328, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 2071, in call
    out = self.keras_model(crops)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1020, in __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
  File "C:\Users\verpeutlab\miniconda3\envs\sleap\lib\site-packages\keras\engine\input_spec.py", line 269, in assert_input_compatibility
    ', found shape=' + display_shape(x.shape))
ValueError: Input 0 is incompatible with layer model: expected shape=(None, 384, 384, 1), found shape=(1, 153, 153, 1)
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
roomrys commented 1 year ago

Hi @jverpeut,

Can you retrain but keep the input scaling on the centered-instance model at 1.0? For background, this looks very similar to the error discussed here.

Can you try training the centered instance model with input scaling = 1 (I believe you currently have "input_scaling": 1.75)? There is currently an open issue, #872, that appears when input scaling is anything other than 1 on the centered instance model. Also, a heads up: if you were setting the input scaling to adjust the receptive field size, then we recommend decreasing the max output stride instead of increasing the input scaling past 1. An input scaling greater than 1 will create redundant pixels (no new features) that are passed into the network (and make training take longer).
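As a rough check of where the 153 in the traceback above comes from, here is a minimal sketch. It assumes the crop size is multiplied by `input_scaling` and truncated to an integer; that rule is inferred from the numbers in this thread, not taken from SLEAP's source, and `scaled_crop_size` is a hypothetical helper.

```python
def scaled_crop_size(crop_size: int, input_scaling: float) -> int:
    """Hypothetical helper: crop side length after input scaling.

    Assumption: the scaled size is truncated to an integer.
    """
    return int(crop_size * input_scaling)


# Values copied from the centered-instance training job config in this issue.
crop_size = 384
input_scaling = 0.4

# The model was built for unscaled crops (see "Input shape: (384, 384, 1)").
expected_side = crop_size
found_side = scaled_crop_size(crop_size, input_scaling)

print(f"expected side: {expected_side}, found side: {found_side}")

# With input scaling left at 1.0 the two sizes agree.
side_at_unit_scale = scaled_crop_size(crop_size, 1.0)
```

Under that assumption, a scaling of 0.4 shrinks the visualization crop while the model still expects the full 384 x 384 crop, which lines up with the `expected shape` / `found shape` pair in the `ValueError`.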

Thanks, Liezl

jverpeut commented 1 year ago

Liezl,

Changing the input scaling worked. I will close this ticket. Thank you

Jess

jkbhagatio commented 1 year ago

Hi @roomrys, I am also having this same error now when trying to train a "multi-animal top-down id" model. The training of the "centroid" model works fine, but the "centered instance" fails. The `input_scaling` param is set to 1.0 for both.

INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: C:\Users\jai\ProjectAeon\sleap_playground\social_boys_multiclass_id_topdown\labels.v001.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
Traceback (most recent call last):
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-train')())
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 2013, in main
    trainer = create_trainer_using_cli(args=args)
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 2005, in create_trainer_using_cli
    video_search_paths=args.video_paths,
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 673, in from_config
    with_track_only=is_id_model,
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 150, in from_config
    with_track_only=with_track_only,
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 218, in from_labels
    validation,
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\data\training.py", line 49, in split_labels_train_val
    idx_train, idx_val = train_test_split(list(range(len(labels))), test_size=n_val)
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sklearn\model_selection\_split.py", line 2423, in train_test_split
    n_samples, test_size, train_size, default_test_size=0.25
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sklearn\model_selection\_split.py", line 2046, in _validate_shuffle_split
    "(0, 1) range".format(test_size, n_samples)
ValueError: test_size=1 should be either positive and smaller than the number of samples 0 or a float in the (0, 1) range

You said this in an earlier comment in this thread:

Is the problem limited to just the multiclass models (i.e. have you been able to successfully train multiclass)? Also, do you have tracks assigned to each instance for training multiclass (this is a requirement since it is how the classes/tracks are learned).

In my case, it indeed works fine with just a "multi-animal top-down" pipeline instead of a "multi-animal top-down id" pipeline. When you refer to "assigning tracks" here, what do you mean? I thought the tracker only comes into play when running inference, which requires a trained multi_class_topdown model, the training of which is failing for me.

FYI this error occurs for me in both v1.3.1 and v1.3.3
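For anyone reading the traceback above: "number of samples 0" is consistent with every labeled frame being dropped before the split. The toy model below is a sketch of that reading, not SLEAP's actual code. It assumes that `with_track_only` (visible in the traceback) keeps only instances that have a track assigned, and that the validation count is at least 1; `count_usable_frames` and `validation_count` are hypothetical names.

```python
def count_usable_frames(frames, with_track_only):
    """Count frames that still contain at least one instance.

    `frames` is a list of frames; each frame is a list of instances,
    and each instance is a dict with a "track" key (None = unassigned).
    """
    usable = 0
    for instances in frames:
        if with_track_only:
            # Assumption: ID models ignore instances without a track.
            instances = [inst for inst in instances if inst["track"] is not None]
        if instances:
            usable += 1
    return usable


def validation_count(n_samples, validation_fraction=0.1):
    """Simplified stand-in for the split check that raises in the traceback."""
    n_val = max(1, round(n_samples * validation_fraction))
    if n_val >= n_samples:
        raise ValueError(
            f"test_size={n_val} should be smaller than the number of samples {n_samples}"
        )
    return n_val


# 20 labeled frames, 3 animals each, none assigned to a track yet.
frames = [[{"track": None} for _ in range(3)] for _ in range(20)]

n_plain = count_usable_frames(frames, with_track_only=False)
n_id = count_usable_frames(frames, with_track_only=True)
print(n_plain, n_id)

try:
    validation_count(n_id)
except ValueError as err:
    print("ID model split fails:", err)
```

In this sketch the plain top-down pipeline sees all 20 frames, while the ID pipeline sees none, so a validation size of 1 cannot be satisfied.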

jkbhagatio commented 1 year ago

Oops ok, I realized I had to assign the instances to tracks in the labeling, which I've now done. However, now when trying to train the centered-instance (multi_class_topdown) model, I get the following error:

INFO:sleap.nn.training:Loaded test example. [2.827s]
INFO:sleap.nn.training:  Input shape: (128, 128, 1)
Traceback (most recent call last):
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-train')())
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 924, in train
    self.setup()
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 910, in setup
    self._setup_model()
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\training.py", line 734, in _setup_model
    self.model.make_model(input_shape)
  File "C:\Users\jai\mambaforge\envs\sleap1.3.3\lib\site-packages\sleap\nn\model.py", line 356, in make_model
    f"Could not find a feature activation for output at stride "
ValueError: Could not find a feature activation for output at stride 1.
jkbhagatio commented 1 year ago

Oooops, and I realized the fix for this last error was just making sure the output_strides for the "confmaps" and "class_vectors" heads matched (I set them both to 2 now). So feel free to ignore these comments!
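A quick way to catch this before launching training is sketched below. It assumes the training profile nests the two heads under `model.heads.multi_class_topdown`, each with its own `output_stride`; that layout is inferred from the config dump earlier in this thread and is not verified against SLEAP's schema. `find_stride_mismatch` is a hypothetical helper.

```python
def find_stride_mismatch(job_config):
    """Return (confmaps_stride, class_vectors_stride) if they differ, else None.

    Assumed layout (not verified against SLEAP's schema):
    job_config["model"]["heads"]["multi_class_topdown"][head]["output_stride"]
    """
    heads = job_config["model"]["heads"]["multi_class_topdown"]
    confmaps_stride = heads["confmaps"]["output_stride"]
    class_vectors_stride = heads["class_vectors"]["output_stride"]
    if confmaps_stride == class_vectors_stride:
        return None
    return confmaps_stride, class_vectors_stride


# Illustrative configs only; the stride values here are made up for the example.
mismatched = {
    "model": {"heads": {"multi_class_topdown": {
        "confmaps": {"output_stride": 2},
        "class_vectors": {"output_stride": 1},
    }}}
}
matched = {
    "model": {"heads": {"multi_class_topdown": {
        "confmaps": {"output_stride": 2},
        "class_vectors": {"output_stride": 2},
    }}}
}

print(find_stride_mismatch(mismatched))
print(find_stride_mismatch(matched))
```

If the profile is stored as JSON, the same function can be applied to the result of `json.load` on that file.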