talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Unable to retrain network on new video #1932

Closed GxHam closed 2 months ago

GxHam commented 2 months ago

Bug description

Error when trying to retrain the network after adding a new video

Expected behaviour

Run training

Actual behaviour

Error popup

Your personal set up

Software versions:
SLEAP: 1.4.1a2
TensorFlow: 2.7.0
Numpy: 1.21.6
Python: 3.7.12
OS: Windows-10-10.0.22621-SP0

Logs ``` # paste relevant logs here, if any Traceback (most recent call last): File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\gui\app.py", line 1245, in _after_plot_change overlay.redraw(self.state["video"], frame_idx) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\gui\overlays\base.py", line 84, in redraw self.add_to_scene(video, frame_idx, *args, **kwargs) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\gui\overlays\tracks.py", line 158, in add_to_scene for track, trails in all_track_trails.items(): AttributeError: 'NoneType' object has no attribute 'items' Traceback (most recent call last): File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\gui\app.py", line 1245, in _after_plot_change overlay.redraw(self.state["video"], frame_idx) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\gui\overlays\base.py", line 84, in redraw self.add_to_scene(video, frame_idx, *args, **kwargs) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\gui\overlays\tracks.py", line 158, in add_to_scene for track, trails in all_track_trails.items(): AttributeError: 'NoneType' object has no attribute 'items' Resetting monitor window. Polling: D:/Data/Gao/Pose_estimation\models\TD_r19_240901_164410.centroid.n=892\viz\validation.*.png Start training centroid... 
['sleap-train', 'C:\\Users\\Gao\\AppData\\Local\\Temp\\tmp3g8vfmfy\\240901_164410_training_job.json', 'D:/Data/Gao/Pose_estimation/FirstTest_v2.slp', '--zmq', '--controller_port', '9000', '--publish_port', '9001', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.4.1a2 TensorFlow: 2.7.0 Numpy: 1.21.6 Python: 3.7.12 OS: Windows-10-10.0.22621-SP0 INFO:sleap.nn.training:Training labels file: D:/Data/Gao/Pose_estimation/FirstTest_v2.slp INFO:sleap.nn.training:Training profile: C:\Users\Gao\AppData\Local\Temp\tmp3g8vfmfy\240901_164410_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\Gao\\AppData\\Local\\Temp\\tmp3g8vfmfy\\240901_164410_training_job.json", "labels_path": "D:/Data/Gao/Pose_estimation/FirstTest_v2.slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "publish_port": 9001, "controller_port": 9000, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": "D:/Data/Gao/FirstTest_v2.slp", "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": [ 791, 596, 586, 150, 606, 796, 152, 144, 68, 348, 638, 25, 107, 477, 196, 483, 460, 212, 376, 453, 675, 359, 124, 409, 59, 18, 201, 392, 102, 423, 614, 772, 564, 571, 516, 785, 125, 464, 536, 360, 522, 402, 160, 742, 551, 478, 466, 233, 69, 689, 52, 339, 310, 198, 647, 615, 761, 797, 730, 637, 725, 83, 521, 701, 33, 42, 663, 558, 167, 721, 390, 250, 771, 245, 166, 576, 329, 790, 274, 748, 734, 314, 432, 781, 510, 151, 295, 489, 494, 313, 278, 755, 450, 336, 130, 599, 101, 416, 784, 738, 659, 641, 96, 53, 13, 218, 722, 768, 420, 441, 783, 37, 117, 90, 746, 579, 756, 0, 207, 793, 370, 735, 94, 369, 
538, 728, 568, 234, 4, 745, 759, 542, 345, 140, 504, 174, 395, 569, 180, 247, 533, 115, 288, 142, 426, 440, 226, 300, 172, 195, 24, 444, 114, 181, 720, 15, 572, 334, 442, 754, 566, 582, 199, 732, 570, 436, 147, 485, 741, 373, 550, 259, 764, 456, 705, 803, 351, 711, 143, 773, 645, 327, 708, 713, 65, 786, 279, 61, 509, 399, 653, 5, 305, 223, 162, 591, 664, 798, 563, 693, 364, 435, 122, 661, 230, 524, 704, 109, 562, 685, 580, 631, 413, 585, 3, 189, 73, 418, 471, 126, 103, 271, 770, 175, 365, 652, 169, 743, 699, 540, 344, 332, 709, 581, 617, 472, 282, 238, 204, 447, 425, 799, 715, 573, 387, 733, 626, 595, 757, 112, 513, 676, 325, 276, 560, 665, 475, 71, 261, 277, 213, 290, 688, 602, 292, 164, 200, 343, 303, 333, 491, 515, 800, 297, 29, 434, 275, 54, 719, 32, 298, 468, 225, 367, 520, 452, 634, 219, 145, 766, 476, 640, 301, 161, 656, 763, 192, 220, 753, 717, 335, 285, 64, 767, 431, 188, 401, 532, 620, 506, 716, 154, 46, 176, 658, 527, 594, 321, 268, 787, 383, 484, 296, 82, 72, 27, 168, 535, 44, 609, 639, 113, 740, 707, 539, 138, 497, 684, 554, 66, 727, 751, 700, 346, 534, 697, 776, 35, 669, 496, 400, 240, 262, 692, 377, 43, 528, 123, 120, 134, 349, 600, 81, 411, 760, 598, 752, 137, 775, 505, 322, 553, 7, 414, 222, 687, 132, 22, 649, 385, 405, 710, 286, 682, 559, 454, 750, 208, 157, 736, 469, 603, 529, 34, 217, 601, 86, 578, 79, 202, 696, 375, 197, 686, 318, 221, 67, 203, 459, 330, 782, 252, 575, 804, 116, 309, 604, 214, 317, 304, 248, 655, 577, 660, 28, 235, 156, 323, 698, 347, 1, 280, 133, 546, 731, 531, 190, 173, 379, 178, 611, 88, 194, 446, 691, 57, 439, 2, 544, 422, 486, 629, 616, 350, 678, 231, 646, 184, 106, 78, 23, 726, 179, 182, 632, 806, 723, 749, 508, 263, 593, 417, 170, 526, 141, 241, 315, 320, 293, 597, 584, 762, 397, 363, 557, 99, 31, 561, 155, 624, 119, 587, 149, 14, 779, 356, 36, 302, 191, 51, 681, 58, 9, 430, 498, 185, 267, 501, 650, 802, 378, 224, 89, 232, 307, 215, 386, 229, 795, 592, 294, 488, 737, 308, 403, 718, 714, 341, 62, 353, 627, 724, 747, 85, 
48, 328, 744, 183, 153, 792, 6, 556, 429, 361, 12, 257, 437, 264, 567, 159, 774, 108, 129, 433, 618, 671, 683, 8, 628, 342, 40, 415, 458, 455, 668, 362, 319, 445, 227, 77, 537, 394, 299, 283, 636, 104, 38, 372, 703, 654, 366, 507, 187, 50, 480, 500, 408, 657, 679, 100, 443, 171, 482, 75, 642, 523, 95, 780, 374, 613, 552, 695, 60, 324, 729, 666, 530, 388, 70, 358, 427, 237, 340, 396, 30, 312, 549, 702, 272, 381, 492, 311, 331, 243, 619, 541, 608, 354, 63, 672, 555, 588, 49, 457, 256, 778, 622, 547, 91, 127, 338, 165, 404, 473, 398, 543, 355, 368, 97, 474, 281, 545, 467, 47, 260, 270, 158, 407, 690, 11, 244, 648, 10, 384, 465, 163, 769, 45, 135, 389, 495, 461, 87, 712, 610, 487, 337, 623, 266, 17, 265, 98, 490, 635, 662, 287, 789, 380, 758, 216, 448, 481, 739, 393, 92, 371, 242, 565, 211, 499, 630, 26, 111, 253, 306, 20, 206, 479, 246, 406, 93, 80, 177, 765, 131, 357, 677, 255, 428, 643, 254, 209, 193, 449, 625, 583 ], "validation_inds": [ 412, 251, 148, 16, 673, 19, 136, 228, 612, 421, 56, 210, 284, 128, 517, 410, 514, 438, 503, 39, 651, 589, 186, 110, 525, 605, 424, 105, 258, 706, 470, 84, 205, 794, 249, 269, 680, 670, 236, 139, 511, 382, 694, 121, 667, 41, 326, 518, 633, 502, 512, 55, 451, 352, 674, 805, 621, 74, 291, 644, 574, 607, 316, 118, 239, 519, 801, 289, 493, 462, 463, 788, 777, 548, 391, 419, 590, 146, 273, 21, 76 ], "test_inds": null, "search_path_hints": [ "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "" ], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.5, "pad_to_stride": 16, "resize_and_pad_to_target": true, "target_height": 480, "target_width": 640 }, "instance_cropping": { "center_on_part": "TTI", "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 16, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, 
"stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": { "anchor_part": "TTI", "sigma": 2.5, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": "D:\\Data\\Gao\\Pose_estimation\\models\\TD_r18_240711_201243.centroid.n=807" }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -15.0, "rotation_max_angle": 15.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": false, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": 200, "min_batches_per_epoch": 200, "val_batches_per_epoch": 10, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 20 } }, "outputs": { "save_outputs": true, "run_name": "240901_164410.centroid.n=892", "run_name_prefix": "TD_r19_", "run_name_suffix": "", 
"runs_folder": "D:/Data/Gao/Pose_estimation\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.4.1a2", "filename": "C:\\Users\\Gao\\AppData\\Local\\Temp\\tmp3g8vfmfy\\240901_164410_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 22761 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initialized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: D:/Data/Gao/Pose_estimation/FirstTest_v2.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 803 / Validation = 89. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 
2024-09-01 16:44:18.148388: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-09-01 16:44:18.589325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21340 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:41:00.0, compute capability: 8.9 INFO:sleap.nn.training:Loaded test example. [2.012s] INFO:sleap.nn.training: Input shape: (240, 320, 1) INFO:sleap.nn.training:Created Keras model. INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,105 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CentroidConfmapsHead(anchor_part='TTI', sigma=2.5, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 120, 160, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'") INFO:sleap.nn.training:Loaded previous model weights from D:\Data\Gao\Pose_estimation\models\TD_r18_240711_201243.centroid.n=807\best_model.h5 INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 803 INFO:sleap.nn.training:Validation set: n = 89 INFO:sleap.nn.training:Setting up optimization... 
INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20) INFO:sleap.nn.training:Setting up outputs... INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: D:/Data/Gao/Pose_estimation\models\TD_r19_240901_164410.centroid.n=892 INFO:sleap.nn.training:Setting up visualization... INFO:sleap.nn.training:Finished trainer set up. [4.5s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... 
Traceback (most recent call last): File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\Scripts\sleap-train-script.py", line 33, in sys.exit(load_entry_point('sleap==1.4.1a2', 'console_scripts', 'sleap-train')()) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\nn\training.py", line 2030, in main trainer.train() File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\nn\training.py", line 928, in train training_ds = self.training_pipeline.make_dataset() File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\nn\data\pipelines.py", line 287, in make_dataset ds = transformer.transform_dataset(ds) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\nn\data\dataset_ops.py", line 318, in transform_dataset self.examples = list(iter(ds)) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 800, in __next__ return self._next_internal() File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 786, in _next_internal output_shapes=self._flat_output_shapes) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2844, in iterator_get_next _ops.raise_from_not_ok_status(e, name) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\tensorflow\python\framework\ops.py", line 7107, in raise_from_not_ok_status raise core._status_to_exception(e) from None # pylint: disable=protected-access tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape of tensor EagerPyFunc [480,640,3] is not compatible with expected shape [480,640,1]. [[{{node EnsureShape}}]] [Op:IteratorGetNext] 2024-09-01 16:44:23.207636: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. 
In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. INFO:sleap.nn.callbacks:Closing the reporter controller/context. INFO:sleap.nn.callbacks:Closing the training controller socket/context. Run Path: D:/Data/Gao/Pose_estimation\models\TD_r19_240901_164410.centroid.n=892 Resetting monitor window. Polling: D:/Data/Gao/Pose_estimation\models\TD_r19_240901_164427.centered_instance.n=892\viz\validation.*.png Start training centered_instance... ['sleap-train', 'C:\\Users\\Gao\\AppData\\Local\\Temp\\tmpxvt1qdny\\240901_164427_training_job.json', 'D:/Data/Gao/Pose_estimation/FirstTest_v2.slp', '--zmq', '--controller_port', '9000', '--publish_port', '9001', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.4.1a2 TensorFlow: 2.7.0 Numpy: 1.21.6 Python: 3.7.12 OS: Windows-10-10.0.22621-SP0 INFO:sleap.nn.training:Training labels file: D:/Data/Gao/Pose_estimation/FirstTest_v2.slp INFO:sleap.nn.training:Training profile: C:\Users\Gao\AppData\Local\Temp\tmpxvt1qdny\240901_164427_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\Gao\\AppData\\Local\\Temp\\tmpxvt1qdny\\240901_164427_training_job.json", "labels_path": "D:/Data/Gao/Pose_estimation/FirstTest_v2.slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "publish_port": 9001, "controller_port": 9000, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": "D:/Data/Gao/FirstTest_v2.slp", "validation_labels": null, "validation_fraction": 0.1, 
"test_labels": null, "split_by_inds": false, "training_inds": [ 149, 119, 85, ... 254, 106, 373, 706, 422, 313, 481, 196, 170, 372, 438, 32, 361 ], "test_inds": null, "search_path_hints": [ "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "" ], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 1.0, "pad_to_stride": 1, "resize_and_pad_to_target": true, "target_height": 480, "target_width": 640 }, "instance_cropping": { "center_on_part": "TTI", "crop_size": 160, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 4, "filters": 24, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": null, "centered_instance": { "anchor_part": "TTI", "part_names": [ "Nose", "Ear_R", "Ear_L", "TTI", "TailTip", "Head", "Trunk", "Tail_0", "Tail_1", "Tail_2", "Shoulder_left", "Shoulder_right", "Haunch_left", "Haunch_right", "Neck" ], "sigma": 2.5, "output_stride": 4, "loss_weight": 1.0, "offset_refinement": false }, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": "D:\\Data\\Gao\\Pose_estimation\\models\\TD_r18_240711_202021.centered_instance.n=807" }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -15.0, "rotation_max_angle": 15.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, 
"brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": false, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": 200, "min_batches_per_epoch": 200, "val_batches_per_epoch": 10, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "240901_164427.centered_instance.n=892", "run_name_prefix": "TD_r19_", "run_name_suffix": "", "runs_folder": "D:/Data/Gao/Pose_estimation\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.4.1a2", "filename": "C:\\Users\\Gao\\AppData\\Local\\Temp\\tmpxvt1qdny\\240901_164427_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 22698 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. 
INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initialized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: D:/Data/Gao/Pose_estimation/FirstTest_v2.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 803 / Validation = 89. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2024-09-01 16:44:35.102613: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-09-01 16:44:35.694934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21340 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:41:00.0, compute capability: 8.9 INFO:sleap.nn.training:Loaded test example. [2.707s] INFO:sleap.nn.training: Input shape: (160, 160, 1) INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=24, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=2, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 4,311,639 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CenteredInstanceConfmapsHead(part_names=['Nose', 'Ear_R', 'Ear_L', 'TTI', 'TailTip', 'Head', 'Trunk', 'Tail_0', 'Tail_1', 'Tail_2', 'Shoulder_left', 'Shoulder_right', 'Haunch_left', 'Haunch_right', 'Neck'], anchor_part='TTI', sigma=2.5, output_stride=4, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 40, 40, 15), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'") INFO:sleap.nn.training:Loaded previous model weights from D:\Data\Gao\Pose_estimation\models\TD_r18_240711_202021.centered_instance.n=807\best_model.h5 INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 803 INFO:sleap.nn.training:Validation set: n = 89 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... 
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: D:/Data/Gao/Pose_estimation\models\TD_r19_240901_164427.centered_instance.n=892 INFO:sleap.nn.training:Setting up visualization... INFO:sleap.nn.training:Finished trainer set up. [4.4s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... Traceback (most recent call last): File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\Scripts\sleap-train-script.py", line 33, in sys.exit(load_entry_point('sleap==1.4.1a2', 'console_scripts', 'sleap-train')()) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\nn\training.py", line 2030, in main trainer.train() File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\nn\training.py", line 928, in train training_ds = self.training_pipeline.make_dataset() File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\nn\data\pipelines.py", line 287, in make_dataset ds = transformer.transform_dataset(ds) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\sleap\nn\data\dataset_ops.py", line 318, in transform_dataset self.examples = list(iter(ds)) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 800, in __next__ return self._next_internal() File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 786, in _next_internal output_shapes=self._flat_output_shapes) File "C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2844, in iterator_get_next _ops.raise_from_not_ok_status(e, name) File 
"C:\Users\Gao\.conda\envs\sleap_v1.4.1a2\lib\site-packages\tensorflow\python\framework\ops.py", line 7107, in raise_from_not_ok_status raise core._status_to_exception(e) from None # pylint: disable=protected-access tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape of tensor EagerPyFunc [480,640,3] is not compatible with expected shape [480,640,1]. [[{{node EnsureShape}}]] [Op:IteratorGetNext] 2024-09-01 16:44:39.849027: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. INFO:sleap.nn.callbacks:Closing the reporter controller/context. INFO:sleap.nn.callbacks:Closing the training controller socket/context. Run Path: D:/Data/Gao/Pose_estimation\models\TD_r19_240901_164427.centered_instance.n=892 ```
GxHam commented 2 months ago

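Realised the issue was caused by adding the new video in RGB when the other videos were mono (grayscale). This matches the `InvalidArgumentError` in the logs, where a frame of shape `[480,640,3]` did not fit the expected `[480,640,1]`. Resolved the issue by switching all videos to mono.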
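
For anyone hitting the same error, here is a minimal sketch of how the mismatch could be spotted before training. It is not part of SLEAP: `find_channel_mismatches` and the file names are hypothetical, and the shapes are simply copied from the error message above. It assumes you already have one `(height, width, channels)` frame shape per video.

```python
from collections import Counter


def find_channel_mismatches(frame_shapes):
    """Find videos whose channel count differs from the most common one.

    frame_shapes maps a video name to its (height, width, channels) shape.
    Returns (expected_channels, {video_name: channels}) for the outliers.
    """
    # The last element of each shape is the channel count (1 = mono, 3 = RGB).
    channels = {name: shape[-1] for name, shape in frame_shapes.items()}
    # Treat the most common channel count as the one the pipeline expects.
    expected, _ = Counter(channels.values()).most_common(1)[0]
    mismatched = {name: c for name, c in channels.items() if c != expected}
    return expected, mismatched


# Hypothetical file names; shapes mirror the ones in the traceback.
shapes = {
    "old_video_1.mp4": (480, 640, 1),
    "old_video_2.mp4": (480, 640, 1),
    "new_video.mp4": (480, 640, 3),
}
expected, mismatched = find_channel_mismatches(shapes)
print(f"expected channels: {expected}, mismatched videos: {mismatched}")
```

Any video reported as mismatched would need to be re-imported or converted so that all videos in the project share the same channel count, which is what fixed it here.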