talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Centered instance model scales input image (not cropped image) leading to error #872

Open talmo opened 2 years ago

talmo commented 2 years ago

I think the problem is that we generally expect an input scaling of 1.0 for centered instance models since they're crops already. The training does handle this appropriately, but not the visualization for some reason (it's probably missing the input scaling transformer/preprocessing).

In general, I think we can solve this by switching to using the InferenceModel classes to generate visualizations so that we're not doing some custom inference routines inside of Trainer classes.

Here's the relevant error:

  File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1328, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1308, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 1722, in call
    out = self.keras_model(crops)
  File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1020, in __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
  File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\input_spec.py", line 269, in assert_input_compatibility
    ', found shape=' + display_shape(x.shape))
ValueError: Input 0 is incompatible with layer model: expected shape=(None, 128, 128, 3), found shape=(1, 32, 32, 3)
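
The two shapes in this error are consistent with the input scaling being applied to the already-cropped instance image. Here is a minimal sketch of the arithmetic, assuming the values from the report quoted further down (crops of 128 x 128 and input_scaling of 0.25). The helper name `scaled_shape` is hypothetical and not part of SLEAP.

```python
def scaled_shape(shape, scale):
    """Return (height, width, channels) after scaling height and width by `scale`."""
    height, width, channels = shape
    return (int(round(height * scale)), int(round(width * scale)), channels)


# Values from the quoted report: the Keras model was built for 128 x 128 crops
# and the training profile sets input_scaling to 0.25.
model_input_shape = (128, 128, 3)
input_scaling = 0.25

# Scaling the crop itself reproduces the "found" shape from the ValueError.
found_shape = scaled_shape(model_input_shape, input_scaling)
print(found_shape)  # (32, 32, 3)
print(found_shape == model_input_shape)  # False, hence the incompatibility
```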

See issue below for more.

Discussed in https://github.com/talmolab/sleap/discussions/871

Originally posted by **Shifulai** July 29, 2022

Thanks for your attention. When I try to train the top-down centered instance model, training does not work when the input scaling is not 1.0. Training stays at epoch 1 while the runtime keeps increasing.

![1](https://user-images.githubusercontent.com/103489092/181755318-151c5d7d-fc74-4e3e-9637-09fe049b8872.png) ![2](https://user-images.githubusercontent.com/103489092/181755336-943f000d-c597-42ac-b013-cb0513b765b8.png)

Bug report below:

```
Software versions:
SLEAP: 1.2.6
TensorFlow: 2.6.3
Numpy: 1.19.5
Python: 3.7.12
OS: Windows-10-10.0.19041-SP0

Happy SLEAPing! :)
Using already trained model for centroid: D:/Desktop/CK/sleap/data\models\220729_134535.centroid.n=765\training_config.json
Resetting monitor window.
Polling: D:/Desktop/CK/sleap/data\models\220729_194813.centered_instance.n=765\viz\validation.*.png
Start training centered_instance...
['sleap-train', 'C:\\Users\\admin\\AppData\\Local\\Temp\\tmp1aqtnvzl\\220729_194813_training_job.json', 'D:/Desktop/CK/sleap/data/food competition.slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.2.6 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: D:/Desktop/CK/sleap/data/food competition.slp INFO:sleap.nn.training:Training profile: C:\Users\admin\AppData\Local\Temp\tmp1aqtnvzl\220729_194813_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\admin\\AppData\\Local\\Temp\\tmp1aqtnvzl\\220729_194813_training_job.json", "labels_path": "D:/Desktop/CK/sleap/data/food competition.slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": 0 } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.25, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": "tail", "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 8, "filters": 16, "filters_rate": 1.5, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": null, 
"centered_instance": { "anchor_part": "tail", "part_names": null, "sigma": 2.5, "output_stride": 8, "loss_weight": 1.0, "offset_refinement": false }, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null } }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": true, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": true, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": false, "flip_horizontal": true }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 20 } }, "outputs": { "save_outputs": true, "run_name": "220729_194813.centered_instance.n=765", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "D:/Desktop/CK/sleap/data\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, 
"best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.2.6", "filename": "C:\\Users\\admin\\AppData\\Local\\Temp\\tmp1aqtnvzl\\220729_194813_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: D:/Desktop/CK/sleap/data/food competition.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 689 / Validation = 76. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2022-07-29 19:48:31.149710: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2022-07-29 19:48:33.014129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3489 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6 2022-07-29 19:48:35.801343: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2) 2022-07-29 19:48:47.065434: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } } INFO:sleap.nn.training:Loaded test example. [22.266s] INFO:sleap.nn.training: Input shape: (128, 128, 3) INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=1.5, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=1, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 265,575 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CenteredInstanceConfmapsHead(part_names=['nose', 'hear_r', 'hear_l', 'tail'], anchor_part='tail', sigma=2.5, output_stride=8, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 16, 16, 4), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'") INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 689 INFO:sleap.nn.training:Validation set: n = 76 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20) INFO:sleap.nn.training:Setting up outputs... INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: D:/Desktop/CK/sleap/data\models\220729_194813.centered_instance.n=765 INFO:sleap.nn.training:Setting up visualization... 
2022-07-29 19:48:59.507634: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1920 } dim { size: 1080 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } } 2022-07-29 19:49:07.684222: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1920 } dim { size: 1080 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } } INFO:sleap.nn.training:Finished trainer set up. 
[41.8s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... 2022-07-29 19:55:14.233259: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } } 2022-07-29 19:55:31.551806: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } 
dim { size: 3 } } } INFO:sleap.nn.training:Finished creating training datasets. [384.0s] INFO:sleap.nn.training:Starting training loop... 2022-07-29 19:55:32.101723: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } } Epoch 1/200 2022-07-29 19:55:33.962928: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201 WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0000s vs `on_train_batch_end` time: 0.0156s). Check your callbacks. 
2022-07-29 19:55:58.738032: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } } 344/344 - 30s - loss: 0.0325 - nose: 0.0373 - hear_r: 0.0405 - hear_l: 0.0394 - tail: 0.0128 - val_loss: 0.0242 - val_nose: 0.0294 - val_hear_r: 0.0325 - val_hear_l: 0.0307 - val_tail: 0.0041 Traceback (most recent call last): File "D:\anaconda\envs\sleap\Scripts\sleap-train-script.py", line 33, in sys.exit(load_entry_point('sleap==1.2.6', 'console_scripts', 'sleap-train')()) File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1955, in main trainer.train() File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 923, in train verbose=2, File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\training.py", line 1230, in fit callbacks.on_epoch_end(epoch, epoch_logs) File "D:\anaconda\envs\sleap\lib\site-packages\keras\callbacks.py", line 413, in on_epoch_end callback.on_epoch_end(epoch, logs) File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\callbacks.py", line 280, in on_epoch_end figure = self.plot_fn() File 
"D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1328, in viz_fn=lambda: visualize_example(next(training_viz_ds_iter)), File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1308, in visualize_example preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0)) File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1037, in __call__ outputs = call_fn(inputs, *args, **kwargs) File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 1722, in call out = self.keras_model(crops) File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1020, in __call__ input_spec.assert_input_compatibility(self.input_spec, inputs, self.name) File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\input_spec.py", line 269, in assert_input_compatibility ', found shape=' + display_shape(x.shape)) ValueError: Input 0 is incompatible with layer model: expected shape=(None, 128, 128, 3), found shape=(1, 32, 32, 3) INFO:sleap.nn.callbacks:Closing the reporter controller/context. INFO:sleap.nn.callbacks:Closing the training controller socket/context. ```

roomrys commented 2 years ago

Can we just use the output stride of the centroid model to do a limited version of input scaling on the centered instance model (with a finite set of feasible stride values)? This would couple the centroid and centered instance models, though, which we might not want.

Problem Analysis

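It seems that the keras_model expects an input shape equal to the pre-scaled crop, while the crops it actually receives have already been scaled (in the traceback below, it expects 224 x 224 but finds 168 x 168). We should initialize the Keras model to expect the scaled input shape.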

Relevant Code

  1. Set up the keras model https://github.com/talmolab/sleap/blob/c4409ddeee6626206b03ee568c5cc8879a5bec2c/sleap/nn/training.py#L721-L736

  2. Make the pipeline using the preprocessing from self.data_config. Note that the Resizer is resizing the original uncropped image. To resize the cropped image, we should move the Resizer after the InstanceCropper transform. https://github.com/talmolab/sleap/blob/c4409ddeee6626206b03ee568c5cc8879a5bec2c/sleap/nn/data/pipelines.py#L657-L680

  3. The visualization pipeline for TopDown should use self.make_base_pipeline instead of rewriting everything (also the base pipeline includes the Resizer) https://github.com/talmolab/sleap/blob/c4409ddeee6626206b03ee568c5cc8879a5bec2c/sleap/nn/data/pipelines.py#L760-L785
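
The effect of the transform order described in step 2 can be sketched with shape arithmetic alone. This is only an illustration, assuming the 1920 x 1080 frames, crop size of 128 and input_scaling of 0.25 seen in the original report. The `resize` and `crop` functions below are hypothetical stand-ins, not the SLEAP Resizer and InstanceCropper classes.

```python
def resize(shape, scale):
    """Scale a (height, width) shape by `scale`."""
    height, width = shape
    return (int(round(height * scale)), int(round(width * scale)))


def crop(shape, crop_size):
    """A centered-instance crop has a fixed size regardless of the source image size."""
    return (crop_size, crop_size)


full_image = (1920, 1080)  # frame size reported in the original log
crop_size = 128
input_scaling = 0.25

# Current order: the full image is resized first, then cropped.
resize_then_crop = crop(resize(full_image, input_scaling), crop_size)

# Proposed order: crop first, then resize the crop.
crop_then_resize = resize(crop(full_image, crop_size), input_scaling)

print(resize_then_crop)  # (128, 128)
print(crop_then_resize)  # (32, 32)
```

With the current order the model is built for 128 x 128 inputs, whereas any code path that scales the crop produces 32 x 32, which matches the two shapes in the ValueError.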

Follow-up problems

  1. After moving the Resizer after the InstanceCropper, we also need a way of passing in the points_keys from SizeMatcher to Resizer.
  2. We need to make some changes to #841 so that crop_size * scale_size is divisible by the max stride for the TopDown (centered instance) model.
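
The divisibility constraint in item 2 could be checked along these lines. This is a rough sketch under stated assumptions, not the change made in #841: the function names are hypothetical, and the max stride of 16 used with the 224 x 224 crop is an assumed value, since the traceback below does not report it.

```python
import math


def scaled_crop_size(crop_size, scale):
    """Side length of the crop after input scaling is applied."""
    return int(round(crop_size * scale))


def is_stride_compatible(crop_size, scale, max_stride):
    """True if the scaled crop side is divisible by the backbone's max stride."""
    return scaled_crop_size(crop_size, scale) % max_stride == 0


def round_up_to_stride(size, max_stride):
    """Smallest multiple of `max_stride` that is >= `size`."""
    return math.ceil(size / max_stride) * max_stride


# Original report: crop size 128, input_scaling 0.25, max stride 16.
print(is_stride_compatible(128, 0.25, 16))  # True (32 is divisible by 16)

# Traceback below: 224 scaled by 0.75 gives 168; max stride 16 is an assumption.
print(is_stride_compatible(224, 0.75, 16))  # False (168 is not divisible by 16)
print(round_up_to_stride(scaled_crop_size(224, 0.75), 16))  # 176
```

Rounding up is only one possible policy; padding the scaled crop or restricting the allowed scale values would be alternatives.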

Traceback

Traceback (most recent call last):
  File "C:\Users\TalmoLab\miniconda3\envs\sleap_convert-naming\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap', 'console_scripts', 'sleap-train')())
  File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\training.py", line 1955, in main
    trainer.train()
  File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\training.py", line 923, in train
    verbose=2,
  File "C:\Users\TalmoLab\miniconda3\envs\sleap_convert-naming\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\training.py", line 1328, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\training.py", line 1308, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\inference.py", line 1723, in call
    out = self.keras_model(crops)
ValueError: Exception encountered when calling layer "find_instance_peaks" (type FindInstancePeaks).

Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 224, 224, 1), found shape=(1, 168, 168, 1)       

Call arguments received:
  • inputs=tf.Tensor(shape=(1, 224, 224, 1), dtype=float32)

amblypatty commented 1 year ago

I think I am now experiencing this issue too; however, I am not sure why it is coming up now during training, when I have been training the same model for weeks. I have tried reinstalling SLEAP v1.2.9 and paying for more Google Colab compute (per discussion #871). Below is the log output, which appears similar to what @talmo posted, but the TF error comes up at the INFO:sleap.nn.training:Building test pipeline... step, before the visualization setup. Note that I also disabled visualizations in the Run Training dialog, yet the issue still came up:

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:sleap.nn.training:Versions:
SLEAP: 1.2.9
TensorFlow: 2.8.4
Numpy: 1.21.6
Python: 3.8.10
OS: Linux-5.10.147+-x86_64-with-glibc2.29
INFO:sleap.nn.training:Training labels file: resolved_skeletons_with_predictions.pkg.slp
INFO:sleap.nn.training:Training profile: centered_instance.json
INFO:sleap.nn.training:
INFO:sleap.nn.training:Arguments:
INFO:sleap.nn.training:{
    "training_job_path": "centered_instance.json",
    "labels_path": "resolved_skeletons_with_predictions.pkg.slp",
    "video_paths": [
        ""
    ],
    "val_labels": null,
    "test_labels": null,
    "tensorboard": false,
    "save_viz": false,
    "zmq": false,
    "run_name": "",
    "prefix": "",
    "suffix": "",
    "cpu": false,
    "first_gpu": false,
    "last_gpu": false,
    "gpu": "auto"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Training job:
INFO:sleap.nn.training:{
    "data": {
        "labels": {
            "training_labels": null,
            "validation_labels": null,
            "validation_fraction": 0.1,
            "test_labels": null,
            "split_by_inds": false,
            "training_inds": null,
            "validation_inds": null,
            "test_inds": null,
            "search_path_hints": [],
            "skeletons": []
        },
        "preprocessing": {
            "ensure_rgb": false,
            "ensure_grayscale": false,
            "imagenet_mode": null,
            "input_scaling": 1.0,
            "pad_to_stride": null,
            "resize_and_pad_to_target": true,
            "target_height": null,
            "target_width": null
        },
        "instance_cropping": {
            "center_on_part": "pedicel",
            "crop_size": null,
            "crop_size_detection_padding": 16
        }
    },
    "model": {
        "backbone": {
            "leap": null,
            "unet": {
                "stem_stride": null,
                "max_stride": 32,
                "output_stride": 4,
                "filters": 48,
                "filters_rate": 2.0,
                "middle_block": true,
                "up_interpolate": true,
                "stacks": 1
            },
            "hourglass": null,
            "resnet": null,
            "pretrained_encoder": null
        },
        "heads": {
            "single_instance": null,
            "centroid": null,
            "centered_instance": {
                "anchor_part": "pedicel",
                "part_names": null,
                "sigma": 2.5,
                "output_stride": 4,
                "loss_weight": 1.0,
                "offset_refinement": false
            },
            "multi_instance": null,
            "multi_class_bottomup": null,
            "multi_class_topdown": null
        }
    },
    "optimization": {
        "preload_data": true,
        "augmentation_config": {
            "rotate": true,
            "rotation_min_angle": -180.0,
            "rotation_max_angle": 180.0,
            "translate": false,
            "translate_min": -5,
            "translate_max": 5,
            "scale": true,
            "scale_min": 0.9,
            "scale_max": 1.1,
            "uniform_noise": false,
            "uniform_noise_min_val": 0.0,
            "uniform_noise_max_val": 10.0,
            "gaussian_noise": false,
            "gaussian_noise_mean": 5.0,
            "gaussian_noise_stddev": 1.0,
            "contrast": false,
            "contrast_min_gamma": 0.5,
            "contrast_max_gamma": 2.0,
            "brightness": true,
            "brightness_min_val": 0.0,
            "brightness_max_val": 10.0,
            "random_crop": false,
            "random_crop_height": 256,
            "random_crop_width": 256,
            "random_flip": false,
            "flip_horizontal": true
        },
        "online_shuffling": true,
        "shuffle_buffer_size": 128,
        "prefetch": true,
        "batch_size": 4,
        "batches_per_epoch": null,
        "min_batches_per_epoch": 200,
        "val_batches_per_epoch": null,
        "min_val_batches_per_epoch": 10,
        "epochs": 200,
        "optimizer": "adam",
        "initial_learning_rate": 0.0001,
        "learning_rate_schedule": {
            "reduce_on_plateau": true,
            "reduction_factor": 0.5,
            "plateau_min_delta": 1e-06,
            "plateau_patience": 5,
            "plateau_cooldown": 3,
            "min_learning_rate": 1e-08
        },
        "hard_keypoint_mining": {
            "online_mining": true,
            "hard_to_easy_ratio": 2.0,
            "min_hard_keypoints": 3,
            "max_hard_keypoints": null,
            "loss_scale": 5.0
        },
        "early_stopping": {
            "stop_training_on_plateau": true,
            "plateau_min_delta": 1e-08,
            "plateau_patience": 10
        }
    },
    "outputs": {
        "save_outputs": true,
        "run_name": "230220_173034",
        "run_name_prefix": "",
        "run_name_suffix": ".centered_instance",
        "runs_folder": "models",
        "tags": [
            ""
        ],
        "save_visualizations": true,
        "delete_viz_images": true,
        "zip_outputs": false,
        "log_to_csv": true,
        "checkpointing": {
            "initial_model": false,
            "best_model": true,
            "every_epoch": false,
            "latest_model": false,
            "final_model": false
        },
        "tensorboard": {
            "write_logs": false,
            "loss_frequency": "epoch",
            "architecture_graph": false,
            "profile_graph": false,
            "visualizations": true
        },
        "zmq": {
            "subscribe_to_controller": false,
            "controller_address": "tcp://127.0.0.1:9000",
            "controller_polling_timeout": 10,
            "publish_updates": false,
            "publish_address": "tcp://127.0.0.1:9001"
        }
    },
    "name": "",
    "description": "",
    "sleap_version": "1.2.9",
    "filename": "centered_instance.json"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Auto-selected GPU 0 with 40533 MiB of free memory.
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: resolved_skeletons_with_predictions.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training:  Splits: Training = 271 / Validation = 30.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2023-02-20 23:13:00.691758: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
INFO:sleap.nn.training:Loaded test example. [3.366s]
INFO:sleap.nn.training:  Input shape: (544, 544, 3)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training:  Backbone: UNet(stacks=1, filters=48, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=5, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False)
INFO:sleap.nn.training:  Max stride: 32
INFO:sleap.nn.training:  Parameters: 70,331,019
INFO:sleap.nn.training:  Heads: 
INFO:sleap.nn.training:    [0] = CenteredInstanceConfmapsHead(part_names=['prosoma', 'pedicel', 'opisthosoma', 'pedipalpR1', 'pedipalpL1', 'antlegR1', 'antlegR2', 'antlegL1', 'antlegL2', 'forelegR1', 'forelegR2', 'forelegL1', 'forelegL2', 'midlegR1', 'midlegR2', 'midlegL1', 'midlegL2', 'hindlegR1', 'hindlegR2', 'hindlegL1', 'hindlegL2', 'pedipalpR2', 'pedipalpL2', 'antlegR3', 'antlegR4', 'antlegL3', 'antlegL4'], anchor_part='pedicel', sigma=2.5, output_stride=4, loss_weight=1.0)
INFO:sleap.nn.training:  Outputs: 
INFO:sleap.nn.training:    [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 136, 136, 27), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'")
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 271
INFO:sleap.nn.training:Validation set: n = 30
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training:  OHKM enabled: HardKeypointMiningConfig(online_mining=True, hard_to_easy_ratio=2.0, min_hard_keypoints=3, max_hard_keypoints=None, loss_scale=5.0)
INFO:sleap.nn.training:  Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training:  Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.training:Created run path: models/230220_173034.centered_instance
INFO:sleap.nn.training:Setting up visualization...
2023-02-20 23:13:02.432647: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
2023-02-20 23:13:03.627076: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
INFO:sleap.nn.training:Finished trainer set up. [6.2s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
2023-02-20 23:13:15.674809: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
2023-02-20 23:13:18.846102: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
INFO:sleap.nn.training:Finished creating training datasets. [15.6s]
INFO:sleap.nn.training:Starting training loop...
2023-02-20 23:13:19.593550: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
Epoch 1/200
2023-02-20 23:13:56.422111: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
2023-02-20 23:14:02.495095: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 27 } dim { size: 136 } dim { size: 136 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -5 } dim { size: -6 } dim { size: 1 } } }
200/200 - 46s - loss: 0.0064 - ohkm: 0.0053 - prosoma: 0.0010 - pedicel: 0.0010 - opisthosoma: 0.0011 - pedipalpR1: 0.0010 - pedipalpL1: 0.0010 - antlegR1: 0.0011 - antlegR2: 0.0011 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0010 - forelegR2: 0.0011 - forelegL1: 0.0011 - forelegL2: 0.0011 - midlegR1: 0.0010 - midlegR2: 0.0011 - midlegL1: 0.0010 - midlegL2: 0.0011 - hindlegR1: 0.0010 - hindlegR2: 0.0011 - hindlegL1: 0.0010 - hindlegL2: 0.0011 - pedipalpR2: 0.0011 - pedipalpL2: 0.0011 - antlegR3: 0.0010 - antlegR4: 9.9843e-04 - antlegL3: 0.0011 - antlegL4: 0.0010 - val_loss: 0.0063 - val_ohkm: 0.0053 - val_prosoma: 0.0010 - val_pedicel: 9.7641e-04 - val_opisthosoma: 0.0010 - val_pedipalpR1: 0.0010 - val_pedipalpL1: 0.0010 - val_antlegR1: 0.0010 - val_antlegR2: 0.0011 - val_antlegL1: 0.0010 - val_antlegL2: 0.0011 - val_forelegR1: 0.0010 - val_forelegR2: 0.0011 - val_forelegL1: 0.0010 - val_forelegL2: 0.0011 - val_midlegR1: 0.0010 - val_midlegR2: 0.0011 - val_midlegL1: 0.0010 - val_midlegL2: 0.0011 - val_hindlegR1: 0.0010 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0010 - val_hindlegL2: 0.0011 - val_pedipalpR2: 0.0010 - val_pedipalpL2: 0.0010 - val_antlegR3: 0.0011 - val_antlegR4: 9.6961e-04 - val_antlegL3: 0.0011 - val_antlegL4: 9.9553e-04 - lr: 1.0000e-04 - 46s/epoch - 232ms/step
Epoch 2/200
2023-02-20 23:14:33.058386: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
200/200 - 39s - loss: 0.0063 - ohkm: 0.0053 - prosoma: 9.8153e-04 - pedicel: 9.6689e-04 - opisthosoma: 0.0010 - pedipalpR1: 9.9786e-04 - pedipalpL1: 9.9719e-04 - antlegR1: 0.0010 - antlegR2: 0.0010 - antlegL1: 0.0010 - antlegL2: 0.0010 - forelegR1: 0.0010 - forelegR2: 0.0010 - forelegL1: 0.0010 - forelegL2: 0.0010 - midlegR1: 0.0010 - midlegR2: 0.0010 - midlegL1: 0.0010 - midlegL2: 0.0010 - hindlegR1: 0.0010 - hindlegR2: 0.0010 - hindlegL1: 0.0010 - hindlegL2: 0.0010 - pedipalpR2: 0.0010 - pedipalpL2: 0.0010 - antlegR3: 0.0010 - antlegR4: 9.8585e-04 - antlegL3: 0.0010 - antlegL4: 0.0010 - val_loss: 0.0063 - val_ohkm: 0.0052 - val_prosoma: 9.7833e-04 - val_pedicel: 9.5453e-04 - val_opisthosoma: 0.0010 - val_pedipalpR1: 0.0010 - val_pedipalpL1: 0.0010 - val_antlegR1: 0.0010 - val_antlegR2: 0.0010 - val_antlegL1: 0.0010 - val_antlegL2: 0.0010 - val_forelegR1: 0.0010 - val_forelegR2: 0.0010 - val_forelegL1: 0.0010 - val_forelegL2: 0.0010 - val_midlegR1: 0.0010 - val_midlegR2: 0.0010 - val_midlegL1: 0.0010 - val_midlegL2: 0.0010 - val_hindlegR1: 0.0010 - val_hindlegR2: 0.0010 - val_hindlegL1: 0.0010 - val_hindlegL2: 0.0010 - val_pedipalpR2: 0.0010 - val_pedipalpL2: 0.0010 - val_antlegR3: 0.0010 - val_antlegR4: 9.8150e-04 - val_antlegL3: 0.0010 - val_antlegL4: 0.0010 - lr: 1.0000e-04 - 39s/epoch - 195ms/step

...until I force-stopped the process. I appreciate any help you can provide.

roomrys commented 1 year ago

Hi @amblypatty,

Originally, we thought this error might be caused just by plotting the visualizations (confidence maps overlaid on instances) during training; however, after tracking down the error, we found that the real problem is that our pipeline for the top-down model is not set up to handle input scaling on the second model (the centered instance model). It seems your input_scaling is set to the default of 1.0, so we don't expect to see this particular error in your case.
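To double-check which scaling a given model was trained with, something like the sketch below could be used. It assumes the value is stored under the keys data, preprocessing, input_scaling in the training_config.json inside the model folder; that key path and the helper name read_input_scaling are assumptions for illustration, not a confirmed SLEAP API.

```python
import json
from pathlib import Path


def read_input_scaling(config_path):
    """Return the input scaling stored in a training config, or None if absent.

    Assumption: the value lives at data -> preprocessing -> input_scaling.
    """
    cfg = json.loads(Path(config_path).read_text())
    # Use .get() at every level so a missing section yields None instead of raising.
    return cfg.get("data", {}).get("preprocessing", {}).get("input_scaling")


# Hypothetical usage; substitute the path to your own model folder:
# print(read_input_scaling("path/to/model_folder/training_config.json"))
```

A value other than 1.0 on the centered instance model would point to the scaling issue described here; a value of 1.0 suggests the problem lies elsewhere.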

Unless I overlooked something, the logs seem to indicate that training has completed the 2nd epoch and is about to head into the 3rd epoch? Some clarifying questions: Are the logs truncated? What behavior are you experiencing?
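For reading long logs like this one, a small script can separate the TensorFlow messages from the Keras epoch summaries. The sketch below assumes the two line formats seen in the logs in this thread: TensorFlow lines carry a severity letter after the timestamp (I, W, E or F, so the PredictCost() lines are warning level), and epoch summaries look like "200/200 - 46s - loss: ... - val_loss: ...". The function name summarize_log is made up for this example.

```python
import re

# TensorFlow C++ log line: "<date> <time>: <severity> <file>:<line>] <message>"
TF_LOG = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) (\S+):(\d+)\] (.*)"
)
# Keras epoch summary, e.g. "200/200 - 38s - loss: 9.3e-05 - val_loss: 5.8e-05 - ..."
EPOCH = re.compile(r"^\d+/\d+ - .*?\bloss: (\S+) .*?\bval_loss: (\S+)")


def summarize_log(lines):
    """Count TensorFlow messages per severity and collect (loss, val_loss) per epoch."""
    severities = {}
    epochs = []
    for line in lines:
        # search() rather than match(): progress bars can precede the timestamp.
        tf_line = TF_LOG.search(line)
        if tf_line:
            level = tf_line.group(1)
            severities[level] = severities.get(level, 0) + 1
            continue
        epoch_line = EPOCH.match(line)
        if epoch_line:
            epochs.append((float(epoch_line.group(1)), float(epoch_line.group(2))))
    return severities, epochs


# Hypothetical usage with a saved log file:
# with open("training.log") as f:
#     severities, epochs = summarize_log(f)
```

If the epoch list keeps growing and the losses change from epoch to epoch, training is progressing despite the warnings.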

Thanks, Liezl

amblypatty commented 1 year ago

Hi @roomrys,

Indeed, I terminated the process after seeing the PredictCost() function fail in the first epoch:

Epoch 1/200
2023-02-20 23:13:56.422111: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
2023-02-20 23:14:02.495095: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 27 } dim { size: 136 } dim { size: 136 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -5 } dim { size: -6 } dim { size: 1 } } }

Previously, training the top-down model with this error in each epoch (though the warning first shows up earlier) produced a predictions.pkg.slp file with 'mean scores' but no instances on the suggested frames when I ran:

!sleap-track \
    -m models/230218_232711.centroid \
    -m models/230218_232711.centered_instance \
    --only-suggested-frames \
    -o 230218_232711_predicted_suggestions.slp \
    resolved_skeletons_with_predictions.pkg.slp

This gives me a complete prediction run (with PredictCost() errors):

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Started inference at: 2023-02-19 05:15:02.414019
Args:
{
│   'data_path': 'resolved_skeletons_with_predictions.pkg.slp',
│   'models': [
│   │   'models/230218_232711.centroid',
│   │   'models/230218_232711.centered_instance'
│   ],
│   'frames': '',
│   'only_labeled_frames': False,
│   'only_suggested_frames': True,
│   'output': '230218_232711_predicted_suggestions.slp',
│   'no_empty_frames': False,
│   'verbosity': 'rich',
│   'video.dataset': None,
│   'video.input_format': 'channels_last',
│   'video.index': '',
│   'cpu': False,
│   'first_gpu': False,
│   'last_gpu': False,
│   'gpu': 'auto',
│   'max_edge_length_ratio': 0.25,
│   'dist_penalty_weight': 1.0,
│   'batch_size': 4,
│   'open_in_gui': False,
│   'peak_threshold': 0.2,
│   'tracking.tracker': None,
│   'tracking.target_instance_count': None,
│   'tracking.pre_cull_to_target': None,
│   'tracking.pre_cull_iou_threshold': None,
│   'tracking.post_connect_single_breaks': None,
│   'tracking.clean_instance_count': None,
│   'tracking.clean_iou_threshold': None,
│   'tracking.similarity': None,
│   'tracking.match': None,
│   'tracking.track_window': None,
│   'tracking.min_new_track_points': None,
│   'tracking.min_match_points': None,
│   'tracking.img_scale': None,
│   'tracking.of_window_size': None,
│   'tracking.of_max_levels': None,
│   'tracking.save_shifted_instances': None,
│   'tracking.kf_node_indices': None,
│   'tracking.kf_init_frame_count': None
}

INFO:sleap.nn.inference:Auto-selected GPU 0 with 40533 MiB of free memory.
Versions:
SLEAP: 1.2.9
TensorFlow: 2.8.4
Numpy: 1.21.6
Python: 3.8.10
OS: Linux-5.10.147+-x86_64-with-glibc2.29

System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True

Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?2023-02-19 05:15:29.605879: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -51 } dim { size: -52 } dim { size: -53 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -54 } dim { size: -55 } dim { size: 1 } } }
2023-02-19 05:15:29.606433: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -59 } dim { size: -60 } dim { size: 3 } } }
2023-02-19 05:15:29.613101: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -109 } dim { size: -110 } dim { size: -111 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -6 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: -112 } dim { size: -113 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━  94% ETA: 0:00:01 58.1 FPS2023-02-19 05:15:35.958628: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -51 } dim { size: -52 } dim { size: -53 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -54 } dim { size: -55 } dim { size: 1 } } }
2023-02-19 05:15:35.959166: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -59 } dim { size: -60 } dim { size: 3 } } }
2023-02-19 05:15:35.966002: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -109 } dim { size: -110 } dim { size: -111 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -6 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: -112 } dim { size: -113 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 13.9 FPS
Finished inference at: 2023-02-19 05:15:37.534322
Total runtime: 35.120323181152344 secs
Predicted frames: 51/51
Provenance:
{
│   'model_paths': [
│   │   'models/230218_232711.centroid/training_config.json',
│   │   'models/230218_232711.centered_instance/training_config.json'
│   ],
│   'predictor': 'TopDownPredictor',
│   'sleap_version': '1.2.9',
│   'platform': 'Linux-5.10.147+-x86_64-with-glibc2.29',
│   'command': '/usr/local/bin/sleap-track -m models/230218_232711.centroid -m models/230218_232711.centered_instance --only-suggested-frames -o 230218_232711_predicted_suggestions.slp resolved_skeletons_with_predictions.pkg.slp',
│   'data_path': 'resolved_skeletons_with_predictions.pkg.slp',
│   'output_path': '230218_232711_predicted_suggestions.slp',
│   'total_elapsed': 35.120323181152344,
│   'start_timestamp': '2023-02-19 05:15:02.414019',
│   'finish_timestamp': '2023-02-19 05:15:37.534322'
}

Saved output: 230218_232711_predicted_suggestions.slp

...and then merge the predictions in the SLEAP GUI. Additionally, there are no metrics for the centered_instance model: [image]

The image above shows, in the background, a suggested frame (313) that has a mean score but no predicted instance. The foreground shows the evaluation metrics window, where the most recent centered_instance model has empty cells for the evaluation metrics, while the previous centered_instance model shows its metrics (as expected).
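The same thing can be checked outside the GUI by listing the frames in the predictions file that have no predicted instances. This is only a sketch: it assumes SLEAP is installed, that sleap.load_file returns a Labels object with a labeled_frames list, and that each labeled frame exposes frame_idx and predicted_instances. The helper names are made up for this example.

```python
def frames_without_predictions(labeled_frames):
    """Return the frame indices whose labeled frame has no predicted instances."""
    return [
        lf.frame_idx for lf in labeled_frames if len(lf.predicted_instances) == 0
    ]


def load_labeled_frames(path):
    """Load the labeled frames from a .slp file (assumes SLEAP is installed)."""
    import sleap  # imported here so the helper above works without SLEAP

    return sleap.load_file(path).labeled_frames


# Usage with the output file from the command above:
# frames = load_labeled_frames("230218_232711_predicted_suggestions.slp")
# print(frames_without_predictions(frames))
```

If every suggested frame turns up in that list, the centered instance model is producing no instances at all, rather than the GUI failing to display them.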

Thanks for your help, Patrick

amblypatty commented 1 year ago

Hello @roomrys and @talmo,

I am still experiencing this issue, even in the newest 1.3.0a0 release. I have tried redoing this with a few different hyperparameters to try and get the previously expected behavior, but I am still experiencing an error in the PredictCost() function. I am afraid I don't really know what it means or how to get around it. I would really appreciate some help on this one.

Here is the latest output from my top-down training, first from the Centroid and then the Centered-Instance:

INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: resolved_skeletons_with_predictions.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training:  Splits: Training = 271 / Validation = 30.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
INFO:sleap.nn.training:Loaded test example. [2.734s]
INFO:sleap.nn.training:  Input shape: (544, 960, 3)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training:  Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False)
INFO:sleap.nn.training:  Max stride: 16
INFO:sleap.nn.training:  Parameters: 1,953,393
INFO:sleap.nn.training:  Heads: 
INFO:sleap.nn.training:    [0] = CentroidConfmapsHead(anchor_part='pedicel', sigma=2.5, output_stride=2, loss_weight=1.0)
INFO:sleap.nn.training:  Outputs: 
INFO:sleap.nn.training:    [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 272, 480, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'")
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 271
INFO:sleap.nn.training:Validation set: n = 30
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training:  Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training:  Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.training:Created run path: models/230312_144956.centroid
INFO:sleap.nn.training:Setting up visualization...
2023-03-12 19:06:42.229412: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
2023-03-12 19:06:43.501136: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
INFO:sleap.nn.training:Finished trainer set up. [6.9s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [14.2s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/200
200/200 - 38s - loss: 9.3239e-05 - val_loss: 5.8659e-05 - lr: 1.0000e-04 - 38s/epoch - 190ms/step
Epoch 2/200
200/200 - 22s - loss: 3.1217e-05 - val_loss: 2.7985e-05 - lr: 1.0000e-04 - 22s/epoch - 111ms/step
Epoch 3/200
200/200 - 23s - loss: 1.8750e-05 - val_loss: 1.7997e-05 - lr: 1.0000e-04 - 23s/epoch - 113ms/step

... (truncated here; training continues in the same way) ...

Epoch 46: ReduceLROnPlateau reducing learning rate to 3.12499992105586e-06.
200/200 - 21s - loss: 2.8520e-06 - val_loss: 4.5091e-06 - lr: 6.2500e-06 - 21s/epoch - 107ms/step
Epoch 47/200
200/200 - 22s - loss: 3.1557e-06 - val_loss: 1.9466e-06 - lr: 3.1250e-06 - 22s/epoch - 108ms/step
Epoch 47: early stopping
INFO:sleap.nn.training:Finished training loop. [17.6 min]
INFO:sleap.nn.training:Deleting visualization directory: models/230312_144956.centroid/viz
INFO:sleap.nn.training:Saving evaluation metrics to model folder...
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?2023-03-12 19:24:35.575939: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -74 } dim { size: -75 } dim { size: -76 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -77 } dim { size: -78 } dim { size: 1 } } }
2023-03-12 19:24:35.576331: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -83 } dim { size: -84 } dim { size: 3 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸  99% ETA: 0:00:01 27.9 FPS2023-03-12 19:24:45.964082: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -74 } dim { size: -75 } dim { size: -76 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -77 } dim { size: -78 } dim { size: 1 } } }
2023-03-12 19:24:45.964449: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -83 } dim { size: -84 } dim { size: 3 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 15.8 FPS
INFO:sleap.nn.evals:Saved predictions: models/230312_144956.centroid/labels_pr.train.slp
INFO:sleap.nn.evals:Saved metrics: models/230312_144956.centroid/metrics.train.npz
INFO:sleap.nn.evals:OKS mAP: 0.980198
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?2023-03-12 19:24:49.825924: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -74 } dim { size: -75 } dim { size: -76 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -77 } dim { size: -78 } dim { size: 1 } } }
2023-03-12 19:24:49.826309: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -83 } dim { size: -84 } dim { size: 3 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━  93% ETA: 0:00:01 88.9 FPS2023-03-12 19:24:51.899967: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -74 } dim { size: -75 } dim { size: -76 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -77 } dim { size: -78 } dim { size: 1 } } }
2023-03-12 19:24:51.900339: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -83 } dim { size: -84 } dim { size: 3 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 10.7 FPS
INFO:sleap.nn.evals:Saved predictions: models/230312_144956.centroid/labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: models/230312_144956.centroid/metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.930693
INFO:sleap.nn.training:
INFO:sleap.nn.training:Auto-selected GPU 0 with 40510 MiB of free memory.
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: resolved_skeletons_with_predictions.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training:  Splits: Training = 271 / Validation = 30.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2023-03-12 19:25:06.978157: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
INFO:sleap.nn.training:Loaded test example. [3.324s]
INFO:sleap.nn.training:  Input shape: (512, 512, 3)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training:  Backbone: UNet(stacks=1, filters=24, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=2, up_interpolate=True, block_contraction=False)
INFO:sleap.nn.training:  Max stride: 16
INFO:sleap.nn.training:  Parameters: 4,313,235
INFO:sleap.nn.training:  Heads: 
INFO:sleap.nn.training:    [0] = CenteredInstanceConfmapsHead(part_names=['prosoma', 'pedicel', 'opisthosoma', 'pedipalpR1', 'pedipalpL1', 'antlegR1', 'antlegR2', 'antlegL1', 'antlegL2', 'forelegR1', 'forelegR2', 'forelegL1', 'forelegL2', 'midlegR1', 'midlegR2', 'midlegL1', 'midlegL2', 'hindlegR1', 'hindlegR2', 'hindlegL1', 'hindlegL2', 'pedipalpR2', 'pedipalpL2', 'antlegR3', 'antlegR4', 'antlegL3', 'antlegL4'], anchor_part='pedicel', sigma=2.5, output_stride=4, loss_weight=1.0)
INFO:sleap.nn.training:  Outputs: 
INFO:sleap.nn.training:    [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 128, 128, 27), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'")
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 271
INFO:sleap.nn.training:Validation set: n = 30
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training:  OHKM enabled: HardKeypointMiningConfig(online_mining=True, hard_to_easy_ratio=2.0, min_hard_keypoints=3, max_hard_keypoints=None, loss_scale=5.0)
INFO:sleap.nn.training:  Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training:  Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.training:Created run path: models/230312_144956.centered_instance
INFO:sleap.nn.training:Setting up visualization...
2023-03-12 19:25:08.628362: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
2023-03-12 19:25:09.832672: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
INFO:sleap.nn.training:Finished trainer set up. [6.1s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
2023-03-12 19:25:21.965479: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
2023-03-12 19:25:25.098796: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
INFO:sleap.nn.training:Finished creating training datasets. [15.6s]
INFO:sleap.nn.training:Starting training loop...
2023-03-12 19:25:25.834120: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
Epoch 1/200
2023-03-12 19:25:51.413321: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
2023-03-12 19:25:56.975111: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 27 } dim { size: 128 } dim { size: 128 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -5 } dim { size: -6 } dim { size: 1 } } }
200/200 - 32s - loss: 0.0071 - ohkm: 0.0060 - prosoma: 0.0012 - pedicel: 0.0012 - opisthosoma: 0.0012 - pedipalpR1: 0.0012 - pedipalpL1: 0.0012 - antlegR1: 0.0012 - antlegR2: 0.0012 - antlegL1: 0.0012 - antlegL2: 0.0012 - forelegR1: 0.0012 - forelegR2: 0.0012 - forelegL1: 0.0012 - forelegL2: 0.0012 - midlegR1: 0.0012 - midlegR2: 0.0012 - midlegL1: 0.0012 - midlegL2: 0.0012 - hindlegR1: 0.0012 - hindlegR2: 0.0012 - hindlegL1: 0.0012 - hindlegL2: 0.0012 - pedipalpR2: 0.0012 - pedipalpL2: 0.0012 - antlegR3: 0.0012 - antlegR4: 0.0011 - antlegL3: 0.0012 - antlegL4: 0.0011 - val_loss: 0.0071 - val_ohkm: 0.0060 - val_prosoma: 0.0011 - val_pedicel: 0.0012 - val_opisthosoma: 0.0012 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0012 - val_antlegR2: 0.0012 - val_antlegL1: 0.0011 - val_antlegL2: 0.0012 - val_forelegR1: 0.0012 - val_forelegR2: 0.0012 - val_forelegL1: 0.0011 - val_forelegL2: 0.0012 - val_midlegR1: 0.0012 - val_midlegR2: 0.0012 - val_midlegL1: 0.0012 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0012 - val_hindlegR2: 0.0012 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0012 - val_pedipalpR2: 0.0012 - val_pedipalpL2: 0.0012 - val_antlegR3: 0.0012 - val_antlegR4: 0.0011 - val_antlegL3: 0.0012 - val_antlegL4: 0.0011 - lr: 1.0000e-04 - 32s/epoch - 160ms/step
Epoch 2/200
2023-03-12 19:26:14.838946: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 20s - loss: 0.0071 - ohkm: 0.0060 - prosoma: 0.0012 - pedicel: 0.0012 - opisthosoma: 0.0012 - pedipalpR1: 0.0012 - pedipalpL1: 0.0012 - antlegR1: 0.0012 - antlegR2: 0.0012 - antlegL1: 0.0012 - antlegL2: 0.0012 - forelegR1: 0.0012 - forelegR2: 0.0012 - forelegL1: 0.0012 - forelegL2: 0.0012 - midlegR1: 0.0012 - midlegR2: 0.0012 - midlegL1: 0.0012 - midlegL2: 0.0012 - hindlegR1: 0.0012 - hindlegR2: 0.0012 - hindlegL1: 0.0012 - hindlegL2: 0.0012 - pedipalpR2: 0.0012 - pedipalpL2: 0.0012 - antlegR3: 0.0012 - antlegR4: 0.0011 - antlegL3: 0.0012 - antlegL4: 0.0011 - val_loss: 0.0071 - val_ohkm: 0.0060 - val_prosoma: 0.0012 - val_pedicel: 0.0012 - val_opisthosoma: 0.0012 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0012 - val_antlegR2: 0.0012 - val_antlegL1: 0.0012 - val_antlegL2: 0.0012 - val_forelegR1: 0.0012 - val_forelegR2: 0.0012 - val_forelegL1: 0.0012 - val_forelegL2: 0.0012 - val_midlegR1: 0.0012 - val_midlegR2: 0.0012 - val_midlegL1: 0.0012 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0012 - val_hindlegR2: 0.0012 - val_hindlegL1: 0.0012 - val_hindlegL2: 0.0012 - val_pedipalpR2: 0.0012 - val_pedipalpL2: 0.0012 - val_antlegR3: 0.0011 - val_antlegR4: 9.8390e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0011 - lr: 1.0000e-04 - 20s/epoch - 98ms/step
Epoch 3/200
2023-03-12 19:26:35.598997: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 22s - loss: 0.0071 - ohkm: 0.0059 - prosoma: 0.0011 - pedicel: 0.0011 - opisthosoma: 0.0012 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0012 - antlegL1: 0.0011 - antlegL2: 0.0012 - forelegR1: 0.0011 - forelegR2: 0.0012 - forelegL1: 0.0011 - forelegL2: 0.0012 - midlegR1: 0.0011 - midlegR2: 0.0012 - midlegL1: 0.0011 - midlegL2: 0.0012 - hindlegR1: 0.0011 - hindlegR2: 0.0012 - hindlegL1: 0.0011 - hindlegL2: 0.0012 - pedipalpR2: 0.0012 - pedipalpL2: 0.0012 - antlegR3: 0.0012 - antlegR4: 0.0011 - antlegL3: 0.0012 - antlegL4: 0.0011 - val_loss: 0.0070 - val_ohkm: 0.0059 - val_prosoma: 0.0011 - val_pedicel: 0.0011 - val_opisthosoma: 0.0012 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0012 - val_antlegL1: 0.0011 - val_antlegL2: 0.0012 - val_forelegR1: 0.0011 - val_forelegR2: 0.0012 - val_forelegL1: 0.0011 - val_forelegL2: 0.0012 - val_midlegR1: 0.0012 - val_midlegR2: 0.0012 - val_midlegL1: 0.0011 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0012 - val_hindlegR2: 0.0012 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0012 - val_pedipalpR2: 0.0012 - val_pedipalpL2: 0.0012 - val_antlegR3: 0.0011 - val_antlegR4: 9.6768e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0011 - lr: 1.0000e-04 - 22s/epoch - 110ms/step
Epoch 4/200
2023-03-12 19:26:56.726952: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 21s - loss: 0.0070 - ohkm: 0.0059 - prosoma: 0.0011 - pedicel: 0.0011 - opisthosoma: 0.0012 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0012 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0011 - forelegR2: 0.0012 - forelegL1: 0.0011 - forelegL2: 0.0012 - midlegR1: 0.0011 - midlegR2: 0.0012 - midlegL1: 0.0011 - midlegL2: 0.0012 - hindlegR1: 0.0011 - hindlegR2: 0.0012 - hindlegL1: 0.0011 - hindlegL2: 0.0012 - pedipalpR2: 0.0011 - pedipalpL2: 0.0011 - antlegR3: 0.0011 - antlegR4: 0.0010 - antlegL3: 0.0011 - antlegL4: 0.0011 - val_loss: 0.0070 - val_ohkm: 0.0058 - val_prosoma: 0.0011 - val_pedicel: 0.0011 - val_opisthosoma: 0.0011 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0011 - val_antlegL1: 0.0011 - val_antlegL2: 0.0011 - val_forelegR1: 0.0011 - val_forelegR2: 0.0011 - val_forelegL1: 0.0011 - val_forelegL2: 0.0012 - val_midlegR1: 0.0011 - val_midlegR2: 0.0011 - val_midlegL1: 0.0011 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0011 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0011 - val_pedipalpR2: 0.0011 - val_pedipalpL2: 0.0011 - val_antlegR3: 0.0011 - val_antlegR4: 9.7633e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0010 - lr: 1.0000e-04 - 21s/epoch - 107ms/step
Epoch 5/200
2023-03-12 19:27:18.484206: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 21s - loss: 0.0069 - ohkm: 0.0058 - prosoma: 0.0011 - pedicel: 0.0010 - opisthosoma: 0.0011 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0011 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0011 - forelegR2: 0.0011 - forelegL1: 0.0011 - forelegL2: 0.0011 - midlegR1: 0.0011 - midlegR2: 0.0011 - midlegL1: 0.0011 - midlegL2: 0.0011 - hindlegR1: 0.0011 - hindlegR2: 0.0011 - hindlegL1: 0.0011 - hindlegL2: 0.0011 - pedipalpR2: 0.0011 - pedipalpL2: 0.0011 - antlegR3: 0.0011 - antlegR4: 0.0010 - antlegL3: 0.0011 - antlegL4: 0.0011 - val_loss: 0.0069 - val_ohkm: 0.0058 - val_prosoma: 0.0011 - val_pedicel: 0.0010 - val_opisthosoma: 0.0011 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0011 - val_antlegL1: 0.0011 - val_antlegL2: 0.0011 - val_forelegR1: 0.0011 - val_forelegR2: 0.0011 - val_forelegL1: 0.0011 - val_forelegL2: 0.0011 - val_midlegR1: 0.0011 - val_midlegR2: 0.0011 - val_midlegL1: 0.0011 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0011 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0011 - val_pedipalpR2: 0.0011 - val_pedipalpL2: 0.0011 - val_antlegR3: 0.0011 - val_antlegR4: 9.5325e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0011 - lr: 1.0000e-04 - 21s/epoch - 107ms/step
Epoch 6/200
2023-03-12 19:27:39.438895: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 21s - loss: 0.0068 - ohkm: 0.0058 - prosoma: 9.7589e-04 - pedicel: 8.8568e-04 - opisthosoma: 0.0011 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0011 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0011 - forelegR2: 0.0011 - forelegL1: 0.0011 - forelegL2: 0.0011 - midlegR1: 0.0011 - midlegR2: 0.0011 - midlegL1: 0.0011 - midlegL2: 0.0011 - hindlegR1: 0.0011 - hindlegR2: 0.0011 - hindlegL1: 0.0011 - hindlegL2: 0.0011 - pedipalpR2: 0.0010 - pedipalpL2: 0.0010 - antlegR3: 0.0011 - antlegR4: 0.0010 - antlegL3: 0.0011 - antlegL4: 0.0011 - val_loss: 0.0068 - val_ohkm: 0.0057 - val_prosoma: 9.5023e-04 - val_pedicel: 8.6653e-04 - val_opisthosoma: 0.0011 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0011 - val_antlegL1: 0.0011 - val_antlegL2: 0.0011 - val_forelegR1: 0.0011 - val_forelegR2: 0.0011 - val_forelegL1: 0.0011 - val_forelegL2: 0.0011 - val_midlegR1: 0.0011 - val_midlegR2: 0.0011 - val_midlegL1: 0.0011 - val_midlegL2: 0.0011 - val_hindlegR1: 0.0011 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0011 - val_pedipalpR2: 0.0010 - val_pedipalpL2: 0.0010 - val_antlegR3: 0.0011 - val_antlegR4: 9.0389e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0010 - lr: 1.0000e-04 - 21s/epoch - 104ms/step
Epoch 7/200
2023-03-12 19:28:00.736129: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 23s - loss: 0.0068 - ohkm: 0.0057 - prosoma: 9.0675e-04 - pedicel: 8.2878e-04 - opisthosoma: 0.0011 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0011 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0011 - forelegR2: 0.0011 - forelegL1: 0.0011 - forelegL2: 0.0011 - midlegR1: 0.0011 - midlegR2: 0.0011 - midlegL1: 0.0011 - midlegL2: 0.0011 - hindlegR1: 0.0011 - hindlegR2: 0.0011 - hindlegL1: 0.0011 - hindlegL2: 0.0011 - pedipalpR2: 9.7089e-04 - pedipalpL2: 9.6632e-04 - antlegR3: 0.0011 - antlegR4: 0.0010 - antlegL3: 0.0011 - antlegL4: 0.0011 - val_loss: 0.0067 - val_ohkm: 0.0057 - val_prosoma: 8.2120e-04 - val_pedicel: 7.7487e-04 - val_opisthosoma: 0.0010 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0011 - val_antlegL1: 0.0011 - val_antlegL2: 0.0011 - val_forelegR1: 0.0011 - val_forelegR2: 0.0011 - val_forelegL1: 0.0011 - val_forelegL2: 0.0011 - val_midlegR1: 0.0011 - val_midlegR2: 0.0011 - val_midlegL1: 0.0011 - val_midlegL2: 0.0011 - val_hindlegR1: 0.0011 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0011 - val_pedipalpR2: 8.7347e-04 - val_pedipalpL2: 9.0317e-04 - val_antlegR3: 0.0011 - val_antlegR4: 9.1410e-04 - val_antlegL3: 0.0010 - val_antlegL4: 0.0010 - lr: 1.0000e-04 - 23s/epoch - 113ms/step

... Truncated through the rest of the training epochs. Notice the PredictCost() error warning each time...

Epoch 49/200
2023-03-12 19:42:29.306173: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 20s - loss: 0.0031 - ohkm: 0.0027 - prosoma: 3.3843e-04 - pedicel: 3.0595e-04 - opisthosoma: 2.9900e-04 - pedipalpR1: 3.7595e-04 - pedipalpL1: 3.7146e-04 - antlegR1: 4.9511e-04 - antlegR2: 4.9294e-04 - antlegL1: 4.7181e-04 - antlegL2: 4.0852e-04 - forelegR1: 3.7959e-04 - forelegR2: 4.7230e-04 - forelegL1: 3.6725e-04 - forelegL2: 4.3950e-04 - midlegR1: 3.5101e-04 - midlegR2: 4.2124e-04 - midlegL1: 3.4595e-04 - midlegL2: 4.2020e-04 - hindlegR1: 3.6612e-04 - hindlegR2: 3.3246e-04 - hindlegL1: 3.7615e-04 - hindlegL2: 3.2629e-04 - pedipalpR2: 3.7594e-04 - pedipalpL2: 3.8471e-04 - antlegR3: 5.9053e-04 - antlegR4: 6.0350e-04 - antlegL3: 5.1578e-04 - antlegL4: 5.2723e-04 - val_loss: 0.0040 - val_ohkm: 0.0035 - val_prosoma: 4.2798e-04 - val_pedicel: 3.8819e-04 - val_opisthosoma: 3.6716e-04 - val_pedipalpR1: 4.5729e-04 - val_pedipalpL1: 4.7555e-04 - val_antlegR1: 5.9989e-04 - val_antlegR2: 6.6770e-04 - val_antlegL1: 5.8266e-04 - val_antlegL2: 4.6682e-04 - val_forelegR1: 4.9084e-04 - val_forelegR2: 5.7035e-04 - val_forelegL1: 4.2677e-04 - val_forelegL2: 5.4642e-04 - val_midlegR1: 4.6823e-04 - val_midlegR2: 5.3738e-04 - val_midlegL1: 4.1209e-04 - val_midlegL2: 5.5408e-04 - val_hindlegR1: 4.5810e-04 - val_hindlegR2: 5.1612e-04 - val_hindlegL1: 4.6348e-04 - val_hindlegL2: 4.0936e-04 - val_pedipalpR2: 4.6826e-04 - val_pedipalpL2: 4.9571e-04 - val_antlegR3: 7.8056e-04 - val_antlegR4: 7.7968e-04 - val_antlegL3: 5.9276e-04 - val_antlegL4: 6.4306e-04 - lr: 1.2500e-05 - 20s/epoch - 101ms/step
Epoch 49: early stopping
INFO:sleap.nn.training:Finished training loop. [17.1 min]
INFO:sleap.nn.training:Deleting visualization directory: models/230312_144956.centered_instance/viz
INFO:sleap.nn.training:Saving evaluation metrics to model folder...
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?2023-03-12 19:42:35.583321: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -26 } dim { size: -27 } dim { size: 3 } } }
2023-03-12 19:42:35.592771: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -52 } dim { size: -53 } dim { size: -54 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸  99% ETA: 0:00:01 26.9 FPS2023-03-12 19:42:45.387926: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -26 } dim { size: -27 } dim { size: 3 } } }
2023-03-12 19:42:45.397145: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -52 } dim { size: -53 } dim { size: -54 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 19.4 FPS
INFO:sleap.nn.evals:Saved predictions: models/230312_144956.centered_instance/labels_pr.train.slp
INFO:sleap.nn.evals:Saved metrics: models/230312_144956.centered_instance/metrics.train.npz
INFO:sleap.nn.evals:OKS mAP: 0.870044
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?2023-03-12 19:42:48.149569: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -26 } dim { size: -27 } dim { size: 3 } } }
2023-03-12 19:42:48.158952: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -52 } dim { size: -53 } dim { size: -54 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━  93% ETA: 0:00:01 92.7 FPS2023-03-12 19:42:49.475717: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -26 } dim { size: -27 } dim { size: 3 } } }
2023-03-12 19:42:49.485143: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -52 } dim { size: -53 } dim { size: -54 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 16.0 FPS
INFO:sleap.nn.evals:Saved predictions: models/230312_144956.centered_instance/labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: models/230312_144956.centered_instance/metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.830889
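
The PredictCost() lines in the log above carry TensorFlow's `W` (warning) prefix, and the predictions and metrics are still saved after them. To make the remaining output easier to read, they can be filtered out of a saved log. This is a minimal sketch: `strip_predict_cost` is a hypothetical helper (not part of SLEAP or TensorFlow), and the pattern is inferred only from the lines shown in this thread.

```python
import re

# Matches a grappler PredictCost() warning from its timestamp to the end of
# the line. Assumption: the format is inferred from the log in this thread.
PREDICT_COST = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: W tensorflow/core/grappler/costs/"
    r"op_level_cost_estimator\.cc:\d+\] Error in PredictCost\(\).*$"
)


def strip_predict_cost(lines):
    """Yield log lines with PredictCost() warnings removed (hypothetical helper)."""
    for line in lines:
        # A warning is sometimes fused onto the end of a progress-bar line,
        # so cut it out of the line rather than dropping the whole line.
        cleaned = PREDICT_COST.sub("", line.rstrip("\n")).rstrip()
        if cleaned:
            yield cleaned
```

For example, `for line in strip_predict_cost(open("train.log")): print(line)` would print only the epoch summaries, progress bars, and `INFO` lines, where `train.log` is a placeholder file name.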

Notice that the metrics evaluation still runs, but with PredictCost() warnings along the way. I then predicted on the suggested frames:

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Started inference at: 2023-03-12 20:33:42.799078
Args:
{
│   'data_path': 'resolved_skeletons_with_predictions.pkg.slp',
│   'models': [
│   │   'models/230312_144956.centroid',
│   │   'models/230312_144956.centered_instance'
│   ],
│   'frames': '',
│   'only_labeled_frames': False,
│   'only_suggested_frames': True,
│   'output': '230312_144956_predicted_suggestions.slp',
│   'no_empty_frames': False,
│   'verbosity': 'rich',
│   'video.dataset': None,
│   'video.input_format': 'channels_last',
│   'video.index': '',
│   'cpu': False,
│   'first_gpu': False,
│   'last_gpu': False,
│   'gpu': 'auto',
│   'max_edge_length_ratio': 0.25,
│   'dist_penalty_weight': 1.0,
│   'batch_size': 4,
│   'open_in_gui': False,
│   'peak_threshold': 0.2,
│   'tracking.tracker': None,
│   'tracking.target_instance_count': None,
│   'tracking.pre_cull_to_target': None,
│   'tracking.pre_cull_iou_threshold': None,
│   'tracking.post_connect_single_breaks': None,
│   'tracking.clean_instance_count': None,
│   'tracking.clean_iou_threshold': None,
│   'tracking.similarity': None,
│   'tracking.match': None,
│   'tracking.robust': None,
│   'tracking.track_window': None,
│   'tracking.min_new_track_points': None,
│   'tracking.min_match_points': None,
│   'tracking.img_scale': None,
│   'tracking.of_window_size': None,
│   'tracking.of_max_levels': None,
│   'tracking.save_shifted_instances': None,
│   'tracking.kf_node_indices': None,
│   'tracking.kf_init_frame_count': None
}

INFO:sleap.nn.inference:Auto-selected GPU 0 with 40510 MiB of free memory.
Versions:
SLEAP: 1.3.0a0
TensorFlow: 2.8.4
Numpy: 1.22.4
Python: 3.9.16
OS: Linux-5.10.147+-x86_64-with-glibc2.31

System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True

Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?2023-03-12 20:33:56.497439: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -56 } dim { size: -57 } dim { size: -58 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -59 } dim { size: -60 } dim { size: 1 } } }
2023-03-12 20:33:56.497985: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -67 } dim { size: -68 } dim { size: 3 } } }
2023-03-12 20:33:56.503357: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -111 } dim { size: -112 } dim { size: -113 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -6 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: -114 } dim { size: -115 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━  94% ETA: 0:00:01 76.1 FPS2023-03-12 20:34:01.203800: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -56 } dim { size: -57 } dim { size: -58 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -59 } dim { size: -60 } dim { size: 1 } } }
2023-03-12 20:34:01.204359: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -67 } dim { size: -68 } dim { size: 3 } } }
2023-03-12 20:34:01.209752: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -111 } dim { size: -112 } dim { size: -113 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -6 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: -114 } dim { size: -115 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 18.7 FPS
Finished inference at: 2023-03-12 20:34:02.183675
Total runtime: 19.384610176086426 secs
Predicted frames: 51/51
Provenance:
{
│   'model_paths': [
│   │   'models/230312_144956.centroid/training_config.json',
│   │   'models/230312_144956.centered_instance/training_config.json'
│   ],
│   'predictor': 'TopDownPredictor',
│   'sleap_version': '1.3.0a0',
│   'platform': 'Linux-5.10.147+-x86_64-with-glibc2.31',
│   'command': '/usr/local/bin/sleap-track -m models/230312_144956.centroid -m models/230312_144956.centered_instance --only-suggested-frames -o 230312_144956_predicted_suggestions.slp resolved_skeletons_with_predictions.pkg.slp',
│   'data_path': 'resolved_skeletons_with_predictions.pkg.slp',
│   'output_path': '230312_144956_predicted_suggestions.slp',
│   'total_elapsed': 19.384610176086426,
│   'start_timestamp': '2023-03-12 20:33:42.799078',
│   'finish_timestamp': '2023-03-12 20:34:02.183675'
}

Saved output: 230312_144956_predicted_suggestions.slp

The problem is that the prediction file is empty, even though it has the same file size (57 KB) as previous prediction files that worked. When I merge the prediction file into my current SLEAP project, nothing happens. When I open the prediction file by itself, nothing shows up either, though that could be because there isn't a video file attached to it.

Additionally, as in my previous comment, I am still unable to see a model metric evaluation. Please let me know if there is something else I can provide to help solve this issue. I am stuck until this is solved.

roomrys commented 1 year ago

Hi @amblypatty,

Could you send everything needed to run the training/inference (video, slp, models), along with 230312_144956_predicted_suggestions.slp, to lmaree@salk.edu? Sorry, GitHub doesn't notify for reactions, but thanks for bumping this again; it had gotten buried. Let's get you unstuck.

Thanks, Liezl

jmdelahanty commented 1 year ago

One of our labmates also seems to be experiencing this issue. I can send you an example if you want, Liezl, but they're currently running an older SLEAP version.

Lauraschwarz commented 1 year ago

I think I am experiencing a similar issue. I am very new to this, but reading through the thread, it seems very similar to what happens for me. I have tried optimising the training parameters for my top-down multi-animal model, and when I tweak the input scaling (and the max stride) settings, in some cases I receive an error message in the GUI saying that the training failed. For my centroid model, keeping the input scaling at 0.5 and the max stride at 32 works, but when I increase the input scaling to 1.0 and the max stride to 64, I start seeing this issue. I will keep an eye on this thread; I just thought I would mention that I am experiencing this too. Thank you also for an amazing tool. I really like SLEAP.

smasri09 commented 1 year ago

Hello, I am getting this issue as well, but at an input scaling of 0.5. I need 0.5 to get the model to run on my 8 GB GPU with 1280x1024 video; with that change, plus reducing filters from 64 to 48 and the filters rate from 2 to 1.5, I was finally able to get the model to run. The error log is attached. Is there anything I can do? Thanks for the support.
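
If this is the same failure described at the top of this issue (the visualization step applies the input scaling to the crop, while the model was built for the unscaled crop), the resulting shape mismatch can be estimated by hand. The sketch below assumes that mechanism. `scaled_crop_shape` is a hypothetical helper, not a SLEAP function. The values 592 and 0.5 come from `crop_size` and `input_scaling` in the training job config attached to this comment; the 0.25 used for the original report is inferred from its 128 vs. 32 traceback, not stated anywhere.

```python
def scaled_crop_shape(crop_size, input_scaling, channels=3):
    """Shape of a square crop after input scaling is applied (hypothetical helper)."""
    side = int(round(crop_size * input_scaling))
    return (side, side, channels)


# Values from this comment's config: the log reports "Input shape: (592, 592, 3)",
# so a crop rescaled by 0.5 would no longer match the model input.
model_input = (592, 592, 3)
viz_crop = scaled_crop_shape(crop_size=592, input_scaling=0.5)
print("model expects", model_input, "but a rescaled crop would be", viz_crop)

# Original report: expected (128, 128, 3), found (32, 32, 3).
# Assumption: that corresponds to an input scaling of 0.25.
print(scaled_crop_shape(crop_size=128, input_scaling=0.25))
```

Under this assumption the two shapes only agree when the input scaling is 1.0, which matches the expectation stated in the original report for centered instance models.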

}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Training job:
INFO:sleap.nn.training:{
    "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.5, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": "back", "crop_size": 592, "crop_size_detection_padding": 16 } },
    "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 48, "filters_rate": 1.5, "middle_block": true, "up_interpolate": false, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": null, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": { "confmaps": { "anchor_part": "back", "part_names": null, "sigma": 5.0, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "class_vectors": { "classes": [ "o", "d" ], "num_fc_layers": 3, "num_fc_units": 64, "global_pool": true, "output_stride": 16, "loss_weight": 1.0 } } }, "base_checkpoint": null },
    "optimization": { "preload_data": true, "augmentation_config": { "rotate": false, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 8, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 100, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-06, "plateau_patience": 10 } },
    "outputs": { "save_outputs": true, "run_name": "231103_162437.multi_class_topdown.n=20", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "C:/ml/sleap/labels\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } },
    "name": "",
    "description": "",
    "sleap_version": "1.3.3",
    "filename": "C:\Users\smasr\AppData\Local\Temp\tmpb8i55rmq\231103_162437_training_job.json"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Auto-selected GPU 0 with 7963 MiB of free memory.
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: C:/ml/sleap/labels/labels.v001.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training: Splits: Training = 18 / Validation = 2.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2023-11-03 16:24:41.318359: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-03 16:24:41.699298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5417 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
INFO:sleap.nn.training:Loaded test example. [2.027s]
INFO:sleap.nn.training: Input shape: (592, 592, 3)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=48, filters_rate=1.5, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=False, block_contraction=False)
INFO:sleap.nn.training: Max stride: 16
INFO:sleap.nn.training: Parameters: 3,326,096
INFO:sleap.nn.training: Heads:
INFO:sleap.nn.training: [0] = CenteredInstanceConfmapsHead(part_names=['nose', 'neck', 'back', 'tailstart', 'tailend'], anchor_part='back', sigma=5.0, output_stride=2, loss_weight=1.0)
INFO:sleap.nn.training: [1] = ClassVectorsHead(classes=['o', 'd'], num_fc_layers=3, num_fc_units=64, global_pool=True, output_stride=16, loss_weight=1.0)
INFO:sleap.nn.training: Outputs:
INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 296, 296, 5), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'")
INFO:sleap.nn.training: [1] = KerasTensor(type_spec=TensorSpec(shape=(None, 2), dtype=tf.float32, name=None), name='ClassVectorsHead/Softmax:0', description="created by layer 'ClassVectorsHead'")
INFO:sleap.nn.training:Training from scratch
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 18
INFO:sleap.nn.training:Validation set: n = 2
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-06, plateau_patience=10)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: C:/ml/sleap/labels\models\231103_162437.multi_class_topdown.n=20 INFO:sleap.nn.training:Setting up visualization... INFO:sleap.nn.training:Finished trainer set up. [3.3s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... INFO:sleap.nn.training:Finished creating training datasets. [3.2s] INFO:sleap.nn.training:Starting training loop... Epoch 1/100 2023-11-03 16:24:50.027369: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8201 2023-11-03 16:24:51.105728: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: ptxas exited with non-zero error code -1, output: Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once. 2023-11-03 16:24:54.369871: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.43GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:54.370160: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.43GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:56.097067: I tensorflow/stream_executor/cuda/cuda_blas.cc:1774] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once. 
2023-11-03 16:24:56.987300: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.40GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:56.987450: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.40GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.053966: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.054198: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.415488: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.71GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.416283: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.71GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.829054: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. 
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-11-03 16:24:57.829268: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
WARNING:tensorflow:Callback method on_train_batch_end is slow compared to the batch time (batch time: 0.1977s vs on_train_batch_end time: 0.2514s). Check your callbacks.
Traceback (most recent call last):
  File "C:\Users\smasr.conda\envs\das\envs\sleap2\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-train')())
  File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\training.py", line 941, in train
    verbose=2,
  File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\training.py", line 1786, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\training.py", line 1766, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\inference.py", line 2088, in call
    out = self.keras_model(crops)
ValueError: Exception encountered when calling layer "find_instance_peaks" (type FindInstancePeaks).

Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 592, 592, 3), found shape=(1, 296, 296, 3)
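
For anyone reading the shape error above: the found size is half the expected size, which matches the `input_scaling` of 0.5 in the logged config. A minimal sketch for pulling that ratio out of a Keras shape-mismatch message is shown here. `SHAPE_RE` and `infer_scale` are hypothetical names, not part of SLEAP or Keras, and the regex assumes only the `expected shape=(...), found shape=(...)` wording seen in this thread.

```python
import re

# Matches the two shape tuples in a Keras input-compatibility error message.
# Assumption: the message uses the "expected shape=(...), found shape=(...)" wording.
SHAPE_RE = re.compile(r"expected shape=\(([^)]*)\), found shape=\(([^)]*)\)")


def infer_scale(message: str) -> float:
    """Return found_height / expected_height from a shape-mismatch message."""
    match = SHAPE_RE.search(message)
    if match is None:
        raise ValueError("no 'expected shape=..., found shape=...' pair in message")
    expected = [dim.strip() for dim in match.group(1).split(",")]
    found = [dim.strip() for dim in match.group(2).split(",")]
    # Index 0 is the batch dimension (None or 1); index 1 is the image height.
    return int(found[1]) / int(expected[1])


msg = (
    'Input 0 of layer "model" is incompatible with the layer: '
    "expected shape=(None, 592, 592, 3), found shape=(1, 296, 296, 3)"
)
print(infer_scale(msg))  # 0.5
```

If the printed ratio equals the `input_scaling` value in the training profile, the already-scaled crop was most likely scaled a second time before reaching the model.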

roomrys commented 1 year ago

Hi @smasri09,

I know you said that you needed an input scaling of 0.5 to get the model to run on your 8GB GPU, but is there any way you can keep the top-down-id model at an input scaling of 1 and adjust only the centroid model's input scaling, maybe even lowering it below 0.5? Like the centered instance model, the top-down-id model does not support adjusting the input scaling: it relies on the centroid model to take crops of the full image to save memory, and then keeps full resolution within each crop to accurately locate smaller body parts.

Thanks, Liezl
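
As a stopgap along the lines suggested above, the training profile JSON can be edited so that crop-based models train at an input scaling of 1. The sketch below is not part of SLEAP. `force_unit_scaling` and `patch_file` are hypothetical helpers, and the key names (`data.preprocessing.input_scaling`, `model.heads.centered_instance`, `model.heads.multi_class_topdown`) are taken from the training job configs logged in this thread.

```python
import json
from pathlib import Path

# Heads that operate on instance crops, according to this thread.
CROP_BASED_HEADS = ("centered_instance", "multi_class_topdown")


def force_unit_scaling(config: dict) -> bool:
    """Set input_scaling to 1.0 if the config trains a crop-based head.

    Returns True if the config was modified.
    """
    heads = config["model"]["heads"]
    uses_crops = any(heads.get(name) is not None for name in CROP_BASED_HEADS)
    preprocessing = config["data"]["preprocessing"]
    if uses_crops and preprocessing["input_scaling"] != 1.0:
        preprocessing["input_scaling"] = 1.0
        return True
    return False


def patch_file(path: str) -> bool:
    """Rewrite a training profile in place (hypothetical helper)."""
    profile = Path(path)
    config = json.loads(profile.read_text())
    changed = force_unit_scaling(config)
    if changed:
        profile.write_text(json.dumps(config, indent=4))
    return changed


# Trimmed-down example mirroring the multi_class_topdown config in the log.
example = {
    "data": {"preprocessing": {"input_scaling": 0.5}},
    "model": {"heads": {"centroid": None, "multi_class_topdown": {"confmaps": {}}}},
}
force_unit_scaling(example)
print(example["data"]["preprocessing"]["input_scaling"])  # 1.0
```

Centroid profiles are left untouched by this check, so their `input_scaling` can still be lowered to save memory.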

aperkes commented 8 months ago

Hi, I ran into the same issue training a LEAP-backbone top-down centered instance model with input scaling at 0.25 (it was just there by default, or maybe from an earlier run). Google brought me here, and switching input scaling to 1 fixed the problem. Here's the log in case it's helpful:

Output Log:
Using already trained model for centroid: /home/ammon/Documents/Scripts/FishTrack/sleap/models/240222_132820.centroid.n=20/training_config.json Resetting monitor window. Polling: /home/ammon/Documents/Scripts/FishTrack/sleap/models/240222_145208.centered_instance.n=20/viz/validation.*.png Start training centered_instance... ['sleap-train', '/tmp/tmpvfx421d7/240222_145208_training_job.json', '/home/ammon/Documents/Scripts/FishTrack/sleap/jallefish.labels.v001.slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.3 TensorFlow: 2.7.0 Numpy: 1.19.5 Python: 3.7.12 OS: Linux-5.15.0-94-generic-x86_64-with-debian-bullseye-sid INFO:sleap.nn.training:Training labels file: /home/ammon/Documents/Scripts/FishTrack/sleap/jallefish.labels.v001.slp INFO:sleap.nn.training:Training profile: /tmp/tmpvfx421d7/240222_145208_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "/tmp/tmpvfx421d7/240222_145208_training_job.json", "labels_path": "/home/ammon/Documents/Scripts/FishTrack/sleap/jallefish.labels.v001.slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": true, "imagenet_mode": null, "input_scaling": 0.25, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": "Body-line", "crop_size": null, 
"crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": { "max_stride": 8, "output_stride": 4, "filters": 64, "filters_rate": 2.0, "up_interpolate": false, "stacks": 1 }, "unet": null, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": null, "centered_instance": { "anchor_part": "Body-line", "part_names": null, "sigma": 2.5, "output_stride": 4, "loss_weight": 1.0, "offset_refinement": false }, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -15.0, "rotation_max_angle": 15.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": true, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": true, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 8, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 10 } 
}, "outputs": { "save_outputs": true, "run_name": "240222_145208.centered_instance.n=20", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "/home/ammon/Documents/Scripts/FishTrack/sleap/models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.3", "filename": "/tmp/tmpvfx421d7/240222_145208_training_job.json" } INFO:sleap.nn.training: 2024-02-22 14:52:10.597671: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2024-02-22 14:52:10.603512: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2024-02-22 14:52:10.603672: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero INFO:sleap.nn.training:Auto-selected GPU 0 with 11038 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... 
INFO:sleap.nn.training:Loading training labels from: /home/ammon/Documents/Scripts/FishTrack/sleap/jallefish.labels.v001.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 18 / Validation = 2. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2024-02-22 14:52:11.407698: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-02-22 14:52:11.408510: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2024-02-22 14:52:11.408692: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2024-02-22 14:52:11.408806: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2024-02-22 14:52:11.733932: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2024-02-22 14:52:11.734090: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2024-02-22 14:52:11.734232: I 
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2024-02-22 14:52:11.734326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9233 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6 INFO:sleap.nn.training:Loaded test example. [1.962s] INFO:sleap.nn.training: Input shape: (32, 32, 1) INFO:sleap.nn.training:Created Keras model. INFO:sleap.nn.training: Backbone: LeapCNN(stacks=1, filters=64, filters_rate=2.0, down_blocks=3, down_convs_per_block=3, up_blocks=1, up_interpolate=False, up_convs_per_block=2) INFO:sleap.nn.training: Max stride: 8 INFO:sleap.nn.training: Parameters: 2,509,443 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CenteredInstanceConfmapsHead(part_names=['Mouth', 'Body-line', 'Tail-tip'], anchor_part='Body-line', sigma=2.5, output_stride=4, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 8, 8, 3), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 18 INFO:sleap.nn.training:Validation set: n = 2 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... 
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: )
INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000
INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set
INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001
INFO:sleap.nn.training:Created run path: /home/ammon/Documents/Scripts/FishTrack/sleap/models/240222_145208.centered_instance.n=20
INFO:sleap.nn.training:Setting up visualization...
INFO:sleap.nn.training:Finished trainer set up. [3.4s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [3.0s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/200
2024-02-22 14:52:19.009446: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8201
Traceback (most recent call last):
  File "/home/ammon/anaconda3/envs/sleap/bin/sleap-train", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-train')())
  File "/home/ammon/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
    trainer.train()
  File "/home/ammon/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 941, in train
    verbose=2,
  File "/home/ammon/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/ammon/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "/home/ammon/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1346, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "/home/ammon/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1326, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "/home/ammon/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 2088, in call
    out = self.keras_model(crops)
ValueError: Exception encountered when calling layer "find_instance_peaks" (type FindInstancePeaks).

Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 32, 32, 1), found shape=(1, 8, 8, 1)

Call arguments received:
  • inputs=tf.Tensor(shape=(1, 32, 32, 1), dtype=float32)
terminate called without an active exception

lqmeyers commented 8 months ago

Hi! Just want to add that I'm running into this issue as well, with an updated SLEAP from conda. I assume it is being worked on, but in the meantime I was curious what other parameters (other than batch size) to tweak to make centered instance training smaller for our GPU limits.

Thanks!

Luke

Traceback (most recent call last):
  File "/home/lmeyers/anaconda3/envs/sleap/bin/sleap-train", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.2.8', 'console_scripts', 'sleap-train')())
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1981, in main
    trainer.train()
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 927, in train
    verbose=2,
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/engine/training.py", line 1230, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/callbacks.py", line 413, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1332, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1312, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 1868, in call
    out = self.keras_model(crops)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1020, in __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/engine/input_spec.py", line 269, in assert_input_compatibility
    ', found shape=' + display_shape(x.shape))
ValueError: Input 0 is incompatible with layer model: expected shape=(None, 768, 768, 3), found shape=(1, 192, 192, 3)
terminate called without an active exception
train-script.sh: line 2: 32871 Aborted (core dumped) sleap-train centered_instance.json labels.v001.pkg.slp