No predictions #1821

Open agosztolai opened 3 months ago

agosztolai commented 3 months ago

Bug description

After training, the inference step outputs no labels.

Expected behaviour

I expect labels (yellow markers) to emerge in the GUI after inference.

Actual behaviour

Nothing happens.

Environment packages

Logs (sleap) adamgosztolai@Adams-MacBook-Pro-2 data % sleap-label Saving config: /Users/adamgosztolai/.sleap/1.3.3/preferences.yaml Restoring GUI state... Software versions: SLEAP: 1.3.3 TensorFlow: 2.9.2 Numpy: 1.22.4 Python: 3.9.15 OS: macOS-14.5-arm64-arm-64bit Happy SLEAPing! :) qt.qpa.fonts: Populating font family aliases took 155 ms. Replace uses of missing font family ".AppleSystemUIFont" with one that exists to avoid this cost. Resetting monitor window. Polling: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/viz/validation.*.png Start training single_instance... ['sleap-train', '/var/folders/3n/71gnzd013y5f29s9t0tyyrwh0000gn/T/tmpnecxtno4/240621_144245_training_job.json', '/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.3 TensorFlow: 2.9.2 Numpy: 1.22.4 Python: 3.9.15 OS: macOS-14.5-arm64-arm-64bit INFO:sleap.nn.training:Training labels file: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp INFO:sleap.nn.training:Training profile: /var/folders/3n/71gnzd013y5f29s9t0tyyrwh0000gn/T/tmpnecxtno4/240621_144245_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "/var/folders/3n/71gnzd013y5f29s9t0tyyrwh0000gn/T/tmpnecxtno4/240621_144245_training_job.json", "labels_path": "/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, 
"validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 1.0, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": null, "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 16, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": { "part_names": null, "sigma": 2.5, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "centroid": null, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -15.0, "rotation_max_angle": 15.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 1, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, 
"epochs": 2, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "240621_144245.single_instance.n=1", "run_name_prefix": "calibration", "run_name_suffix": "", "runs_folder": "/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://" } }, "name": "", "description": "", "sleap_version": "1.3.3", "filename": "/var/folders/3n/71gnzd013y5f29s9t0tyyrwh0000gn/T/tmpnecxtno4/240621_144245_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Failed to query GPU memory from nvidia-smi. Defaulting to first GPU. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... 
INFO:sleap.nn.training:Loading training labels from: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 1 / Validation = 1. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... Metal device set to: Apple M3 Pro systemMemory: 36.00 GB maxCacheSize: 13.50 GB 2024-06-21 14:42:49.053848: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 2024-06-21 14:42:49.053983: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: ) 2024-06-21 14:42:49.277816: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz INFO:sleap.nn.training:Loaded test example. [0.581s] INFO:sleap.nn.training: Input shape: (800, 1936, 1) INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,303 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = SingleInstanceConfmapsHead(part_names=['ruler_1', 'ruler_2', 'ruler_3', 'ruler_4', 'wand_1', 'wand_2', 'wand_3'], sigma=2.5, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 400, 968, 7), dtype=tf.float32, name=None), name='SingleInstanceConfmapsHead/BiasAdd:0', description="created by layer 'SingleInstanceConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 1 INFO:sleap.nn.training:Validation set: n = 1 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... INFO:sleap.nn.callbacks:Training controller subscribed to: tcp:// (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp:// INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp:// for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp:// INFO:sleap.nn.training:Created run path: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1 INFO:sleap.nn.training:Setting up visualization... INFO:sleap.nn.training:Finished trainer set up. 
[1.0s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... INFO:sleap.nn.training:Finished creating training datasets. [0.8s] INFO:sleap.nn.training:Starting training loop... Epoch 1/2 2024-06-21 14:42:51.186166: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:43:51.529986: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:43:52.342114: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:43:52.753515: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:43:52.759774: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 7 } dim { size: 400 } dim { size: 968 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -5 } dim { size: -6 } dim { size: 1 } } } 200/200 - 63s - loss: 5.0801e-05 - val_loss: 5.0457e-05 - lr: 1.0000e-04 - 63s/epoch - 313ms/step Epoch 2/2 Polling: 
/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/viz/validation.*.png 200/200 - 61s - loss: 5.0256e-05 - val_loss: 4.9935e-05 - lr: 1.0000e-04 - 61s/epoch - 305ms/step INFO:sleap.nn.training:Finished training loop. [2.1 min] INFO:sleap.nn.training:Deleting visualization directory: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/viz INFO:sleap.nn.training:Saving evaluation metrics to model folder... Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?Polling: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/viz/validation.*.png 2024-06-21 14:44:55.050630: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:44:55.065402: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -28 } dim { size: -29 } dim { size: -30 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -12 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: -32 } dim { size: -33 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 ? 
/opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:539: RuntimeWarning: Mean of empty slice "dist.avg": np.nanmean(dists), /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:572: RuntimeWarning: Mean of empty slice. mPCK = mPCK_parts.mean() /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:666: RuntimeWarning: Mean of empty slice. pair_pck = metrics["pck.pcks"].mean(axis=-1).mean(axis=-1) /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/numpy/core/_methods.py:181: RuntimeWarning: invalid value encountered in true_divide ret = um.true_divide( /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:668: RuntimeWarning: Mean of empty slice. metrics["oks.mOKS"] = pair_oks.mean() WARNING:sleap.nn.evals:Failed to compute metrics. INFO:sleap.nn.evals:Saved predictions: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/labels_pr.train.slp Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2024-06-21 14:44:55.461039: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 
2024-06-21 14:44:55.476306: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -28 } dim { size: -29 } dim { size: -30 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -12 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: -32 } dim { size: -33 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 ? /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:539: RuntimeWarning: Mean of empty slice "dist.avg": np.nanmean(dists), /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:572: RuntimeWarning: Mean of empty slice. mPCK = mPCK_parts.mean() /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:666: RuntimeWarning: Mean of empty slice. pair_pck = metrics["pck.pcks"].mean(axis=-1).mean(axis=-1) /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/numpy/core/_methods.py:181: RuntimeWarning: invalid value encountered in true_divide ret = um.true_divide( /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:668: RuntimeWarning: Mean of empty slice. metrics["oks.mOKS"] = pair_oks.mean() WARNING:sleap.nn.evals:Failed to compute metrics. 
INFO:sleap.nn.evals:Saved predictions: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/labels_pr.val.slp INFO:sleap.nn.callbacks:Closing the reporter controller/context. INFO:sleap.nn.callbacks:Closing the training controller socket/context. Run Path: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1 Finished training single_instance. Command line call: sleap-track /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp --only-suggested-frames -m /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1 -o /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/predictions/labels.v001.slp.240621_144456.predictions.slp --verbosity json --no-empty-frames Started inference at: 2024-06-21 14:44:59.315294 Args: { │ 'data_path': '/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp', │ 'models': ['/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1'], │ 'frames': '', │ 'only_labeled_frames': False, │ 'only_suggested_frames': True, │ 'output': '/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/predictions/labels.v001.slp.240621_144456.predictions.slp', │ 'no_empty_frames': True, │ 'verbosity': 'json', 2024-06-21 14:44:59.882200: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 
2024-06-21 14:44:59.882365: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: ) │ 'video.dataset': None, │ 'video.input_format': 'channels_last', │ 'video.index': '', │ 'cpu': False, │ 'first_gpu': False, │ 'last_gpu': False, │ 'gpu': 'auto', │ 'max_edge_length_ratio': 0.25, │ 'dist_penalty_weight': 1.0, │ 'batch_size': 4, │ 'open_in_gui': False, 2024-06-21 14:45:00.425286: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz │ 'peak_threshold': 0.2, │ 'max_instances': None, │ 'tracking.tracker': None, │ 'tracking.max_tracking': None, │ 'tracking.max_tracks': None, │ 'tracking.target_instance_count': None, │ 'tracking.pre_cull_to_target': None, │ 'tracking.pre_cull_iou_threshold': None, │ 'tracking.post_connect_single_breaks': None, │ 'tracking.clean_instance_count': None, │ 'tracking.clean_iou_threshold': None, │ 'tracking.similarity': None, │ 'tracking.match': None, │ 'tracking.robust': None, │ 'tracking.track_window': None, │ 'tracking.min_new_track_points': None, │ 'tracking.min_match_points': None, │ 'tracking.img_scale': None, │ 'tracking.of_window_size': None, │ 'tracking.of_max_levels': None, 2024-06-21 14:45:01.454999: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 
2024-06-21 14:45:01.470924: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -18 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: -38 } dim { size: -39 } dim { size: 1 } } } │ 'tracking.save_shifted_instances': None, │ 'tracking.kf_node_indices': None, │ 'tracking.kf_init_frame_count': None } INFO:sleap.nn.inference:Failed to query GPU memory from nvidia-smi. Defaulting to first GPU. Metal device set to: Apple M3 Pro Versions: SLEAP: 1.3.3 TensorFlow: 2.9.2 2024-06-21 14:45:02.016209: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 
Numpy: 1.22.4 2024-06-21 14:45:02.031698: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -43 } dim { size: -44 } dim { size: -45 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -18 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: -47 } dim { size: -48 } dim { size: 1 } } } Python: 3.9.15 OS: macOS-14.5-arm64-arm-64bit System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True Process return code: 0 System: Process return code: 0
## Screenshots

![Screenshot 2024-06-21 at 14 18 17](https://github.com/talmolab/sleap/assets/45966708/be72fe01-cc29-444a-b15c-e04f254df876)
![Screenshot 2024-06-21 at 14 17 30](https://github.com/talmolab/sleap/assets/45966708/08502f97-8142-4153-888e-76def1e50412)

agosztolai commented 3 months ago

Screenshot 2024-06-21 at 14 17 30

Screenshot 2024-06-21 at 14 18 17

agosztolai commented 3 months ago

Ok, I think I have figured this out. The issue seems to be that I defined a skeleton with two connected components. I noticed this when I tried to use the same pipeline as in the tutorial. The tutorial uses a multi-animal model, for which the "Run" button is disabled when the skeleton has multiple connected components. The single-animal model, however, trains fine on skeletons with multiple connected components, but at inference time no predicted instances are returned.

Could the issue be that the model only accepts a single connected component? If so, it would be nice to include a check for this in the code.
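
A minimal sketch of what such a check could look like, written in plain Python rather than against SLEAP's own skeleton API. The node names are taken from the training log above; the edge list and the helper `count_connected_components` are hypothetical and only illustrate a skeleton with two disconnected parts.

```python
from collections import defaultdict


def count_connected_components(nodes, edges):
    """Count connected components of an undirected skeleton graph."""
    # Build an adjacency map from the (source, destination) edge pairs.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        # Each unseen node starts a new component; explore it depth-first.
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adjacency[node] - seen)
    return components


# Node names from the log; the edges below are an assumption for illustration.
nodes = ["ruler_1", "ruler_2", "ruler_3", "ruler_4", "wand_1", "wand_2", "wand_3"]
edges = [
    ("ruler_1", "ruler_2"), ("ruler_2", "ruler_3"), ("ruler_3", "ruler_4"),
    ("wand_1", "wand_2"), ("wand_2", "wand_3"),
]

n_components = count_connected_components(nodes, edges)
if n_components > 1:
    print(f"Warning: skeleton has {n_components} connected components")
```

With these hypothetical edges the ruler and wand nodes form two separate components, so the warning is printed.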

Lateef-Saheed commented 3 months ago

Hello! I am having a similar issue. Just curious: in your training settings, do you have the "rotate" setting in the Augmentation section of the Single Instance Model configuration tab checked?

agosztolai commented 3 months ago

Yes, for me it works with ‘rotate’ on. Perhaps try rotate = off. If that works, maybe report the bug?

Lateef-Saheed commented 3 months ago

Sounds good. I tried turning off rotation and that got me predictions, albeit inaccurate ones. Thanks for the confirmation!

talmo commented 2 months ago

Hi folks,

Apologies for the delay! We're a bit behind on support responses at the moment.

Neither rotation nor the skeleton edge configuration should make any difference to the single animal model.

If you're not getting predictions, the most likely cause is that the model is underperforming. The easiest thing to try is to just add more labels, but it might be that further model parameter tuning would help.
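
To illustrate how an underperforming model can end up producing no predictions at all: the inference arguments in the log above include `'peak_threshold': 0.2` and `'no_empty_frames': True`. The sketch below is a simplified illustration, not SLEAP's actual implementation. It assumes that a body part is only reported when its confidence map peaks at or above the threshold; `find_peaks` and the tiny confidence maps are hypothetical.

```python
def find_peaks(confmaps, peak_threshold=0.2):
    """Return (part, row, col, score) for each part whose global maximum
    reaches the threshold. Parts with weaker peaks are dropped."""
    peaks = []
    for part, grid in confmaps.items():
        # Locate the highest-scoring pixel in this part's confidence map.
        score, r, c = max(
            (value, r, c)
            for r, row in enumerate(grid)
            for c, value in enumerate(row)
        )
        if score >= peak_threshold:
            peaks.append((part, r, c, score))
    return peaks


# Hypothetical 2x2 confidence maps for two of the skeleton's parts.
weak_model = {
    "ruler_1": [[0.01, 0.03], [0.02, 0.05]],
    "wand_1": [[0.00, 0.04], [0.10, 0.02]],
}
strong_model = {
    "ruler_1": [[0.01, 0.03], [0.02, 0.85]],
    "wand_1": [[0.00, 0.04], [0.70, 0.02]],
}

print(find_peaks(weak_model))    # prints []
print(find_peaks(strong_model))
```

Under these assumptions a weakly trained model yields no peaks above the threshold, so every frame would come back empty and, with empty frames dropped, nothing would show up in the GUI.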

Turning off rotation will cause the model to seriously overfit, which means it'll work on images very similar to those in your training data, but fail to generalize to new ones.
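
As a rough illustration of what rotation augmentation does, the sketch below rotates a set of keypoints by a random angle drawn from the range in the training config above (`rotation_min_angle: -15.0`, `rotation_max_angle: 15.0`). It is plain Python, not SLEAP's augmentation code. The helpers `rotate_keypoints` and `augment` and the keypoint coordinates are hypothetical, and in a real pipeline the image would be rotated by the same angle.

```python
import math
import random


def rotate_keypoints(points, angle_deg, center):
    """Rotate (x, y) points by angle_deg around center."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return rotated


def augment(points, center, min_angle=-15.0, max_angle=15.0, rng=random):
    """Draw a random angle in [min_angle, max_angle] and rotate the points."""
    angle = rng.uniform(min_angle, max_angle)
    return angle, rotate_keypoints(points, angle, center)


# Input shape from the log is (800, 1936, 1), i.e. height 800 and width 1936.
center = (1936 / 2, 800 / 2)
keypoints = [(900.0, 400.0), (1000.0, 420.0)]  # hypothetical labels

rng = random.Random(0)
for _ in range(3):
    angle, pts = augment(keypoints, center, rng=rng)
    print(round(angle, 1), [(round(x, 1), round(y, 1)) for x, y in pts])
```

Each training step then sees a slightly different version of the same labeled frame, which is why switching this off leaves the model with far less variation to learn from.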

It sounds like @agosztolai has a working solution, but @Lateef-Saheed if you don't mind creating a new Discussion with some information about your project, we'd be happy to help!

Lateef-Saheed commented 2 months ago

Noted! We ended up labeling more frames and using the "Latest" model instead of "Best" and it ended up giving us predictions. I am curious about your comment about rotation. What exactly does the rotation augmentation feature mean and how would not selecting it result in failing to generalize to new videos? Thanks for the help!

