talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

No predictions #1821

Open agosztolai opened 3 months ago

agosztolai commented 3 months ago

Bug description

After training, the inference step outputs no labels.

Expected behaviour

I expect predicted labels (yellow markers) to appear in the GUI after inference.

Actual behaviour

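No predicted labels appear in the GUI. Inference finishes without reporting an error (process return code 0; see the logs below).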

Your personal set up

Environment packages ``` # paste output of `pip freeze` or `conda list` here ``` # Name Version Build Channel abseil-cpp 20211102.0 he4e09e4_3 conda-forge absl-py 1.4.0 pypi_0 pypi aiohttp 3.9.5 py39h17cfd9d_0 conda-forge aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge aom 3.5.0 h7ea286d_0 conda-forge astunparse 1.6.3 pyhd8ed1ab_0 conda-forge async-timeout 4.0.3 pyhd8ed1ab_0 conda-forge attrs 23.2.0 pyh71513ae_0 conda-forge blinker 1.8.2 pyhd8ed1ab_0 conda-forge blosc 1.21.5 hc338f07_0 conda-forge brotli 1.0.9 h1a8c8d9_9 conda-forge brotli-bin 1.0.9 h1a8c8d9_9 conda-forge brotli-python 1.0.9 py39h23fbdae_9 conda-forge brunsli 0.1 h9f76cd9_0 conda-forge bzip2 1.0.8 h93a5062_5 conda-forge c-ares 1.28.1 h93a5062_0 conda-forge c-blosc2 2.12.0 ha57e6be_0 conda-forge ca-certificates 2024.6.2 hf0a4a13_0 conda-forge cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 5.3.1 pypi_0 pypi cairo 1.16.0 had492bb_1012 conda-forge cattrs 1.1.1 pyhd8ed1ab_0 conda-forge certifi 2024.6.2 pyhd8ed1ab_0 conda-forge cffi 1.16.0 py39he153c15_0 conda-forge cfitsio 4.2.0 h2f961c4_0 conda-forge charls 2.3.4 hbdafb3b_0 conda-forge charset-normalizer 3.2.0 pypi_0 pypi click 8.1.7 unix_pyh707e725_0 conda-forge contourpy 1.2.1 py39h48c5dd5_0 conda-forge cryptography 39.0.0 py39haa0b8cc_0 conda-forge cycler 0.12.1 pyhd8ed1ab_0 conda-forge dav1d 1.2.1 hb547adb_0 conda-forge efficientnet 1.0.0 pypi_0 pypi expat 2.6.2 hebf3989_0 conda-forge ffmpeg 4.4.2 gpl_hf318d42_112 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 h77eed37_2 conda-forge fontconfig 2.14.2 h82840c6_0 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge fonttools 4.53.0 py39hfea33bf_0 conda-forge freetype 2.12.1 hadb7bae_2 conda-forge frozenlist 1.4.1 py39h17cfd9d_0 conda-forge gast 0.4.0 pyh9f0ad1d_0 conda-forge geos 
3.12.1 h965bd2d_0 conda-forge gettext 0.22.5 h8fbad5d_2 conda-forge gettext-tools 0.22.5 h8fbad5d_2 conda-forge giflib 5.2.2 h93a5062_0 conda-forge glib 2.80.2 h59d46d9_1 conda-forge glib-tools 2.80.2 h8ba3eef_1 conda-forge gmp 6.3.0 hebf3989_1 conda-forge gnutls 3.7.9 hd26332c_0 conda-forge google-auth 2.23.0 pypi_0 pypi google-auth-oauthlib 0.4.6 pyhd8ed1ab_0 conda-forge google-pasta 0.2.0 pyh8c360ce_0 conda-forge graphite2 1.3.13 hebf3989_1003 conda-forge grpc-cpp 1.46.3 hacd037c_3 conda-forge grpcio 1.58.0 pypi_0 pypi gst-plugins-base 1.22.9 h09b4b5e_1 conda-forge gstreamer 1.22.9 h551c6ff_1 conda-forge h5py 3.8.0 nompi_py39hc9149d8_100 conda-forge harfbuzz 5.3.0 hddbc195_0 conda-forge hdf5 1.12.2 nompi_h55deafc_101 conda-forge hdmf 3.9.0 pypi_0 pypi icu 70.1 h6b3803e_0 conda-forge idna 3.4 pypi_0 pypi image-classifiers 1.0.0 pypi_0 pypi imagecodecs 2022.9.26 py39hd7f743f_4 conda-forge imageio 2.34.1 pyh4b66e23_0 conda-forge imgaug 0.4.0 pyhd8ed1ab_1 conda-forge imgstore 0.2.9 pypi_0 pypi importlib-metadata 7.1.0 pyha770c72_0 conda-forge importlib-resources 6.4.0 pyhd8ed1ab_0 conda-forge importlib_resources 6.4.0 pyhd8ed1ab_0 conda-forge jasper 2.0.33 hc3cd1e9_1 conda-forge joblib 1.4.2 pyhd8ed1ab_0 conda-forge jpeg 9e h1a8c8d9_3 conda-forge jsmin 3.0.1 pyhd8ed1ab_0 conda-forge jsonpickle 1.2 py_0 conda-forge jsonschema 4.19.0 pypi_0 pypi jsonschema-specifications 2023.7.1 pypi_0 pypi jxrlib 1.1 h93a5062_3 conda-forge keras 2.9.0 pyhd8ed1ab_0 conda-forge keras-applications 1.0.8 pypi_0 pypi keras-preprocessing 1.1.2 pyhd8ed1ab_0 conda-forge kiwisolver 1.4.5 py39hbd775c9_1 conda-forge krb5 1.20.1 h127bd45_0 conda-forge lame 3.100 h1a8c8d9_1003 conda-forge lazy_loader 0.4 pyhd8ed1ab_0 conda-forge lcms2 2.14 h8193b64_0 conda-forge lerc 4.0.0 h9a09cb3_0 conda-forge libabseil 20211102.0 cxx17_h28b99d4_3 conda-forge libaec 1.1.3 hebf3989_0 conda-forge libasprintf 0.22.5 h8fbad5d_2 conda-forge libasprintf-devel 0.22.5 h8fbad5d_2 conda-forge libavif 0.11.1 h9f83d30_2 
conda-forge libblas 3.9.0 20_osxarm64_openblas conda-forge libbrotlicommon 1.0.9 h1a8c8d9_9 conda-forge libbrotlidec 1.0.9 h1a8c8d9_9 conda-forge libbrotlienc 1.0.9 h1a8c8d9_9 conda-forge libcblas 3.9.0 20_osxarm64_openblas conda-forge libclang 16.0.6 pypi_0 pypi libclang13 14.0.6 default_hc7183e1_1 conda-forge libcurl 7.87.0 hbe9bab4_0 conda-forge libcxx 17.0.6 h5f092b4_0 conda-forge libdeflate 1.14 h1a8c8d9_0 conda-forge libedit 3.1.20191231 hc8eb9b7_2 conda-forge libev 4.33 h93a5062_2 conda-forge libexpat 2.6.2 hebf3989_0 conda-forge libffi 3.4.2 h3422bc3_5 conda-forge libgettextpo 0.22.5 h8fbad5d_2 conda-forge libgettextpo-devel 0.22.5 h8fbad5d_2 conda-forge libgfortran 5.0.0 13_2_0_hd922786_3 conda-forge libgfortran5 13.2.0 hf226fd6_3 conda-forge libglib 2.80.2 h59d46d9_1 conda-forge libiconv 1.17 h0d3ecfb_2 conda-forge libidn2 2.3.7 h93a5062_0 conda-forge libintl 0.22.5 h8fbad5d_2 conda-forge libintl-devel 0.22.5 h8fbad5d_2 conda-forge liblapack 3.9.0 20_osxarm64_openblas conda-forge liblapacke 3.9.0 20_osxarm64_openblas conda-forge libllvm14 14.0.6 hd1a9a77_4 conda-forge libnghttp2 1.51.0 hd184df1_0 conda-forge libogg 1.3.4 h27ca646_1 conda-forge libopenblas 0.3.25 openmp_h6c19121_0 conda-forge libopencv 4.6.0 py39he1c1adf_3 conda-forge libopus 1.3.1 h27ca646_1 conda-forge libpng 1.6.43 h091b4b1_0 conda-forge libpq 15.1 hbce9e56_3 conda-forge libprotobuf 3.20.3 hb5ab8b9_0 conda-forge libsodium 1.0.18 h27ca646_1 conda-forge libsqlite 3.46.0 hfb93653_0 conda-forge libssh2 1.10.0 hb80f160_3 conda-forge libtasn1 4.19.0 h1a8c8d9_0 conda-forge libtiff 4.4.0 heb92581_5 conda-forge libunistring 0.9.10 h3422bc3_0 conda-forge libvorbis 1.3.7 h9f76cd9_0 conda-forge libvpx 1.11.0 hc470f4d_3 conda-forge libwebp-base 1.4.0 h93a5062_0 conda-forge libxcb 1.13 h9b22ae9_1004 conda-forge libxml2 2.10.3 h67585b2_4 conda-forge libxslt 1.1.37 h1bd8bc4_0 conda-forge libzlib 1.3.1 hfb2fe0b_1 conda-forge libzopfli 1.0.3 h9f76cd9_0 conda-forge llvm-openmp 18.1.7 hde57baf_0 
conda-forge lz4-c 1.9.4 hb7217d7_0 conda-forge markdown 3.4.4 pypi_0 pypi markdown-it-py 3.0.0 pyhd8ed1ab_0 conda-forge markupsafe 2.1.3 pypi_0 pypi matplotlib-base 3.8.4 py39h15359f4_2 conda-forge mdurl 0.1.2 pyhd8ed1ab_0 conda-forge multidict 6.0.5 py39h02fc5c5_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge mysql-common 8.0.32 hab468bb_0 conda-forge mysql-libs 8.0.32 hea58576_0 conda-forge ncurses 6.5 hb89a1cb_0 conda-forge ndx-pose 0.1.1 pypi_0 pypi nettle 3.9.1 h40ed0f5_0 conda-forge networkx 3.2.1 pyhd8ed1ab_0 conda-forge nixio 1.5.3 pypi_0 pypi nspr 4.35 hb7217d7_0 conda-forge nss 3.101 hc42bcbf_0 conda-forge numpy 1.22.4 py39h7df2422_0 conda-forge oauthlib 3.2.2 pyhd8ed1ab_0 conda-forge opencv 4.6.0 py39hdf13c20_3 conda-forge openh264 2.3.1 hb7217d7_2 conda-forge openjpeg 2.5.0 h5d4e404_1 conda-forge openssl 1.1.1w h53f4e23_0 conda-forge opt_einsum 3.3.0 pyhc1e730c_2 conda-forge p11-kit 0.24.1 h29577a5_0 conda-forge packaging 24.1 pyhd8ed1ab_0 conda-forge pandas 2.2.2 py39h998126f_1 conda-forge patsy 0.5.6 pyhd8ed1ab_0 conda-forge pcre2 10.44 h297a79d_0 conda-forge pillow 9.2.0 py39h139752e_3 conda-forge pip 24.0 pyhd8ed1ab_0 conda-forge pixman 0.43.4 hebf3989_0 conda-forge protobuf 3.19.6 pypi_0 pypi psutil 5.9.8 py39h17cfd9d_0 conda-forge pthread-stubs 0.4 h27ca646_1001 conda-forge py-opencv 4.6.0 py39hfa6204d_3 conda-forge pyasn1 0.5.0 pypi_0 pypi pyasn1-modules 0.3.0 pypi_0 pypi pycparser 2.22 pyhd8ed1ab_0 conda-forge pygments 2.18.0 pyhd8ed1ab_0 conda-forge pyjwt 2.8.0 pyhd8ed1ab_1 conda-forge pykalman 0.9.7 pyhd8ed1ab_0 conda-forge pynwb 2.5.0 pypi_0 pypi pyopenssl 23.2.0 pyhd8ed1ab_1 conda-forge pyparsing 3.1.2 pyhd8ed1ab_0 conda-forge pyside2 5.15.8 py39h0adaba8_2 conda-forge pysocks 1.7.1 pyha2e5f31_6 conda-forge python 3.9.15 h2d96c93_0_cpython conda-forge python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge python-flatbuffers 1.12 pyhd8ed1ab_1 conda-forge python-rapidjson 1.17 py39hbf7db11_0 conda-forge python-tzdata 2024.1 pyhd8ed1ab_0 
conda-forge python_abi 3.9 4_cp39 conda-forge pytz 2024.1 pyhd8ed1ab_0 conda-forge pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge pywavelets 1.4.1 py39hf4a74a7_1 conda-forge pyyaml 6.0.1 py39h0f82c59_1 conda-forge pyzmq 26.0.3 py39he7f0319_0 conda-forge qimage2ndarray 1.10.0 pypi_0 pypi qt-main 5.15.8 hfe8d25c_6 conda-forge qtpy 2.4.1 pyhd8ed1ab_0 conda-forge re2 2022.06.01 h9a09cb3_1 conda-forge readline 8.2 h92ec313_1 conda-forge referencing 0.30.2 pypi_0 pypi requests 2.31.0 pypi_0 pypi requests-oauthlib 1.3.1 pypi_0 pypi rich 13.7.1 pyhd8ed1ab_0 conda-forge rpds-py 0.10.3 pypi_0 pypi rsa 4.9 pyhd8ed1ab_0 conda-forge ruamel-yaml 0.17.32 pypi_0 pypi ruamel-yaml-clib 0.2.7 pypi_0 pypi scikit-image 0.22.0 py39hf8cecc8_2 conda-forge scikit-learn 1.0 py39h12ba089_1 conda-forge scikit-video 1.1.11 pyh24bf2e0_0 conda-forge scipy 1.9.0 py39h14896cb_0 conda-forge seaborn 0.13.2 hd8ed1ab_2 conda-forge seaborn-base 0.13.2 pyhd8ed1ab_2 conda-forge segmentation-models 1.0.1 pypi_0 pypi setuptools 70.0.0 pyhd8ed1ab_0 conda-forge shapely 2.0.4 py39h8b557c8_1 conda-forge six 1.15.0 pypi_0 pypi sleap 1.3.3 pypi_0 pypi snappy 1.1.10 hd04f947_1 conda-forge sqlite 3.46.0 h5838104_0 conda-forge statsmodels 0.14.2 py39h161d348_0 conda-forge svt-av1 1.4.1 h7ea286d_0 conda-forge tensorboard 2.9.1 pypi_0 pypi tensorboard-data-server 0.6.1 py39haa0b8cc_4 conda-forge tensorboard-plugin-wit 1.8.1 pyhd8ed1ab_0 conda-forge tensorflow 2.9.1 cpu_py39h2839aeb_0 conda-forge tensorflow-base 2.9.1 cpu_py39ha1ad4ae_0 conda-forge tensorflow-estimator 2.9.1 cpu_py39h7b621ec_0 conda-forge tensorflow-hub 0.12.0 pyhca92ed8_0 conda-forge tensorflow-macos 2.9.2 pypi_0 pypi tensorflow-metal 0.5.0 pypi_0 pypi termcolor 2.3.0 pypi_0 pypi threadpoolctl 3.5.0 pyhc1e730c_0 conda-forge tifffile 2022.10.10 pyhd8ed1ab_0 conda-forge tk 8.6.13 h5083fa2_1 conda-forge typing-extensions 4.12.2 hd8ed1ab_0 conda-forge typing_extensions 4.12.2 pyha770c72_0 conda-forge tzdata 2024a h0c530f3_0 conda-forge tzlocal 5.0.1 pypi_0 pypi 
unicodedata2 15.1.0 py39h0f82c59_0 conda-forge urllib3 1.26.16 pypi_0 pypi werkzeug 2.3.7 pypi_0 pypi wheel 0.43.0 pyhd8ed1ab_1 conda-forge wrapt 1.15.0 pypi_0 pypi x264 1!164.3095 h57fd34a_2 conda-forge x265 3.5 hbc6ce65_3 conda-forge xorg-libxau 1.0.11 hb547adb_0 conda-forge xorg-libxdmcp 1.1.3 h27ca646_0 conda-forge xz 5.2.6 h57fd34a_0 conda-forge yaml 0.2.5 h3422bc3_2 conda-forge yarl 1.9.4 py39h17cfd9d_0 conda-forge zeromq 4.3.5 hebf3989_1 conda-forge zfp 1.0.1 ha8f4885_0 conda-forge zipp 3.19.2 pyhd8ed1ab_0 conda-forge zlib-ng 2.0.7 h1a8c8d9_0 conda-forge zstd 1.5.6 hb46c0d2_0 conda-forge
Logs (sleap) adamgosztolai@Adams-MacBook-Pro-2 data % sleap-label Saving config: /Users/adamgosztolai/.sleap/1.3.3/preferences.yaml Restoring GUI state... Software versions: SLEAP: 1.3.3 TensorFlow: 2.9.2 Numpy: 1.22.4 Python: 3.9.15 OS: macOS-14.5-arm64-arm-64bit Happy SLEAPing! :) qt.qpa.fonts: Populating font family aliases took 155 ms. Replace uses of missing font family ".AppleSystemUIFont" with one that exists to avoid this cost. Resetting monitor window. Polling: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/viz/validation.*.png Start training single_instance... ['sleap-train', '/var/folders/3n/71gnzd013y5f29s9t0tyyrwh0000gn/T/tmpnecxtno4/240621_144245_training_job.json', '/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.3 TensorFlow: 2.9.2 Numpy: 1.22.4 Python: 3.9.15 OS: macOS-14.5-arm64-arm-64bit INFO:sleap.nn.training:Training labels file: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp INFO:sleap.nn.training:Training profile: /var/folders/3n/71gnzd013y5f29s9t0tyyrwh0000gn/T/tmpnecxtno4/240621_144245_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "/var/folders/3n/71gnzd013y5f29s9t0tyyrwh0000gn/T/tmpnecxtno4/240621_144245_training_job.json", "labels_path": "/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, 
"validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 1.0, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": null, "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 16, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": { "part_names": null, "sigma": 2.5, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "centroid": null, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -15.0, "rotation_max_angle": 15.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 1, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, 
"epochs": 2, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "240621_144245.single_instance.n=1", "run_name_prefix": "calibration", "run_name_suffix": "", "runs_folder": "/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.3", "filename": "/var/folders/3n/71gnzd013y5f29s9t0tyyrwh0000gn/T/tmpnecxtno4/240621_144245_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Failed to query GPU memory from nvidia-smi. Defaulting to first GPU. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... 
INFO:sleap.nn.training:Loading training labels from: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 1 / Validation = 1. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... Metal device set to: Apple M3 Pro systemMemory: 36.00 GB maxCacheSize: 13.50 GB 2024-06-21 14:42:49.053848: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 2024-06-21 14:42:49.053983: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: ) 2024-06-21 14:42:49.277816: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz INFO:sleap.nn.training:Loaded test example. [0.581s] INFO:sleap.nn.training: Input shape: (800, 1936, 1) INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,303 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = SingleInstanceConfmapsHead(part_names=['ruler_1', 'ruler_2', 'ruler_3', 'ruler_4', 'wand_1', 'wand_2', 'wand_3'], sigma=2.5, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 400, 968, 7), dtype=tf.float32, name=None), name='SingleInstanceConfmapsHead/BiasAdd:0', description="created by layer 'SingleInstanceConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 1 INFO:sleap.nn.training:Validation set: n = 1 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1 INFO:sleap.nn.training:Setting up visualization... 
INFO:sleap.nn.training:Finished trainer set up. [1.0s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... INFO:sleap.nn.training:Finished creating training datasets. [0.8s] INFO:sleap.nn.training:Starting training loop... Epoch 1/2 2024-06-21 14:42:51.186166: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:43:51.529986: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:43:52.342114: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:43:52.753515: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:43:52.759774: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 7 } dim { size: 400 } dim { size: 968 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -5 } dim { size: -6 } dim { size: 1 } } } 200/200 - 63s - loss: 5.0801e-05 - val_loss: 5.0457e-05 - lr: 1.0000e-04 - 63s/epoch - 313ms/step Epoch 2/2 Polling: 
/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/viz/validation.*.png 200/200 - 61s - loss: 5.0256e-05 - val_loss: 4.9935e-05 - lr: 1.0000e-04 - 61s/epoch - 305ms/step INFO:sleap.nn.training:Finished training loop. [2.1 min] INFO:sleap.nn.training:Deleting visualization directory: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/viz INFO:sleap.nn.training:Saving evaluation metrics to model folder... Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?Polling: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/viz/validation.*.png 2024-06-21 14:44:55.050630: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 2024-06-21 14:44:55.065402: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -28 } dim { size: -29 } dim { size: -30 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -12 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: -32 } dim { size: -33 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 ? 
/opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:539: RuntimeWarning: Mean of empty slice "dist.avg": np.nanmean(dists), /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:572: RuntimeWarning: Mean of empty slice. mPCK = mPCK_parts.mean() /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:666: RuntimeWarning: Mean of empty slice. pair_pck = metrics["pck.pcks"].mean(axis=-1).mean(axis=-1) /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/numpy/core/_methods.py:181: RuntimeWarning: invalid value encountered in true_divide ret = um.true_divide( /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:668: RuntimeWarning: Mean of empty slice. metrics["oks.mOKS"] = pair_oks.mean() WARNING:sleap.nn.evals:Failed to compute metrics. INFO:sleap.nn.evals:Saved predictions: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/labels_pr.train.slp Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2024-06-21 14:44:55.461039: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 
2024-06-21 14:44:55.476306: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -28 } dim { size: -29 } dim { size: -30 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -12 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -12 } dim { size: -32 } dim { size: -33 } dim { size: 1 } } } Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 ? /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:539: RuntimeWarning: Mean of empty slice "dist.avg": np.nanmean(dists), /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:572: RuntimeWarning: Mean of empty slice. mPCK = mPCK_parts.mean() /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:666: RuntimeWarning: Mean of empty slice. pair_pck = metrics["pck.pcks"].mean(axis=-1).mean(axis=-1) /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/numpy/core/_methods.py:181: RuntimeWarning: invalid value encountered in true_divide ret = um.true_divide( /opt/anaconda3/envs/sleap/lib/python3.9/site-packages/sleap/nn/evals.py:668: RuntimeWarning: Mean of empty slice. metrics["oks.mOKS"] = pair_oks.mean() WARNING:sleap.nn.evals:Failed to compute metrics. 
INFO:sleap.nn.evals:Saved predictions: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1/labels_pr.val.slp INFO:sleap.nn.callbacks:Closing the reporter controller/context. INFO:sleap.nn.callbacks:Closing the training controller socket/context. Run Path: /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1 Finished training single_instance. Command line call: sleap-track /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp --only-suggested-frames -m /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1 -o /Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/predictions/labels.v001.slp.240621_144456.predictions.slp --verbosity json --no-empty-frames Started inference at: 2024-06-21 14:44:59.315294 Args: { │ 'data_path': '/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/labels.v001.slp', │ 'models': ['/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/models/calibration240621_144245.single_instance.n=1'], │ 'frames': '', │ 'only_labeled_frames': False, │ 'only_suggested_frames': True, │ 'output': '/Users/adamgosztolai/Documents/GitHub/large_kinematic_model/preprocessing/sleap/predictions/labels.v001.slp.240621_144456.predictions.slp', │ 'no_empty_frames': True, │ 'verbosity': 'json', 2024-06-21 14:44:59.882200: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 
2024-06-21 14:44:59.882365: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: ) │ 'video.dataset': None, │ 'video.input_format': 'channels_last', │ 'video.index': '', │ 'cpu': False, │ 'first_gpu': False, │ 'last_gpu': False, │ 'gpu': 'auto', │ 'max_edge_length_ratio': 0.25, │ 'dist_penalty_weight': 1.0, │ 'batch_size': 4, │ 'open_in_gui': False, 2024-06-21 14:45:00.425286: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz │ 'peak_threshold': 0.2, │ 'max_instances': None, │ 'tracking.tracker': None, │ 'tracking.max_tracking': None, │ 'tracking.max_tracks': None, │ 'tracking.target_instance_count': None, │ 'tracking.pre_cull_to_target': None, │ 'tracking.pre_cull_iou_threshold': None, │ 'tracking.post_connect_single_breaks': None, │ 'tracking.clean_instance_count': None, │ 'tracking.clean_iou_threshold': None, │ 'tracking.similarity': None, │ 'tracking.match': None, │ 'tracking.robust': None, │ 'tracking.track_window': None, │ 'tracking.min_new_track_points': None, │ 'tracking.min_match_points': None, │ 'tracking.img_scale': None, │ 'tracking.of_window_size': None, │ 'tracking.of_max_levels': None, 2024-06-21 14:45:01.454999: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 
2024-06-21 14:45:01.470924: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -18 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: -38 } dim { size: -39 } dim { size: 1 } } } │ 'tracking.save_shifted_instances': None, │ 'tracking.kf_node_indices': None, │ 'tracking.kf_init_frame_count': None } INFO:sleap.nn.inference:Failed to query GPU memory from nvidia-smi. Defaulting to first GPU. Metal device set to: Apple M3 Pro Versions: SLEAP: 1.3.3 TensorFlow: 2.9.2 2024-06-21 14:45:02.016209: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 
Numpy: 1.22.4 2024-06-21 14:45:02.031698: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -43 } dim { size: -44 } dim { size: -45 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -18 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 12 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: -47 } dim { size: -48 } dim { size: 1 } } } Python: 3.9.15 OS: macOS-14.5-arm64-arm-64bit System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True Process return code: 0 System: Process return code: 0 --> ``` # paste relevant logs here, if any ```
## Screenshots ![Screenshot 2024-06-21 at 14 18 17](https://github.com/talmolab/sleap/assets/45966708/be72fe01-cc29-444a-b15c-e04f254df876) ![Screenshot 2024-06-21 at 14 17 30](https://github.com/talmolab/sleap/assets/45966708/08502f97-8142-4153-888e-76def1e50412)
agosztolai commented 3 months ago

Screenshot 2024-06-21 at 14 17 30

Screenshot 2024-06-21 at 14 18 17

agosztolai commented 3 months ago

OK, I think I have figured this out. The issue seems to be that I defined a skeleton with two connected components. I noticed this when I tried the same pipeline as in the tutorial: the tutorial uses a multi-animal model, for which the "Run" button is disabled when the skeleton has multiple connected components. The single-animal model, however, trains fine on such a skeleton, but at inference time it returns no predicted instances.

Is the issue that the model only accepts a single connected component? If so, it would be nice to include a check for this in the code.
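
For illustration, a check of this kind could look like the sketch below. It is plain Python and does not use the SLEAP API: the `count_connected_components` helper, the node names, and the edge list are all hypothetical, and the skeleton is assumed to be available as a list of node names plus a list of `(source, destination)` edges.

```python
def count_connected_components(nodes, edges):
    """Count connected components of an undirected skeleton graph (union-find)."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dst in edges:
        # Merge the two components joined by this edge.
        parent[find(src)] = find(dst)

    return len({find(node) for node in nodes})


# Hypothetical skeleton: a body chain plus a pair of nodes not attached to it.
nodes = ["head", "thorax", "abdomen", "wing_left", "wing_right"]
edges = [("head", "thorax"), ("thorax", "abdomen"), ("wing_left", "wing_right")]

n_components = count_connected_components(nodes, edges)
if n_components > 1:
    print(f"Warning: skeleton has {n_components} connected components")
```

Adding an edge such as `("thorax", "wing_left")` would join the two parts into a single component and silence the warning.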

Lateef-Saheed commented 3 months ago

Hello! I am having a similar issue. Just curious: in your training settings, do you have the "rotate" setting checked in the Augmentation section of the Single Instance Model configuration tab?

agosztolai commented 3 months ago

Yes, for me it works with ‘rotate’ on. Perhaps try rotate = off. If that works, maybe report the bug?


Lateef-Saheed commented 3 months ago

Sounds good, I tried turning off rotation and that got me inferences, albeit ones that weren't accurate. Thanks for the confirmation!

talmo commented 2 months ago

Hi folks,

Apologies for the delay! We're a bit behind on support responses at the moment.

Neither rotation nor the skeleton edge configuration should make any difference to the single animal model.

If you're not getting predictions, the most likely cause is that the model is underperforming. The easiest thing to try is to just add more labels, but it might be that further model parameter tuning would help.
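
As a rough illustration of why an underperforming model can yield no predictions at all: the inference arguments logged above include `'peak_threshold': 0.2`. The sketch below assumes that a confidence-map peak below that threshold is simply discarded. This is an assumption about the inference step, not code taken from SLEAP, and `find_peak` and the toy maps are hypothetical.

```python
def find_peak(confidence_map, peak_threshold=0.2):
    """Return (row, col, value) of the global maximum, or None if it is below threshold."""
    best = None
    for r, row in enumerate(confidence_map):
        for c, value in enumerate(row):
            if best is None or value > best[2]:
                best = (r, c, value)
    # A weak maximum is treated as "no detection" rather than a low-quality point.
    if best is None or best[2] < peak_threshold:
        return None
    return best


# Toy 3x3 confidence maps (hypothetical values).
confident_map = [[0.0, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.0]]
weak_map = [[0.0, 0.05, 0.0], [0.05, 0.15, 0.05], [0.0, 0.05, 0.0]]

print(find_peak(confident_map))  # a peak at row 1, col 1
print(find_peak(weak_map))       # None: the maximum of 0.15 is below 0.2
```

Under that assumption, a model whose confidence maps never exceed the threshold would return no points, which would match the empty output described in this issue.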

Turning off rotation will cause the model to seriously overfit, which means it'll work on images very similar to those in your training data, but fail to generalize to new ones.
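
To make the idea of rotation augmentation concrete, here is a minimal sketch, not SLEAP's implementation. Each training sample is rotated by a random angle, so the model sees the animal in orientations that are absent from the labeled frames. Only the keypoint coordinates are rotated here; in a real pipeline the image would be rotated with the same transform. The function names and the `max_angle` default are hypothetical.

```python
import math
import random


def rotate_keypoints(keypoints, angle_degrees, center):
    """Rotate (x, y) keypoints by angle_degrees around center."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in keypoints:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return rotated


def augment(keypoints, center, max_angle=180.0, rng=random):
    """Apply one random rotation; max_angle is a hypothetical setting."""
    angle = rng.uniform(-max_angle, max_angle)
    return rotate_keypoints(keypoints, angle, center), angle


# Hypothetical labeled pose in a 100x100 frame, rotated about the frame center.
pose = [(50.0, 20.0), (50.0, 50.0), (50.0, 80.0)]
augmented_pose, angle = augment(pose, center=(50.0, 50.0))
print(f"rotated by {angle:.1f} degrees:", augmented_pose)
```

Without such augmentation, the model only ever sees the orientations present in the labeled frames, which is one way the overfitting described above can arise.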

It sounds like @agosztolai has a working solution, but @Lateef-Saheed if you don't mind creating a new Discussion with some information about your project, we'd be happy to help!

Lateef-Saheed commented 2 months ago

Noted! We ended up labeling more frames and using the "Latest" model instead of "Best", and that gave us predictions. I am curious about your comment about rotation. What exactly does the rotation augmentation feature do, and why would leaving it unselected cause the model to fail to generalize to new videos? Thanks for the help!
