talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Incorrect predictions with single instance model when run on all frames in video #769

Closed sheridana closed 2 years ago

sheridana commented 2 years ago

Bug description

When running inference with a single instance model, predictions are fine on a random sample of frames but incorrect when predicting on all frames in the video. The issue is suspected to lie in SingleInstancePredictor.

Expected behaviour

Predictions should be accurate with a single instance model regardless of how many frames it is run on.

Actual behaviour

Predictions are incorrect when predicting on all frames with a single instance model.

Your personal set up

Environment packages ``` (M1 packages) # Name Version Build Channel absl-py 0.10.0 pyhd8ed1ab_1 conda-forge aiohttp 3.8.1 py38h1a28f6b_1 aiosignal 1.2.0 pyhd3eb1b0_0 alabaster 0.7.12 pypi_0 pypi anyio 3.5.0 pypi_0 pypi appdirs 1.4.4 pypi_0 pypi appnope 0.1.2 pypi_0 pypi argon2-cffi 21.3.0 pypi_0 pypi argon2-cffi-bindings 21.2.0 pypi_0 pypi asttokens 2.0.5 pypi_0 pypi astunparse 1.6.3 py_0 async-timeout 4.0.1 pyhd3eb1b0_0 attrs 21.2.0 pypi_0 pypi babel 2.9.1 pypi_0 pypi backcall 0.2.0 pypi_0 pypi backports-zoneinfo 0.2.1 pypi_0 pypi beautifulsoup4 4.10.0 pypi_0 pypi black 21.6b0 pypi_0 pypi blas 2.113 openblas conda-forge blas-devel 3.9.0 13_osxarm64_openblas conda-forge bleach 4.1.0 pypi_0 pypi blinker 1.4 py38hca03da5_0 brotlipy 0.7.0 py38h1a28f6b_1002 c-ares 1.18.1 h1a28f6b_0 ca-certificates 2022.2.1 hca03da5_0 cached-property 1.5.2 py_0 cachetools 4.2.2 pyhd3eb1b0_0 cattrs 1.1.1 pypi_0 pypi certifi 2021.10.8 py38hca03da5_2 cffi 1.15.0 py38h22df2f2_1 cfgv 3.3.1 pypi_0 pypi charset-normalizer 2.0.4 pyhd3eb1b0_0 click 8.0.4 py38hca03da5_0 colorama 0.4.4 pypi_0 pypi commonmark 0.9.1 pypi_0 pypi coverage 6.3.2 pypi_0 pypi cryptography 3.4.7 py38h9dbe03d_0 cycler 0.11.0 pypi_0 pypi dataclasses 0.8 pyh6d0b6a4_7 debugpy 1.5.1 pypi_0 pypi decorator 5.1.1 pypi_0 pypi defusedxml 0.7.1 pypi_0 pypi deprecated 1.2.13 pypi_0 pypi distlib 0.3.4 pypi_0 pypi docutils 0.17.1 pypi_0 pypi efficientnet 1.0.0 pypi_0 pypi entrypoints 0.4 pypi_0 pypi executing 0.8.3 pypi_0 pypi filelock 3.6.0 pypi_0 pypi flake8 4.0.1 pypi_0 pypi flatbuffers 2.0 pypi_0 pypi fonttools 4.30.0 pypi_0 pypi frozenlist 1.2.0 py38h1a28f6b_0 furo 2022.3.4 pypi_0 pypi gast 0.4.0 pyhd3eb1b0_0 geos 3.9.1 hc377ac9_1 gitdb 4.0.9 pypi_0 pypi gitpython 3.1.27 pypi_0 pypi google-auth 1.35.0 pypi_0 pypi google-auth-oauthlib 0.4.1 py_2 google-pasta 0.2.0 pyhd3eb1b0_0 grpcio 1.42.0 py38h95c9599_0 h5py 3.1.0 nompi_py38h032b01a_100 conda-forge hdf5 1.10.6 nompi_h0fc092c_1114 conda-forge identify 2.4.11 pypi_0 pypi idna 3.3 pyhd3eb1b0_0 image-classifiers 1.0.0 pypi_0 pypi imageio 2.15.0 pypi_0 pypi imagesize 1.3.0 pypi_0 pypi imgaug 0.4.0 pypi_0 pypi imgstore 0.2.9 pypi_0 pypi importlib-metadata 4.8.2 py38hca03da5_0 importlib-resources 5.4.0 pypi_0 pypi iniconfig 1.1.1 pypi_0 pypi ipykernel 6.9.2 pypi_0 pypi ipython 8.1.1 pypi_0 pypi ipython-genutils 0.2.0 pypi_0 pypi ipywidgets 7.6.5 pypi_0 pypi jedi 0.17.2 pypi_0 pypi jinja2 3.0.3 pypi_0 pypi joblib 1.1.0 pypi_0 pypi jsmin 3.0.1 pypi_0 pypi json5 0.9.6 pypi_0 pypi jsonpickle 1.2 pypi_0 pypi jsonschema 4.4.0 pypi_0 pypi jupyter-cache 0.4.3 pypi_0 pypi jupyter-client 7.1.2 pypi_0 pypi jupyter-core 4.9.2 pypi_0 pypi jupyter-server 1.15.3 pypi_0 pypi jupyter-server-mathjax 0.2.5 pypi_0 pypi jupyter-sphinx 0.3.2 pypi_0 pypi jupyterlab 3.3.2 pypi_0 pypi jupyterlab-pygments 0.1.2 pypi_0 pypi jupyterlab-server 2.10.3 pypi_0 pypi jupyterlab-widgets 1.0.2 pypi_0 pypi keras 2.7.0 pyhd8ed1ab_0 conda-forge keras-applications 1.0.8 pypi_0 pypi keras-preprocessing 1.1.2 pyhd3eb1b0_0 keyring 23.5.0 pypi_0 pypi kiwisolver 1.3.2 pypi_0 pypi krb5 1.19.2 h3b8d789_0 libblas 3.9.0 13_osxarm64_openblas conda-forge libcblas 3.9.0 13_osxarm64_openblas conda-forge libclang 13.0.0 pypi_0 pypi libcurl 7.80.0 hc6d1d07_0 libcxx 12.0.0 hf6beb65_1 libedit 3.1.20210910 h1a28f6b_0 libev 4.33 h1a28f6b_1 libffi 3.4.2 hc377ac9_2 libgfortran 5.0.0 11_1_0_h6a59814_26 libgfortran5 11.1.0 h6a59814_26 liblapack 3.9.0 13_osxarm64_openblas conda-forge liblapacke 3.9.0 13_osxarm64_openblas conda-forge libllvm11 11.1.0 h12f7ac0_4 libnghttp2 
1.46.0 h95c9599_0 libopenblas 0.3.18 openmp_h5dd58f0_0 conda-forge libprotobuf 3.19.1 h98b2900_0 libssh2 1.9.0 hf27765b_1 livereload 2.6.3 pypi_0 pypi llvm-openmp 12.0.0 haf9daa7_1 markdown 3.3.4 py38hca03da5_0 markdown-it-py 1.1.0 pypi_0 pypi markupsafe 2.1.0 pypi_0 pypi matplotlib 3.5.1 pypi_0 pypi matplotlib-inline 0.1.3 pypi_0 pypi mccabe 0.6.1 pypi_0 pypi mdit-py-plugins 0.2.8 pypi_0 pypi mistune 0.8.4 pypi_0 pypi multidict 5.2.0 py38h1a28f6b_2 mypy-extensions 0.4.3 pypi_0 pypi myst-nb 0.13.2 pypi_0 pypi myst-parser 0.15.2 pypi_0 pypi nbclassic 0.3.6 pypi_0 pypi nbclient 0.5.13 pypi_0 pypi nbconvert 6.4.4 pypi_0 pypi nbdime 3.1.1 pypi_0 pypi nbformat 5.2.0 pypi_0 pypi ncurses 6.3 h1a28f6b_2 nest-asyncio 1.5.4 pypi_0 pypi networkx 2.7.1 pypi_0 pypi nodeenv 1.6.0 pypi_0 pypi notebook 6.4.9 pypi_0 pypi notebook-shim 0.1.0 pypi_0 pypi numpy 1.21.5 pypi_0 pypi oauthlib 3.2.0 pyhd3eb1b0_0 openblas 0.3.18 openmp_h3b88efd_0 conda-forge opencv-python 4.5.5.64 pypi_0 pypi opencv-python-headless 4.5.5.62 pypi_0 pypi openssl 1.1.1m h1a28f6b_0 opt_einsum 3.3.0 pyhd3eb1b0_1 packaging 21.3 pypi_0 pypi pandas 1.4.1 pypi_0 pypi pandocfilters 1.5.0 pypi_0 pypi parso 0.7.1 pypi_0 pypi pathspec 0.9.0 pypi_0 pypi pexpect 4.8.0 pypi_0 pypi pickleshare 0.7.5 pypi_0 pypi pillow 8.4.0 pypi_0 pypi pip 21.2.4 py38hca03da5_0 pkginfo 1.8.2 pypi_0 pypi platformdirs 2.5.1 pypi_0 pypi pluggy 1.0.0 pypi_0 pypi pre-commit 2.17.0 pypi_0 pypi prometheus-client 0.13.1 pypi_0 pypi prompt-toolkit 3.0.28 pypi_0 pypi protobuf 3.19.1 py38hc377ac9_0 psutil 5.9.0 pypi_0 pypi ptyprocess 0.7.0 pypi_0 pypi pure-eval 0.2.2 pypi_0 pypi py 1.11.0 pypi_0 pypi pyasn1 0.4.8 pyhd3eb1b0_0 pyasn1-modules 0.2.8 py_0 pycodestyle 2.8.0 pypi_0 pypi pycparser 2.21 pyhd3eb1b0_0 pyflakes 2.4.0 pypi_0 pypi pygithub 1.55 pypi_0 pypi pygments 2.11.2 pypi_0 pypi pyjwt 2.1.0 py38hca03da5_0 pykalman 0.9.5 pypi_0 pypi pynacl 1.5.0 pypi_0 pypi pyopenssl 21.0.0 pyhd3eb1b0_1 pyparsing 3.0.7 pypi_0 pypi pyrsistent 0.18.1 pypi_0 pypi pyside6 6.2.2.1 pypi_0 pypi pysocks 1.7.1 py38hca03da5_0 pytest 7.1.0 pypi_0 pypi pytest-cov 3.0.0 pypi_0 pypi pytest-qt 4.0.2 pypi_0 pypi pytest-xvfb 2.0.0 pypi_0 pypi python 3.8.11 hbdb9e5c_5 python-dateutil 2.8.2 pypi_0 pypi python-rapidjson 1.6 pypi_0 pypi python_abi 3.8 2_cp38 conda-forge pytz 2021.3 pypi_0 pypi pytz-deprecation-shim 0.1.0.post0 pypi_0 pypi pyvirtualdisplay 3.0 pypi_0 pypi pywavelets 1.3.0 pypi_0 pypi pyyaml 6.0 pypi_0 pypi pyzmq 22.3.0 pypi_0 pypi qimage2ndarray 1.9.0 pypi_0 pypi readline 8.1.2 h1a28f6b_1 readme-renderer 34.0 pypi_0 pypi regex 2022.3.2 pypi_0 pypi requests 2.27.1 pyhd3eb1b0_0 requests-oauthlib 1.3.0 py_0 requests-toolbelt 0.9.1 pypi_0 pypi rfc3986 2.0.0 pypi_0 pypi rich 10.16.1 pypi_0 pypi rsa 4.7.2 pyhd3eb1b0_1 scikit-image 0.19.2 pypi_0 pypi scikit-learn 1.0.2 pypi_0 pypi scikit-video 1.1.11 pypi_0 pypi scipy 1.7.3 py38h2f0f56f_0 seaborn 0.11.2 pypi_0 pypi segmentation-models 1.0.1 pypi_0 pypi send2trash 1.8.0 pypi_0 pypi setuptools 58.0.4 py38hca03da5_1 shapely 1.7.1 py38h18ef730_5 shiboken6 6.2.2.1 pypi_0 pypi six 1.15.0 pyhd3eb1b0_0 sleap 1.2.0a6 dev_0 smmap 5.0.0 pypi_0 pypi sniffio 1.2.0 pypi_0 pypi snowballstemmer 2.2.0 pypi_0 pypi soupsieve 2.3.1 pypi_0 pypi sphinx 4.4.0 pypi_0 pypi sphinx-autobuild 2021.3.14 pypi_0 pypi sphinx-copybutton 0.5.0 pypi_0 pypi sphinx-togglebutton 0.3.0 pypi_0 pypi sphinxcontrib-applehelp 1.0.2 pypi_0 pypi sphinxcontrib-devhelp 1.0.2 pypi_0 pypi sphinxcontrib-htmlhelp 2.0.0 pypi_0 pypi sphinxcontrib-jsmath 1.0.1 pypi_0 pypi sphinxcontrib-qthelp 1.0.3 
pypi_0 pypi sphinxcontrib-serializinghtml 1.1.5 pypi_0 pypi sqlalchemy 1.4.32 pypi_0 pypi sqlite 3.38.0 h1058600_0 stack-data 0.2.0 pypi_0 pypi tensorboard 2.6.0 py_0 tensorboard-data-server 0.6.1 pypi_0 pypi tensorboard-plugin-wit 1.6.0 py_0 tensorflow-deps 2.7.0 0 apple tensorflow-estimator 2.7.0 pypi_0 pypi tensorflow-macos 2.7.0 pypi_0 pypi tensorflow-metal 0.3.0 pypi_0 pypi termcolor 1.1.0 py38hca03da5_1 terminado 0.13.3 pypi_0 pypi testpath 0.6.0 pypi_0 pypi threadpoolctl 3.1.0 pypi_0 pypi tifffile 2022.2.9 pypi_0 pypi tk 8.6.11 hb8d0fd4_0 toml 0.10.2 pypi_0 pypi tomli 2.0.1 pypi_0 pypi tornado 6.1 pypi_0 pypi tqdm 4.63.0 pypi_0 pypi traitlets 5.1.1 pypi_0 pypi twine 3.3.0 pypi_0 pypi typing-extensions 3.7.4.3 hd3eb1b0_0 typing_extensions 3.7.4.3 pyh06a4308_0 tzdata 2021.5 pypi_0 pypi tzlocal 4.1 pypi_0 pypi urllib3 1.26.8 pyhd3eb1b0_0 virtualenv 20.13.3 pypi_0 pypi wcwidth 0.2.5 pypi_0 pypi webencodings 0.5.1 pypi_0 pypi websocket-client 1.3.1 pypi_0 pypi werkzeug 2.0.3 pyhd3eb1b0_0 wheel 0.35.1 pyhd3eb1b0_0 widgetsnbextension 3.5.2 pypi_0 pypi wrapt 1.12.1 py38h1a28f6b_1 xz 5.2.5 h1a28f6b_0 yarl 1.6.3 py38h1a28f6b_1 zipp 3.7.0 pyhd3eb1b0_0 zlib 1.2.11 h5a0b063_4 ```
Logs ``` (sleap) C:\Users\RFK>sleap-label Saving config: C:\Users\RFK/.sleap/1.2.3/preferences.yaml Restoring GUI state... Software versions: SLEAP: 1.2.3 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 Happy SLEAPing! :) Using already trained model for single_instance: C:/Users/RFK/sleap_2\models\220523_162909.single_instance.n=850\training_config.json Command line call: sleap-track --labels C:/Users/RFK/sleap_2/labels.v003c_simba.slp --only-suggested-frames -m C:/Users/RFK/sleap_2\models\220523_162909.single_instance.n=850\training_config.json --tracking.tracker none -o C:/Users/RFK/sleap_2\predictions\labels.v003c_simba.slp.220523_183021.predictions.slp --verbosity json --no-empty-frames Started inference at: 2022-05-23 18:30:26.547174 Args: { 'data_path': '', 'models': [ 2022-05-23 18:30:26.852831: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 'C:/Users/RFK/sleap_2\\models\\220523_162909.single_instance.n=850\\training_config.json' ], 'frames': '', 'only_labeled_frames': False, 'only_suggested_frames': True, 'output': 'C:/Users/RFK/sleap_2\\predictions\\labels.v003c_simba.slp.220523_183021.predictions.slp', 'no_empty_frames': True, 'verbosity': 'json', 'video.dataset': None, 'video.input_format': 'channels_last', 'cpu': False, 2022-05-23 18:30:27.422854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6007 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5 'first_gpu': False, 'last_gpu': False, 'gpu': 0, 'max_edge_length_ratio': 0.25, 'dist_penalty_weight': 1.0, 'batch_size': 4, 'open_in_gui': False, 'peak_threshold': 0.2, 'tracking.tracker': 'none', 'tracking.target_instance_count': None, 'tracking.pre_cull_to_target': None, 'tracking.pre_cull_iou_threshold': None, 'tracking.post_connect_single_breaks': None, 'tracking.clean_instance_count': None, 'tracking.clean_iou_threshold': None, 'tracking.similarity': None, 'tracking.match': None, 2022-05-23 18:30:28.263215: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2) 'tracking.track_window': None, 'tracking.min_new_track_points': None, 'tracking.min_match_points': None, 'tracking.img_scale': None, 'tracking.of_window_size': None, 'tracking.of_max_levels': None, 'tracking.kf_node_indices': None, 'tracking.kf_init_frame_count': None, 'labels': 'C:/Users/RFK/sleap_2/labels.v003c_simba.slp' } 2022-05-23 18:30:30.621169: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -62 } dim { size: -63 } dim { size: -64 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -17 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -17 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2070" frequency: 1440 num_cores: 36 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } 
environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 6298796032 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { dim { size: -17 } dim { size: -66 } dim { size: -67 } dim { size: 1 } } } 2022-05-23 18:30:31.178646: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201 2022-05-23 18:30:32.974087: W tensorflow/core/common_runtime/bfc_allocator.cc:338] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. Versions: SLEAP: 1.2.3 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True Finished inference at: 2022-05-23 18:30:37.088701 Total runtime: 10.541526794433594 secs Predicted frames: 200/200 Provenance: { 'sleap_version': '1.2.3', 'platform': 'Windows-10-10.0.19041-SP0', 'command': 'C:\\Users\\RFK\\anaconda3\\envs\\sleap\\Scripts\\sleap-track --labels C:/Users/RFK/sleap_2/labels.v003c_simba.slp --only-suggested-frames -m C:/Users/RFK/sleap_2\\models\\220523_162909.single_instance.n=850\\training_config.json --tracking.tracker none -o C:/Users/RFK/sleap_2\\predictions\\labels.v003c_simba.slp.220523_183021.predictions.slp --verbosity json --no-empty-frames', 'data_path': 'C:/Users/RFK/sleap_2/labels.v003c_simba.slp', 'model_paths': [ 'C:/Users/RFK/sleap_2\\models\\220523_162909.single_instance.n=850\\training_config.json' ], 'output_path': 'C:/Users/RFK/sleap_2\\predictions\\labels.v003c_simba.slp.220523_183021.predictions.slp', 'predictor': 'SingleInstancePredictor', 'total_elapsed': 10.541526794433594, 'start_timestamp': '2022-05-23 18:30:26.547174', 'finish_timestamp': '2022-05-23 18:30:37.088701' } Saved output: C:/Users/RFK/sleap_2\predictions\labels.v003c_simba.slp.220523_183021.predictions.slp Process return code: 0 Using already trained model for single_instance: C:/Users/RFK/sleap_2\models\220523_162909.single_instance.n=850\training_config.json Command line call: sleap-track C:/Users/RFK/Desktop/sleap test vids/LBNJa_3_2021-12-04_19-46-14c.mp4 --frames 0,-107913 -m C:/Users/RFK/sleap_2\models\220523_162909.single_instance.n=850\training_config.json --tracking.tracker none -o C:/Users/RFK/sleap_2\predictions\LBNJa_3_2021-12-04_19-46-14c.mp4.220523_183443.predictions.slp --verbosity json --no-empty-frames Started inference at: 2022-05-23 18:34:49.023181 Args: { 'data_path': 'C:/Users/RFK/Desktop/sleap test vids/LBNJa_3_2021-12-04_19-46-14c.mp4', 2022-05-23 18:34:49.243888: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
'models': [ 'C:/Users/RFK/sleap_2\\models\\220523_162909.single_instance.n=850\\training_config.json' ], 'frames': '0,-107913', 'only_labeled_frames': False, 'only_suggested_frames': False, 'output': 'C:/Users/RFK/sleap_2\\predictions\\LBNJa_3_2021-12-04_19-46-14c.mp4.220523_183443.predictions.slp', 'no_empty_frames': True, 'verbosity': 'json', 'video.dataset': None, 2022-05-23 18:34:49.811295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6007 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5 'video.input_format': 'channels_last', 'cpu': False, 'first_gpu': False, 'last_gpu': False, 'gpu': 0, 'max_edge_length_ratio': 0.25, 'dist_penalty_weight': 1.0, 'batch_size': 4, 'open_in_gui': False, 'peak_threshold': 0.2, 'tracking.tracker': 'none', 'tracking.target_instance_count': None, 'tracking.pre_cull_to_target': None, 'tracking.pre_cull_iou_threshold': None, 'tracking.post_connect_single_breaks': None, 'tracking.clean_instance_count': None, 'tracking.clean_iou_threshold': None, 'tracking.similarity': None, 'tracking.match': None, 2022-05-23 18:34:50.798864: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2) 'tracking.track_window': None, 'tracking.min_new_track_points': None, 'tracking.min_match_points': None, 'tracking.img_scale': None, 'tracking.of_window_size': None, 'tracking.of_max_levels': None, 'tracking.kf_node_indices': None, 'tracking.kf_init_frame_count': None } 2022-05-23 18:34:52.854479: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -62 } dim { size: -63 } dim { size: -64 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -17 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -17 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2070" frequency: 1440 num_cores: 36 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 6298796032 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { dim { size: -17 } dim { size: -66 } dim { size: -67 } dim { size: 1 } } } 2022-05-23 18:34:53.425053: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201 2022-05-23 18:34:55.197601: W tensorflow/core/common_runtime/bfc_allocator.cc:338] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. 
Versions: SLEAP: 1.2.3 TensorFlow: 2.6.3 Numpy: 1.19.5 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True Video: C:/Users/RFK/Desktop/sleap test vids/LBNJa_3_2021-12-04_19-46-14c.mp4 2022-05-23 18:50:12.857220: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -62 } dim { size: -63 } dim { size: -64 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -17 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -17 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2070" frequency: 1440 num_cores: 36 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 6298796032 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { dim { size: -17 } dim { size: -66 } dim { size: -67 } dim { size: 1 } } } Finished inference at: 2022-05-23 18:50:14.940412 Total runtime: 925.9172308444977 secs Predicted frames: 107914/107914 Provenance: { 'sleap_version': '1.2.3', 'platform': 'Windows-10-10.0.19041-SP0', 'command': 'C:\\Users\\RFK\\anaconda3\\envs\\sleap\\Scripts\\sleap-track C:/Users/RFK/Desktop/sleap test vids/LBNJa_3_2021-12-04_19-46-14c.mp4 --frames 0,-107913 -m C:/Users/RFK/sleap_2\\models\\220523_162909.single_instance.n=850\\training_config.json --tracking.tracker none -o C:/Users/RFK/sleap_2\\predictions\\LBNJa_3_2021-12-04_19-46-14c.mp4.220523_183443.predictions.slp --verbosity json --no-empty-frames', 'data_path': 'C:/Users/RFK/Desktop/sleap test vids/LBNJa_3_2021-12-04_19-46-14c.mp4', 'model_paths': [ 'C:/Users/RFK/sleap_2\\models\\220523_162909.single_instance.n=850\\training_config.json' ], 'output_path': 'C:/Users/RFK/sleap_2\\predictions\\LBNJa_3_2021-12-04_19-46-14c.mp4.220523_183443.predictions.slp', 'predictor': 'SingleInstancePredictor', 'total_elapsed': 925.9172308444977, 'start_timestamp': '2022-05-23 18:34:49.023181', 'finish_timestamp': '2022-05-23 18:50:14.940412' } Saved output: C:/Users/RFK/sleap_2\predictions\LBNJa_3_2021-12-04_19-46-14c.mp4.220523_183443.predictions.slp Process return code: 0 ```

Screenshots

[Screenshot: inference_example]

How to reproduce

  1. Run sleap-label.
  2. Open a video.
  3. Select Predict -> Run Inference...
  4. Load the single instance model.
  5. Run inference.
  6. Predictions are off when predicting on the full video, but not on a random sample of frames (see the Python sketch below for the same comparison via the API).
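For anyone who prefers to reproduce this outside the GUI, the same comparison can be set up with SLEAP's high-level Python API. This is a minimal sketch, assuming sleap.load_model / sleap.load_video / Predictor.predict as documented for SLEAP 1.2; the paths are placeholders standing in for the user's model folder and video.

```python
import sleap

# Hypothetical paths standing in for the user's model folder and video.
MODEL_DIR = "models/220523_162909.single_instance.n=850"
VIDEO_PATH = "LBNJa_3_2021-12-04_19-46-14c.mp4"

# Load the trained single instance model and the full video.
predictor = sleap.load_model(MODEL_DIR)  # resolves to a SingleInstancePredictor
video = sleap.load_video(VIDEO_PATH)

# 1) Predict on a small block of frames (reported to look fine).
sample_imgs = video[:200]  # ndarray of shape (200, height, width, channels)
sample_labels = predictor.predict(sample_imgs)

# 2) Predict on the entire video (reported to come out wrong).
full_labels = predictor.predict(video)

# Compare predictions for the same frame indices from both runs, e.g. by
# plotting the predicted points, to confirm the discrepancy.
```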
roomrys commented 2 years ago

Update 1: I was unable to reproduce the error using the user's data, so it could be a problem with the user's setup. We are still confirming the root cause.

Update 2: After a day of Zoom meetings, I have witnessed the error first hand on the original user's machine using both a conda-from-source installation and a conda-package installation. The next step is to crop the video (from 100k frames to 1k frames) to test whether the problem lies in the predict-on-all-frames implementation or has something to do with the video size...
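One way to produce such a cropped test clip (not necessarily how it was done here) is to rewrite just the first 1,000 frames with OpenCV. A rough sketch with placeholder paths:

```python
import cv2

# Hypothetical paths; writes the first 1,000 frames of the original video
# to a short test clip so inference can be rerun on a smaller file.
SRC = "LBNJa_3_2021-12-04_19-46-14c.mp4"
DST = "LBNJa_3_first1000.mp4"
N_FRAMES = 1000

cap = cv2.VideoCapture(SRC)
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter(DST, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

count = 0
while count < N_FRAMES:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(frame)
    count += 1

cap.release()
writer.release()
```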

roomrys commented 2 years ago

The problem was that the model was trained on grayscale videos, but the user was attempting to predict on color videos (or vice versa; we did not verify which).
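For anyone debugging a similar mismatch, a quick sanity check is to compare the channel count the model was configured for against the channel count of the video being predicted on. The sketch below assumes the standard training_config.json layout (a data.preprocessing section with ensure_rgb / ensure_grayscale flags) and uses sleap.load_video; the paths are placeholders.

```python
import json
import sleap

# Hypothetical paths; point these at your model folder and video.
CONFIG_PATH = "models/220523_162909.single_instance.n=850/training_config.json"
VIDEO_PATH = "LBNJa_3_2021-12-04_19-46-14c.mp4"

# What the model was trained to expect (assumes data.preprocessing keys).
with open(CONFIG_PATH) as f:
    cfg = json.load(f)
preproc = cfg["data"]["preprocessing"]
print("ensure_rgb:       ", preproc.get("ensure_rgb"))
print("ensure_grayscale: ", preproc.get("ensure_grayscale"))

# What the video actually contains: 1 channel = grayscale, 3 = color.
video = sleap.load_video(VIDEO_PATH)
n_frames, height, width, channels = video.shape
print("video channels:   ", channels)
```

A disagreement between the two (e.g. a grayscale-trained model pointed at a 3-channel video) is the kind of mismatch that produced the behavior reported above.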