talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

shape mismatch error using resnet #1770

Open milesOIST opened 1 month ago

milesOIST commented 1 month ago

Bug description

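When training with a ResNet backbone under certain settings, training fails on the first epoch with a shape mismatch error (`ValueError: Dimensions must be equal, but are 224 and 216`).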

Expected behaviour

Actual behaviour

Your personal set up

Environment packages

```
# packages in environment at C:\Users\ONS\anaconda3\envs\sleap:
#
# Name Version Build Channel
absl-py 0.15.0 pypi_0 pypi
aom 3.5.0 h63175ca_0 conda-forge
astunparse 1.6.3 pypi_0 pypi
attrs 21.2.0 pypi_0 pypi
backports-zoneinfo 0.2.1 pypi_0 pypi
bzip2 1.0.8 he774522_0
ca-certificates 2023.05.30 haa95532_0
cached-property 1.5.2 py_0
cachetools 4.2.4 pypi_0 pypi
cattrs 1.1.1 pypi_0 pypi
certifi 2021.10.8 pypi_0 pypi
charset-normalizer 2.0.12 pypi_0 pypi
clang 5.0 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
cuda-nvcc 11.3.58 hb8d16a4_0 nvidia
cudatoolkit 11.3.1 h59b6b97_2
cudnn 8.2.1 cuda11.3_0
cycler 0.11.0 pypi_0 pypi
dav1d 1.2.1 h2bbff1b_0
efficientnet 1.0.0 pypi_0 pypi
expat 2.5.0 h63175ca_1 conda-forge
ffmpeg 5.1.2 gpl_he426399_111 conda-forge
flatbuffers 1.12 pypi_0 pypi
font-ttf-dejavu-sans-mono 2.37 hd3eb1b0_0
font-ttf-inconsolata 2.001 hcb22688_0
font-ttf-source-code-pro 2.030 hd3eb1b0_0
font-ttf-ubuntu 0.83 h8b1ccd4_0
fontconfig 2.14.2 hbde0cde_0 conda-forge
fonts-anaconda 1 h8fa9717_0
fonts-conda-ecosystem 1 hd3eb1b0_0
fonttools 4.38.0 pypi_0 pypi
freetype 2.12.1 ha860e81_0
gast 0.4.0 pypi_0 pypi
geos 3.9.1 h6c2663c_0
google-auth 1.35.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.44.0 pypi_0 pypi
h5py 3.1.0 nompi_py37h19fda09_100 conda-forge
hdf5 1.10.6 h1756f20_1
hdmf 3.5.2 pypi_0 pypi
icc_rt 2022.1.0 h6049295_2
idna 3.3 pypi_0 pypi
image-classifiers 1.0.0 pypi_0 pypi
imageio 2.15.0 pypi_0 pypi
imgaug 0.4.0 pypi_0 pypi
imgstore 0.2.9 pypi_0 pypi
importlib-metadata 4.11.1 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
intel-openmp 2023.1.0 h59b6b97_46319
joblib 1.2.0 pypi_0 pypi
jpeg 9e h2bbff1b_1
jsmin 3.0.1 pypi_0 pypi
jsonpickle 1.2 pypi_0 pypi
jsonschema 4.17.3 pypi_0 pypi
keras 2.6.0 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
lcms2 2.12 h83e58a3_0
lerc 3.0 hd77b12b_0
libblas 3.9.0 17_win64_mkl conda-forge
libcblas 3.9.0 17_win64_mkl conda-forge
libdeflate 1.10 h8ffe710_0 conda-forge
libexpat 2.5.0 h63175ca_1 conda-forge
libiconv 1.17 h8ffe710_0 conda-forge
liblapack 3.9.0 17_win64_mkl conda-forge
libopus 1.3.1 h8ffe710_1 conda-forge
libpng 1.6.39 h8cc25b3_0
libtiff 4.3.0 hc4061b1_4 conda-forge
libxml2 2.11.4 hc3477c8_0 conda-forge
libzlib 1.2.13 hcfcfb64_5 conda-forge
m2w64-gcc-libgfortran 5.3.0 6 conda-forge
m2w64-gcc-libs 5.3.0 7 conda-forge
m2w64-gcc-libs-core 5.3.0 7 conda-forge
m2w64-gmp 6.1.0 2 conda-forge
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
markdown 3.3.6 pypi_0 pypi
matplotlib 3.5.3 pypi_0 pypi
mkl 2022.1.0 h6a75c08_874 conda-forge
msys2-conda-epoch 20160418 1 conda-forge
ndx-pose 0.1.1 pypi_0 pypi
networkx 2.6.3 pypi_0 pypi
nixio 1.5.3 pypi_0 pypi
numpy 1.19.5 py37h4c2b6ed_3 conda-forge
oauthlib 3.2.0 pypi_0 pypi
olefile 0.46 py37_0
opencv-python 4.5.5.62 pypi_0 pypi
opencv-python-headless 4.5.5.62 pypi_0 pypi
openh264 2.3.1 h63175ca_2 conda-forge
openjpeg 2.4.0 h4fc8c34_0
openssl 3.0.9 h2bbff1b_0
opt-einsum 3.3.0 pypi_0 pypi
packaging 21.3 pyhd3eb1b0_0
pandas 1.3.5 py37h9386db6_0 conda-forge
pillow 8.4.0 py37hd7d9ad0_0 conda-forge
pip 23.1.2 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pypi_0 pypi
protobuf 4.22.1 pypi_0 pypi
psutil 5.9.4 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.14.0 pypi_0 pypi
pykalman 0.9.5 pypi_0 pypi
pynwb 2.3.1 pypi_0 pypi
pyparsing 3.0.7 pypi_0 pypi
pyreadline 2.1 py37_1
pyrsistent 0.19.3 pypi_0 pypi
pyside2 5.14.1 pypi_0 pypi
python 3.7.12 h900ac77_100_cpython conda-forge
python-dateutil 2.8.2 pyhd3eb1b0_0
python-rapidjson 1.10 pypi_0 pypi
python_abi 3.7 3_cp37m conda-forge
pytz 2022.7 py37haa95532_0
pytz-deprecation-shim 0.1.0.post0 pypi_0 pypi
pywavelets 1.3.0 pypi_0 pypi
pyzmq 25.0.2 pypi_0 pypi
qimage2ndarray 1.9.0 pypi_0 pypi
qtpy 2.2.0 py37haa95532_0
requests 2.27.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rich 10.16.1 pypi_0 pypi
ruamel-yaml 0.17.21 pypi_0 pypi
ruamel-yaml-clib 0.2.7 pypi_0 pypi
scikit-image 0.19.3 pypi_0 pypi
scikit-learn 1.0.2 pypi_0 pypi
scikit-video 1.1.11 pypi_0 pypi
scipy 1.7.3 py37hb6553fb_0 conda-forge
seaborn 0.12.2 pypi_0 pypi
segmentation-models 1.0.1 pypi_0 pypi
setuptools 59.8.0 py37h03978a9_1 conda-forge
setuptools-scm 6.3.2 pypi_0 pypi
shapely 1.7.1 py37hc520ffa_5 conda-forge
shiboken2 5.14.1 pypi_0 pypi
six 1.15.0 py37haa95532_0
sleap 1.3.0 pypi_0 pypi
sqlite 3.41.2 h2bbff1b_0
svt-av1 1.4.1 h63175ca_0 conda-forge
tbb 2021.8.0 h59b6b97_0
tensorboard 2.6.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 2.6.3 pypi_0 pypi
tensorflow-estimator 2.6.0 pypi_0 pypi
tensorflow-hub 0.13.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tifffile 2021.11.2 pypi_0 pypi
tk 8.6.12 h2bbff1b_0
tomli 2.0.1 pypi_0 pypi
typing-extensions 3.10.0.2 pypi_0 pypi
tzdata 2022.7 pypi_0 pypi
tzlocal 4.3 pypi_0 pypi
ucrt 10.0.20348.0 haa95532_0
urllib3 1.26.8 pypi_0 pypi
vc 14.2 h21ff451_1
vc14_runtime 14.34.31931 h5081d32_16 conda-forge
vs2015_runtime 14.34.31931 hed1258a_16 conda-forge
werkzeug 2.0.3 pypi_0 pypi
wheel 0.38.4 py37haa95532_0
wrapt 1.12.1 pypi_0 pypi
x264 1!164.3095 h8ffe710_2 conda-forge
x265 3.5 h2d74725_3 conda-forge
xz 5.2.6 h8d14728_0 conda-forge
zipp 3.7.0 pypi_0 pypi
zlib 1.2.13 hcfcfb64_5 conda-forge
zstd 1.5.2 h12be248_6 conda-forge
```

Logs

```
INFO:sleap.nn.training:
INFO:sleap.nn.training:Auto-selected GPU 0 with 16183 MiB of free memory.
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
    Available: True
    Initalized: False
    Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: Z:/KuhnU/Miles-Kuhn/SLEAP/NewModel2.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training: Splits: Training = 1416 / Validation = 157.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2024-05-14 20:16:14.066762: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-14 20:16:16.252042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13599 MB memory: -> device: 0, name: NVIDIA RTX A4000, pci bus id: 0000:01:00.0, compute capability: 8.6
2024-05-14 20:16:19.767318: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
INFO:sleap.nn.training:Loaded test example. [273.244s]
INFO:sleap.nn.training: Input shape: (432, 576, 1)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training: Backbone: ResNet152(upsampling_stack=UpsamplingStack(output_stride=2, upsampling_stride=2, transposed_conv=False, transposed_conv_filters=64, transposed_conv_filters_rate=1.0, transposed_conv_kernel_size=4, transposed_conv_batchnorm=True, make_skip_connection=False, skip_add=False, refine_convs=2, refine_convs_filters=64, refine_convs_filters_rate=1.0, refine_convs_batchnorm=True), features_output_stride=32, pretrained=True, frozen=True, skip_connections=False, model_name='resnet152', stack_configs=[{'filters': 64, 'blocks': 3, 'stride1': 1, 'name': 'conv2', 'dilation_rate': 1}, {'filters': 128, 'blocks': 8, 'stride1': 2, 'name': 'conv3', 'dilation_rate': 1}, {'filters': 256, 'blocks': 36, 'stride1': 2, 'name': 'conv4', 'dilation_rate': 1}, {'filters': 512, 'blocks': 3, 'stride1': 2, 'name': 'conv5', 'dilation_rate': 1}])
INFO:sleap.nn.training: Max stride: 32
INFO:sleap.nn.training: Parameters: 59,811,915
INFO:sleap.nn.training: Heads:
INFO:sleap.nn.training: [0] = SingleInstanceConfmapsHead(part_names=['eyelid top', 'eyelid bottom', 'nose right', 'nose left', 'spout', 'mouth lip top', 'mouth corner', 'paw right', 'paw left', 'tongue', 'mouth lip bottom'], sigma=2.5, output_stride=2, loss_weight=1.0)
INFO:sleap.nn.training: Outputs:
INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 224, 288, 11), dtype=tf.float32, name=None), name='SingleInstanceConfmapsHead/BiasAdd:0', description="created by layer 'SingleInstanceConfmapsHead'")
INFO:sleap.nn.training:Training from scratch
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 1416
INFO:sleap.nn.training:Validation set: n = 157
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training: OHKM enabled: HardKeypointMiningConfig(online_mining=True, hard_to_easy_ratio=2.0, min_hard_keypoints=2, max_hard_keypoints=None, loss_scale=5.0)
INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=15)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: )
INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000
INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set
INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001
INFO:sleap.nn.training:Created run path: Z:/KuhnU/Miles-Kuhn/SLEAP\models\240514_201219.single_instance.n=1573
INFO:sleap.nn.training:Setting up visualization...
C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py:1177: UserWarning: Model input of shape (None, 432, 576, 1) does not divide evenly with output of shape (None, 224, 288, 11).
  f"Model input of shape {model.inputs[input_ind].shape} does not divide "
INFO:sleap.nn.training:Finished trainer set up. [314.1s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [5796.8s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/500
Traceback (most recent call last):
  File "C:\Users\ONS\anaconda3\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.0', 'console_scripts', 'sleap-train')())
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 943, in train
    verbose=2,
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 1184, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\def_function.py", line 933, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\def_function.py", line 760, in _initialize
    *args, **kwds))
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\function.py", line 3066, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\function.py", line 3463, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\function.py", line 3308, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\func_graph.py", line 1007, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\def_function.py", line 668, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\func_graph.py", line 994, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py:853 train_function *
        return step_function(self, iterator)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py:303 loss_fn *
        loss += loss_fn(y_gt, y_pr)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\losses.py:141 __call__ **
        losses = call_fn(y_true, y_pred)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\losses.py:245 call **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\util\dispatch.py:206 wrapper
        return target(*args, **kwargs)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\losses.py:1204 mean_squared_error
        return backend.mean(tf.math.squared_difference(y_pred, y_true), axis=-1)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\gen_math_ops.py:10514 squared_difference
        "SquaredDifference", x=x, y=y, name=name)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\op_def_library.py:750 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\func_graph.py:601 _create_op_internal
        compute_device)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py:3569 _create_op_internal
        op_def=op_def)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py:2042 __init__
        control_input_ops, op_def)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py:1883 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimensions must be equal, but are 224 and 216 for '{{node loss_fn/mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](model/SingleInstanceConfmapsHead/BiasAdd, IteratorGetNext:1)' with input shapes: [15,224,288,11], [15,216,288,?].

INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Run Path: Z:/KuhnU/Miles-Kuhn/SLEAP\models\240514_201219.single_instance.n=1573
```
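
One possible reading of the shapes in the log, offered as a hypothesis rather than a confirmed diagnosis: the input height 432 is not a multiple of the max stride 32, while the width 576 is. The sketch below reproduces the logged numbers under the assumption that each downsampling step halves the size with round-up ("same"-style padding) and each upsampling step doubles it exactly, while the confidence-map targets are simply the input size divided by the output stride. The function names are hypothetical and are not part of SLEAP's API.

```python
import math


def backbone_output_size(size, max_stride=32, output_stride=2):
    """Estimate the model output size along one axis.

    Assumption: every stride-2 downsampling step rounds up, and every
    upsampling step doubles the size exactly.
    """
    n_down = int(math.log2(max_stride))                   # 5 halvings for stride 32
    n_up = int(math.log2(max_stride // output_stride))    # 4 doublings back to stride 2
    for _ in range(n_down):
        size = math.ceil(size / 2)
    for _ in range(n_up):
        size *= 2
    return size


def target_size(size, output_stride=2):
    """Size of the ground-truth confidence maps along one axis (assumed)."""
    return size // output_stride


def pad_to_stride(size, max_stride=32):
    """Smallest multiple of max_stride that is >= size."""
    return math.ceil(size / max_stride) * max_stride


if __name__ == "__main__":
    for name, size in (("height", 432), ("width", 576)):
        print(
            name,
            "model output:", backbone_output_size(size),
            "target:", target_size(size),
            "padded input:", pad_to_stride(size),
        )
```

Under these assumptions the height gives a model output of 224 against a target of 216, and the width gives 288 for both, which matches the `[15,224,288,11]` and `[15,216,288,?]` shapes in the error. If this reading is right, an input height that is a multiple of 32 (for example 448) would make the two sizes agree.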

Screenshots

How to reproduce

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error
eberrigan commented 1 month ago

Hi @milesOIST,

Would you mind uploading a sleap package with your training data here so I can try replicating your issue?

Also, you mentioned certain settings. Which settings did you notice this error happening with?

Thanks!

Elizabeth

eberrigan commented 1 month ago

This could be related to #1768.