talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Using Deeplabcut's dataset to train #1613

Closed: VincentCoulombe closed this issue 11 months ago

VincentCoulombe commented 11 months ago

Bug description

Hey everyone, I hope you're doing well. I ran into a problem while trying to move my DLC dataset over to SLEAP. Specifically, I used this code to convert my DLC dataset to the .slp format:

from sleap.io.format.main import read

# Read the DeepLabCut project via its config.yaml and convert the labels.
filename = "/content/drive/MyDrive/DeepLabCutKeypointDataset/config.yaml"
labels = read(filename, "labels", as_format="deeplabcut")

# Save in SLEAP's .slp format.
labels.save("/content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted.slp")

After that, I attempted to start a training session using this command line:

!sleap-train pretrained.bottomup.json "/content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted.slp"

I'm trying to figure out what went wrong. If it would help, I can share my config.yaml and .csv files.

Thanks a lot,

Vincent

Expected behaviour

The network should have trained like it did in the demo notebook.

Actual behaviour

TensorFlow raised a ValueError (see the Logs section for the full traceback).

Your personal set up

Environment packages ``` # paste output of `pip freeze` or `conda list` here ``` absl-py==1.4.0 aiohttp==3.8.6 aiosignal==1.3.1 alabaster==0.7.13 albumentations==1.3.1 altair==4.2.2 anyio==3.7.1 appdirs==1.4.4 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 array-record==0.5.0 arviz==0.15.1 astropy==5.3.4 astunparse==1.6.3 async-timeout==4.0.3 atpublic==4.0 attrs==21.4.0 audioread==3.0.1 autograd==1.6.2 Babel==2.13.1 backcall==0.2.0 beautifulsoup4==4.11.2 bidict==0.22.1 bigframes==0.13.0 bleach==6.1.0 blinker==1.4 blis==0.7.11 blosc2==2.0.0 bokeh==3.3.1 bqplot==0.12.42 branca==0.7.0 build==1.0.3 CacheControl==0.13.1 cachetools==5.3.2 catalogue==2.0.10 cattrs==1.1.1 certifi==2023.7.22 cffi==1.16.0 chardet==5.2.0 charset-normalizer==3.3.2 chex==0.1.7 click==8.1.7 click-plugins==1.1.1 cligj==0.7.2 cloudpickle==2.2.1 cmake==3.27.7 cmdstanpy==1.2.0 colorama==0.4.6 colorcet==3.0.1 colorlover==0.3.0 colour==0.1.5 commonmark==0.9.1 community==1.0.0b1 confection==0.1.3 cons==0.4.6 contextlib2==21.6.0 contourpy==1.2.0 cryptography==41.0.5 cufflinks==0.17.3 cupy-cuda11x==11.0.0 cvxopt==1.3.2 cvxpy==1.3.2 cycler==0.12.1 cymem==2.0.8 Cython==3.0.5 dask==2023.8.1 datascience==0.17.6 db-dtypes==1.1.1 dbus-python==1.2.18 debugpy==1.6.6 decorator==4.4.2 defusedxml==0.7.1 diskcache==5.6.3 distributed==2023.8.1 distro==1.7.0 dlib==19.24.2 dm-tree==0.1.8 docutils==0.18.1 dopamine-rl==4.0.6 duckdb==0.9.2 earthengine-api==0.1.379 easydict==1.11 ecos==2.0.12 editdistance==0.6.2 eerepr==0.0.4 efficientnet==1.0.0 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0-py3-none-any.whl#sha256=83276fc78a70045627144786b52e1f2728ad5e29e5e43916ec37ea9c26a11212 entrypoints==0.4 et-xmlfile==1.1.0 etils==1.5.2 etuples==0.3.9 exceptiongroup==1.1.3 fastai==2.7.13 fastcore==1.5.29 fastdownload==0.0.7 fastjsonschema==2.19.0 fastprogress==1.0.3 fastrlock==0.8.2 filelock==3.13.1 fiona==1.9.5 firebase-admin==5.3.0 Flask==2.2.5 flatbuffers==23.5.26 flax==0.7.5 folium==0.14.0 fonttools==4.44.3 frozendict==2.3.8 frozenlist==1.4.0 fsspec==2023.6.0 future==0.18.3 gast==0.5.4 gcsfs==2023.6.0 GDAL==3.4.3 gdown==4.6.6 geemap==0.28.2 gensim==4.3.2 geocoder==1.38.1 geographiclib==2.0 geopandas==0.13.2 geopy==2.3.0 gin-config==0.5.0 glob2==0.7 google==2.0.3 google-ai-generativelanguage==0.3.3 google-api-core==2.11.1 google-api-python-client==2.84.0 google-auth==2.17.3 google-auth-httplib2==0.1.1 google-auth-oauthlib==0.4.6 google-cloud-bigquery==3.12.0 google-cloud-bigquery-connection==1.12.1 google-cloud-bigquery-storage==2.22.0 google-cloud-core==2.3.3 google-cloud-datastore==2.15.2 google-cloud-firestore==2.11.1 google-cloud-functions==1.13.3 google-cloud-iam==2.12.2 google-cloud-language==2.9.1 google-cloud-resource-manager==1.10.4 google-cloud-storage==2.8.0 google-cloud-translate==3.11.3 google-colab @ file:///colabtools/dist/google-colab-1.0.0.tar.gz#sha256=5e13ebb2ac769f185d7b38358cd744c0953047984998b9050c34df732fa62e55 google-crc32c==1.5.0 google-generativeai==0.2.2 google-pasta==0.2.0 google-resumable-media==2.6.0 googleapis-common-protos==1.61.0 googledrivedownloader==0.4 graphviz==0.20.1 greenlet==3.0.1 grpc-google-iam-v1==0.12.7 grpcio==1.59.2 grpcio-status==1.48.2 gspread==3.4.2 gspread-dataframe==3.3.1 gym==0.25.2 gym-notices==0.0.8 h5netcdf==1.3.0 h5py==3.9.0 hdmf==3.11.0 holidays==0.36 holoviews==1.17.1 html5lib==1.1 httpimport==1.3.1 httplib2==0.22.0 huggingface-hub==0.19.4 humanize==4.7.0 hyperopt==0.2.7 ibis-framework==6.2.0 idna==3.4 
image-classifiers==1.0.0 imageio==2.31.6 imageio-ffmpeg==0.4.9 imagesize==1.4.1 imbalanced-learn==0.10.1 imgaug==0.4.0 imgstore==0.2.9 importlib-metadata==6.8.0 importlib-resources==6.1.1 imutils==0.5.4 inflect==7.0.0 iniconfig==2.0.0 install==1.3.5 intel-openmp==2023.2.0 ipyevents==2.0.2 ipyfilechooser==0.6.0 ipykernel==5.5.6 ipyleaflet==0.17.4 ipython==7.34.0 ipython-genutils==0.2.0 ipython-sql==0.5.0 ipytree==0.2.2 ipywidgets==7.7.1 itsdangerous==2.1.2 jax==0.4.20 jaxlib @ https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.4.20+cuda11.cudnn86-cp310-cp310-manylinux2014_x86_64.whl#sha256=01be66238133f884bf5adf15cd7eaaf8445f9d4b056c5c64df28a997a6aff2fe jeepney==0.7.1 jieba==0.42.1 Jinja2==3.1.2 joblib==1.3.2 jsmin==3.0.1 jsonpickle==1.2 jsonschema==4.17.3 jsonschema-specifications==2023.11.1 jupyter-client==6.1.12 jupyter-console==6.1.0 jupyter-server==1.24.0 jupyter_core==5.5.0 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.9 kaggle==1.5.16 keras==2.8.0 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.2 keyring==23.5.0 kiwisolver==1.4.5 langcodes==3.3.0 launchpadlib==1.10.16 lazr.restfulclient==0.14.4 lazr.uri==1.0.6 lazy_loader==0.3 libclang==16.0.6 librosa==0.10.1 lida==0.0.10 lightgbm==4.1.0 linkify-it-py==2.0.2 llmx==0.0.15a0 llvmlite==0.41.1 locket==1.0.0 logical-unification==0.4.6 lxml==4.9.3 malloy==2023.1064 Markdown==3.5.1 markdown-it-py==3.0.0 MarkupSafe==2.1.3 matplotlib==3.7.1 matplotlib-inline==0.1.6 matplotlib-venn==0.11.9 mdit-py-plugins==0.4.0 mdurl==0.1.2 miniKanren==1.0.3 missingno==0.5.2 mistune==0.8.4 mizani==0.9.3 mkl==2023.2.0 ml-dtypes==0.2.0 mlxtend==0.22.0 more-itertools==10.1.0 moviepy==1.0.3 mpmath==1.3.0 msgpack==1.0.7 multidict==6.0.4 multipledispatch==1.0.0 multitasking==0.0.11 murmurhash==1.0.10 music21==9.1.0 natsort==8.4.0 nbclassic==1.0.0 nbclient==0.9.0 nbconvert==6.5.4 nbformat==5.9.2 ndx-pose==0.1.1 nest-asyncio==1.5.8 networkx==3.2.1 nibabel==4.0.2 nixio==1.5.3 nltk==3.8.1 notebook==6.5.5 notebook_shim==0.2.3 numba==0.58.1 numexpr==2.8.7 numpy==1.22.4 oauth2client==4.1.3 oauthlib==3.2.2 opencv-python==4.5.5.64 opencv-python-headless==4.8.1.78 openpyxl==3.1.2 opt-einsum==3.3.0 optax==0.1.7 orbax-checkpoint==0.4.2 osqp==0.6.2.post8 packaging==23.2 pandas==1.5.3 pandas-datareader==0.10.0 pandas-gbq==0.17.9 pandas-stubs==1.5.3.230304 pandocfilters==1.5.0 panel==1.3.1 param==2.0.1 parso==0.8.3 parsy==2.1 partd==1.4.1 pathlib==1.0.1 pathy==0.10.3 patsy==0.5.3 peewee==3.17.0 pexpect==4.8.0 pickleshare==0.7.5 Pillow==8.4.0 pip-tools==6.13.0 platformdirs==4.0.0 plotly==5.15.0 plotnine==0.12.4 pluggy==1.3.0 polars==0.17.3 pooch==1.8.0 portpicker==1.5.2 prefetch-generator==1.0.3 preshed==3.0.9 prettytable==3.9.0 proglog==0.1.10 progressbar2==4.2.0 prometheus-client==0.18.0 promise==2.3 prompt-toolkit==3.0.41 prophet==1.1.5 proto-plus==1.22.3 protobuf==3.19.6 psutil==5.9.5 psycopg2==2.9.9 ptyprocess==0.7.0 py-cpuinfo==9.0.0 py4j==0.10.9.7 pyarrow==9.0.0 pyasn1==0.5.0 pyasn1-modules==0.3.0 pycocotools==2.0.7 pycparser==2.21 pyct==0.5.0 pydantic==1.10.13 pydata-google-auth==1.8.2 pydot==1.4.2 pydot-ng==2.0.0 pydotplus==2.0.2 PyDrive==1.3.1 PyDrive2==1.6.3 pyerfa==2.0.1.1 pygame==2.5.2 Pygments==2.16.1 PyGObject==3.42.1 PyJWT==2.3.0 pykalman==0.9.5 pymc==5.7.2 pymystem3==0.2.0 pynwb==2.5.0 PyOpenGL==3.1.7 pyOpenSSL==23.3.0 pyparsing==3.1.1 pyperclip==1.8.2 pyproj==3.6.1 pyproject_hooks==1.0.0 pyrsistent==0.20.0 pyshp==2.3.1 PySide2==5.13.2 PySocks==1.7.1 pytensor==2.14.2 pytest==7.4.3 python-apt==0.0.0 python-box==7.1.1 python-dateutil==2.8.2 
python-louvain==0.16 python-rapidjson==1.13 python-slugify==8.0.1 python-utils==3.8.1 pytz==2023.3.post1 pyviz_comms==3.0.0 PyWavelets==1.4.1 PyYAML==6.0.1 pyzmq==23.2.1 qdldl==0.1.7.post0 qimage2ndarray==1.10.0 QtPy==2.4.1 qudida==0.0.4 ratelim==0.1.6 referencing==0.31.0 regex==2023.6.3 requests==2.31.0 requests-oauthlib==1.3.1 requirements-parser==0.5.0 rich==10.16.1 rpds-py==0.13.0 rpy2==3.4.2 rsa==4.9 ruamel.yaml==0.18.5 ruamel.yaml.clib==0.2.8 safetensors==0.4.0 scikit-image==0.19.3 scikit-learn==1.0.2 scikit-video==1.1.11 scipy==1.9.0 scooby==0.9.2 scs==3.2.4 seaborn==0.12.2 SecretStorage==3.3.1 segmentation-models==1.0.1 Send2Trash==1.8.2 shapely==2.0.2 shiboken2==5.13.2 six==1.16.0 sklearn-pandas==2.2.0 sleap==1.3.3 smart-open==6.4.0 sniffio==1.3.0 snowballstemmer==2.2.0 sortedcontainers==2.4.0 soundfile==0.12.1 soupsieve==2.5 soxr==0.3.7 spacy==3.6.1 spacy-legacy==3.0.12 spacy-loggers==1.0.5 Sphinx==5.0.2 sphinxcontrib-applehelp==1.0.7 sphinxcontrib-devhelp==1.0.5 sphinxcontrib-htmlhelp==2.0.4 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.6 sphinxcontrib-serializinghtml==1.1.9 SQLAlchemy==2.0.23 sqlglot==17.16.2 sqlparse==0.4.4 srsly==2.4.8 stanio==0.3.0 statsmodels==0.14.0 sympy==1.12 tables==3.8.0 tabulate==0.9.0 tbb==2021.11.0 tblib==3.0.0 tenacity==8.2.3 tensorboard==2.8.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.1 tensorflow==2.8.4 tensorflow-datasets==4.9.3 tensorflow-estimator==2.8.0 tensorflow-gcs-config==2.14.0 tensorflow-hub==0.14.0 tensorflow-io-gcs-filesystem==0.34.0 tensorflow-metadata==1.14.0 tensorflow-probability==0.22.0 tensorstore==0.1.45 termcolor==2.3.0 terminado==0.18.0 text-unidecode==1.3 textblob==0.17.1 tf-slim==1.1.0 thinc==8.1.12 threadpoolctl==3.2.0 tifffile==2023.9.26 tinycss2==1.2.1 tokenizers==0.15.0 toml==0.10.2 tomli==2.0.1 toolz==0.12.0 torch @ https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=a81b554184492005543ddc32e96469f9369d778dedd195d73bda9bed407d6589 torchaudio @ https://download.pytorch.org/whl/cu118/torchaudio-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=cdfd0a129406155eee595f408cafbb92589652da4090d1d2040f5453d4cae71f torchdata==0.7.0 torchsummary==1.5.1 torchtext==0.16.0 torchvision @ https://download.pytorch.org/whl/cu118/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=033712f65d45afe806676c4129dfe601ad1321d9e092df62b15847c02d4061dc tornado==6.3.2 tqdm==4.66.1 traitlets==5.7.1 traittypes==0.2.1 transformers==4.35.2 triton==2.1.0 tweepy==4.14.0 typer==0.9.0 types-pytz==2023.3.1.1 types-setuptools==68.2.0.2 typing_extensions==4.5.0 tzlocal==5.2 uc-micro-py==1.0.2 uritemplate==4.1.1 urllib3==1.26.18 vega-datasets==0.9.0 wadllib==1.3.6 wasabi==1.1.2 wcwidth==0.2.10 webcolors==1.13 webencodings==0.5.1 websocket-client==1.6.4 Werkzeug==3.0.1 widgetsnbextension==3.6.6 wordcloud==1.9.2 wrapt==1.14.1 xarray==2023.7.0 xarray-einstats==0.6.0 xgboost==2.0.2 xlrd==2.0.1 xxhash==3.4.1 xyzservices==2023.10.1 yarl==1.9.2 yellowbrick==1.5 yfinance==0.2.31 zict==3.0.0 zipp==3.17.0
Logs ``` # paste relevant logs here, if any ``` INFO:numexpr.utils:NumExpr defaulting to 2 threads. INFO:sleap.nn.training:Versions: SLEAP: 1.3.3 TensorFlow: 2.8.4 Numpy: 1.22.4 Python: 3.10.12 OS: Linux-5.15.120+-x86_64-with-glibc2.35 INFO:sleap.nn.training:Training labels file: /content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted.slp INFO:sleap.nn.training:Training profile: /usr/local/lib/python3.10/dist-packages/sleap/training_profiles/pretrained.bottomup.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "pretrained.bottomup.json", "labels_path": "/content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted.slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": false, "zmq": false, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 1.0, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": null, "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": null, "hourglass": null, "resnet": null, "pretrained_encoder": { "encoder": "efficientnetb0", "pretrained": true, "decoder_filters": 256, "decoder_filters_rate": 1.0, "output_stride": 4, "decoder_batchnorm": true } }, "heads": { "single_instance": null, "centroid": null, "centered_instance": null, "multi_instance": { "confmaps": { "part_names": null, "sigma": 2.5, "output_stride": 4, "loss_weight": 1.0, "offset_refinement": false }, "pafs": { "edges": null, "sigma": 75.0, "output_stride": 8, "loss_weight": 1.0 } }, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -15.0, "rotation_max_angle": 15.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": false, "flip_horizontal": true }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-08, "plateau_patience": 8, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": 
null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "pretrained.bottomup", "run_name_prefix": "", "run_name_suffix": null, "runs_folder": "models", "tags": [], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": false, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": false, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.3", "filename": "/usr/local/lib/python3.10/dist-packages/sleap/training_profiles/pretrained.bottomup.json" } INFO:sleap.nn.training: 2023-11-29 18:50:31.418406: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-11-29 18:50:31.418461: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303) INFO:sleap.nn.training:Running in CPU-only mode. INFO:sleap.nn.training:System: GPUs: None detected. INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: /content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 1385 / Validation = 154. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... INFO:sleap.nn.training:Loaded test example. [3.930s] INFO:sleap.nn.training: Input shape: (640, 640, 3) Downloading data from https://github.com/Callidior/keras-applications/releases/download/efficientnet/efficientnet-b0_weights_tf_dim_ordering_tf_kernels_autoaugment_notop.h5 16809984/16804768 [==============================] - 0s 0us/step 16818176/16804768 [==============================] - 0s 0us/step INFO:sleap.nn.training:Created Keras model. 
INFO:sleap.nn.training: Backbone: UnetPretrainedEncoder(encoder='efficientnetb0', decoder_filters=(256, 256, 256), pretrained=True) INFO:sleap.nn.training: Max stride: 32 INFO:sleap.nn.training: Parameters: 12,389,028 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = MultiInstanceConfmapsHead(part_names=['snout', 'rightear', 'rightleg', 'leftear', 'leftleg', 'centroid', 'tailbase', 'tag'], sigma=2.5, output_stride=4, loss_weight=1.0) INFO:sleap.nn.training: [1] = PartAffinityFieldsHead(edges=[], sigma=75.0, output_stride=8, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 160, 160, 8), dtype=tf.float32, name=None), name='MultiInstanceConfmapsHead/BiasAdd:0', description="created by layer 'MultiInstanceConfmapsHead'") INFO:sleap.nn.training: [1] = KerasTensor(type_spec=TensorSpec(shape=(None, 80, 80, 0), dtype=tf.float32, name=None), name='PartAffinityFieldsHead/BiasAdd:0', description="created by layer 'PartAffinityFieldsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 1385 INFO:sleap.nn.training:Validation set: n = 154 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-08, plateau_patience=8, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... INFO:sleap.nn.training:Created run path: models/pretrained.bottomup INFO:sleap.nn.training:Setting up visualization... 
2023-11-29 18:50:59.005748: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -33 } dim { size: -34 } dim { size: -35 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "111" frequency: 2199 num_cores: 2 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 57671680 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -36 } dim { size: -37 } dim { size: 1 } } } 2023-11-29 18:51:04.395473: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -33 } dim { size: -34 } dim { size: -35 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "111" frequency: 2199 num_cores: 2 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 57671680 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -36 } dim { size: -37 } dim { size: 1 } } } Unable to use Qt backend for matplotlib. This probably means Qt is running headless. Unable to use Qt backend for matplotlib. This probably means Qt is running headless. Unable to use Qt backend for matplotlib. This probably means Qt is running headless. Unable to use Qt backend for matplotlib. This probably means Qt is running headless. INFO:sleap.nn.training:Finished trainer set up. [17.9s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... 
Traceback (most recent call last): File "/usr/local/bin/sleap-train", line 8, in sys.exit(main()) File "/usr/local/lib/python3.10/dist-packages/sleap/nn/training.py", line 2014, in main trainer.train() File "/usr/local/lib/python3.10/dist-packages/sleap/nn/training.py", line 928, in train training_ds = self.training_pipeline.make_dataset() File "/usr/local/lib/python3.10/dist-packages/sleap/nn/data/pipelines.py", line 287, in make_dataset ds = transformer.transform_dataset(ds) File "/usr/local/lib/python3.10/dist-packages/sleap/nn/data/edge_maps.py", line 356, in transform_dataset output_ds = input_ds.map( File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2018, in map return ParallelMapDataset( File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 5234, in __init__ self._map_func = structured_function.StructuredFunctionWrapper( File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/structured_function.py", line 271, in __init__ self._function = fn_factory() File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/function.py", line 3070, in get_concrete_function graph_function = self._get_concrete_function_garbage_collected( File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/function.py", line 3036, in _get_concrete_function_garbage_collected graph_function, _ = self._maybe_define_function(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function graph_function = self._create_graph_function(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/function.py", line 3130, in _create_graph_function func_graph_module.func_graph_from_py_func( File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func func_outputs = python_func(*func_args, **func_kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/structured_function.py", line 248, in wrapped_fn ret = wrapper_helper(*args) File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/structured_function.py", line 177, in wrapper_helper ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args) File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/autograph/impl/api.py", line 692, in wrapper raise e.ag_error_metadata.to_exception(e) ValueError: in user code: File "/usr/local/lib/python3.10/dist-packages/sleap/nn/data/edge_maps.py", line 334, in generate_pafs * edge_sources, edge_destinations = get_edge_points(instances, edge_inds) File "/usr/local/lib/python3.10/dist-packages/sleap/nn/data/edge_maps.py", line 233, in get_edge_points * source_inds = tf.cast(tf.gather(edge_inds, 0, axis=1), tf.int32) ValueError: Shape must be at least rank 2 but is rank 1 for '{{node GatherV2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_INT32, batch_dims=0](GatherV2/params, GatherV2/indices, GatherV2/axis)' with input shapes: [0], [], [] and with computed input tensors: input[2] = <1>.

How to reproduce

  1. Convert a DeepLabCut dataset to .slp using the deeplabcut reader (code above).
  2. Run sleap-train with the pretrained.bottomup.json profile on the converted file.
  3. Training fails with the ValueError shown in the Logs section.
talmo commented 11 months ago

Hi @VincentCoulombe,

Right, this is because the skeleton imported from the DLC dataset doesn't contain any edges, and edges are required for bottom-up models: the part affinity fields are defined over skeleton edges, so with an empty edge list the PAF head has zero output channels (note `PartAffinityFieldsHead(edges=[], ...)` and the `(None, 80, 80, 0)` output shape in your log), which is what triggers the ValueError.

The same approach above will work with the top-down models, but if you want to try it out with a bottom-up model, my recommendation would be to open the SLP file you created in the GUI and add the skeleton edges from the right panel.
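For reference, the edges can also be added programmatically and re-saved. Here is a minimal, untested sketch: the node names come from your training log above, but the edge list itself is hypothetical and should be adapted to your animal.

import sleap

# Load the labels converted from DLC (path from the original post).
labels = sleap.load_file("/content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted.slp")
skeleton = labels.skeletons[0]

# Hypothetical edge list -- connect the parts however makes sense for your animal.
edges = [
    ("snout", "centroid"),
    ("leftear", "snout"),
    ("rightear", "snout"),
    ("leftleg", "centroid"),
    ("rightleg", "centroid"),
    ("tailbase", "centroid"),
    ("tag", "centroid"),
]
for src, dst in edges:
    skeleton.add_edge(src, dst)

# Save a copy that the bottom-up profile can train on.
labels.save("/content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted_with_edges.slp")

Then you can point sleap-train at the new file exactly as before.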

Let us know if that works for you!

In the future we should probably figure out how to import the DLC skeleton edges, but IIRC it's stored somewhere weird.
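That said, if the skeleton: field in your DLC config.yaml happens to be populated (it is a list of bodypart-name pairs), an untested sketch of copying those edges over yourself would look something like this:

import yaml
import sleap

# DLC keeps a (plotting) skeleton in config.yaml as a list of [bodypart, bodypart] pairs.
with open("/content/drive/MyDrive/DeepLabCutKeypointDataset/config.yaml") as f:
    dlc_edges = yaml.safe_load(f).get("skeleton") or []

labels = sleap.load_file("/content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted.slp")
for src, dst in dlc_edges:
    labels.skeletons[0].add_edge(src, dst)
labels.save("/content/drive/MyDrive/DeepLabCutKeypointDataset/sleap_formatted_with_edges.slp")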

Cheers,

Talmo

VincentCoulombe commented 11 months ago

Hi @talmo

Thank you for the swift response. I confirm that the top-down approach did work.

Best regards,

Vincent

talmo commented 11 months ago

Excellent! If you want to try a bottom-up model, just add the edges to the skeleton and you'll be all set.

Let us know if you run into any other issues!

Cheers,

Talmo