talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

InvalidArgumentError: PyLong_AsSize_t failure error during sleap train ; ValueError during sleap-track #1998

Open rikebuck opened 1 month ago

rikebuck commented 1 month ago

Bug description

Hello. When running SLEAP training remotely (NVIDIA V100 or NVIDIA A10 GPU; Red Hat Linux), i.e.:

labels="labels.v001_large_rf_grayscale.pkg.slp"
config_json="baseline_large_rf.topdown.json"
sleap-train "$config_json" "$labels"

the training works for the first 49/200 epochs, then I get the error:

Epoch 50/200
Traceback (most recent call last):
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/bin/sleap-train", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-train')())
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
    trainer.train()
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 941, in train
    verbose=2,
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3295, in _cache_key
    inputs, include_tensor_ranks_only, ENCODE_VARIABLES_BY_RESOURCE_ID)
tensorflow.python.eager.core._NotOkStatusException: InvalidArgumentError: PyLong_AsSize_t failure

If I ignore the error and try to run inference using this model:

model1="/ru-auth/local/home/fbuck/scratch/SLEAP/models/baseline_large_rf.topdown_1/"
input_video="{parent_dir}/SLEAP/ex_inference_vids/${vid_name}.mp4"
model1_predictions="{parent_dir}/SLEAP/predictions/output_predictions_large_rf_topdown_1_50epochs${vid_name}.slp"
sleap-track -m "$model1" -o "$model1_predictions" "$input_video"

I get the following error:

Traceback (most recent call last):
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/bin/sleap-track", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-track')())
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 5424, in main
    labels_pr = predictor.predict(provider)
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 526, in predict
    self._make_labeled_frames_from_generator(generator, data)
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 2633, in _make_labeled_frames_from_generator
    for ex in generator:
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 435, in _predict_generator
    for ex in self.pipeline.make_dataset():
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/data/pipelines.py", line 276, in make_dataset
    self.validate_pipeline()
  File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/data/pipelines.py", line 252, in validate_pipeline
    f"Missing required keys for transformer (index = {i}, "
ValueError: Missing required keys for transformer (index = 2, type = <class 'sleap.nn.data.instance_centroids.InstanceCentroidFinder'>): ['instances']. Available: ['frame_ind', 'offset_x', 'raw_image_size', 'image', 'scale', 'video_ind', 'offset_y']

Please advise. Thank you

Expected behaviour

trained model and prediction on input video

Actual behaviour

Please see above

Your personal set up

Environment packages ``` # paste output of `pip freeze` or `conda list` here ``` # packages in environment at /rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3: # # Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_gnu conda-forge absl-py 1.0.0 pypi_0 pypi alsa-lib 1.2.3.2 h166bdaf_0 conda-forge astunparse 1.6.3 pypi_0 pypi attrs 21.4.0 pyhd8ed1ab_0 conda-forge backports-zoneinfo 0.2.1 pypi_0 pypi blas 1.1 openblas conda-forge brotli 1.1.0 hb9d3cd8_2 conda-forge brotli-bin 1.1.0 hb9d3cd8_2 conda-forge bzip2 1.0.8 h4bc722e_7 conda-forge c-ares 1.32.3 h4bc722e_0 conda-forge ca-certificates 2024.8.30 hbcca054_0 conda-forge cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 4.2.4 pypi_0 pypi cairo 1.16.0 h6cf1ce9_1008 conda-forge cattrs 1.1.1 pyhd8ed1ab_0 conda-forge certifi 2024.8.30 pyhd8ed1ab_0 conda-forge charset-normalizer 2.0.9 pypi_0 pypi cloudpickle 2.2.1 pyhd8ed1ab_0 conda-forge cuda-nvcc 11.3.58 h2467b9f_0 nvidia cudatoolkit 11.3.1 hb98b00a_13 conda-forge cudnn 8.2.1.32 h86fa8c9_0 conda-forge cycler 0.11.0 pyhd8ed1ab_0 conda-forge cytoolz 0.12.0 py37h540881e_0 conda-forge dask-core 2022.2.0 pyhd8ed1ab_0 conda-forge dbus 1.13.6 h5008d03_3 conda-forge efficientnet 1.0.0 pypi_0 pypi expat 2.6.3 h5888daf_0 conda-forge ffmpeg 4.3.2 h37c90e5_3 conda-forge fftw 3.3.10 nompi_hf1063bd_110 conda-forge flatbuffers 2.0 pypi_0 pypi fontconfig 2.14.2 h14ed4e7_0 conda-forge fonttools 4.38.0 py37h540881e_0 conda-forge freetype 2.12.1 h267a509_2 conda-forge fsspec 2023.1.0 pyhd8ed1ab_0 conda-forge gast 0.4.0 pypi_0 pypi geos 3.11.0 h27087fc_0 conda-forge gettext 0.22.5 he02047a_3 conda-forge gettext-tools 0.22.5 he02047a_3 conda-forge gmp 6.3.0 hac33072_2 conda-forge gnutls 3.6.13 h85f3911_1 conda-forge google-auth 2.3.3 pypi_0 pypi google-auth-oauthlib 0.4.6 pypi_0 pypi google-pasta 0.2.0 pypi_0 pypi graphite2 1.3.13 h59595ed_1003 conda-forge grpcio 1.43.0 pypi_0 pypi gst-plugins-base 
1.18.5 hf529b03_3 conda-forge gstreamer 1.18.5 h9f60fe5_3 conda-forge h5py 3.1.0 nompi_py37h1e651dc_100 conda-forge harfbuzz 2.9.1 h83ec7ef_1 conda-forge hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge icu 68.2 h9c3ff4c_0 conda-forge idna 3.3 pypi_0 pypi image-classifiers 1.0.0 pypi_0 pypi imagecodecs-lite 2019.12.3 py37hc105733_5 conda-forge imageio 2.35.1 pyh12aca89_0 conda-forge imgaug 0.4.0 pyhd8ed1ab_1 conda-forge imgstore 0.2.9 pypi_0 pypi importlib-metadata 4.2.0 pypi_0 pypi importlib-resources 5.12.0 pypi_0 pypi jasper 1.900.1 h07fcdf6_1006 conda-forge joblib 1.3.2 pyhd8ed1ab_0 conda-forge jpeg 9e h0b41bf4_3 conda-forge jsmin 3.0.1 pyhd8ed1ab_0 conda-forge jsonpickle 1.2 py_0 conda-forge jsonschema 4.17.3 pypi_0 pypi keras 2.7.0 pypi_0 pypi keras-applications 1.0.8 pypi_0 pypi keras-preprocessing 1.1.2 pypi_0 pypi keyutils 1.6.1 h166bdaf_0 conda-forge kiwisolver 1.4.4 py37h7cecad7_0 conda-forge krb5 1.19.3 h3790be6_0 conda-forge lame 3.100 h166bdaf_1003 conda-forge lcms2 2.14 h6ed2654_0 conda-forge ld_impl_linux-64 2.43 h712a8e2_1 conda-forge lerc 4.0.0 h27087fc_0 conda-forge libasprintf 0.22.5 he8f35ee_3 conda-forge libasprintf-devel 0.22.5 he8f35ee_3 conda-forge libblas 3.9.0 24_linux64_openblas conda-forge libbrotlicommon 1.1.0 hb9d3cd8_2 conda-forge libbrotlidec 1.1.0 hb9d3cd8_2 conda-forge libbrotlienc 1.1.0 hb9d3cd8_2 conda-forge libcblas 3.9.0 24_linux64_openblas conda-forge libclang 12.0.0 pypi_0 pypi libcurl 7.86.0 h7bff187_1 conda-forge libdeflate 1.14 h166bdaf_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 hd590300_2 conda-forge libevent 2.1.10 h9b69904_4 conda-forge libexpat 2.6.3 h5888daf_0 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc 14.1.0 h77fa898_1 conda-forge libgcc-ng 14.1.0 h69a702a_1 conda-forge libgettextpo 0.22.5 he02047a_3 conda-forge libgettextpo-devel 0.22.5 he02047a_3 conda-forge libgfortran 14.1.0 h69a702a_1 conda-forge libgfortran-ng 14.1.0 h69a702a_1 conda-forge libgfortran5 14.1.0 hc5f4f2c_1 
conda-forge libglib 2.80.2 hf974151_0 conda-forge libgomp 14.1.0 h77fa898_1 conda-forge libiconv 1.17 hd590300_2 conda-forge liblapack 3.9.0 24_linux64_openblas conda-forge liblapacke 3.9.0 24_linux64_openblas conda-forge libllvm11 11.1.0 he0ac6c6_5 conda-forge libnghttp2 1.51.0 hdcd2b5c_0 conda-forge libnsl 2.0.1 hd590300_0 conda-forge libogg 1.3.5 h4ab18f5_0 conda-forge libopenblas 0.3.27 pthreads_hac2b453_1 conda-forge libopencv 4.5.3 py37h25009ff_1 conda-forge libopus 1.3.1 h7f98852_1 conda-forge libpng 1.6.43 h2797004_0 conda-forge libpq 13.8 hd77ab85_0 conda-forge libprotobuf 3.16.0 h780b84a_0 conda-forge libsodium 1.0.18 h36c2ea0_1 conda-forge libsqlite 3.46.0 hde9e2c9_0 conda-forge libssh2 1.10.0 haa6b8db_3 conda-forge libstdcxx 14.1.0 hc0a3c3a_1 conda-forge libstdcxx-ng 14.1.0 h4852527_1 conda-forge libtiff 4.4.0 h82bc61c_5 conda-forge libuuid 2.38.1 h0b41bf4_0 conda-forge libvorbis 1.3.7 h9c3ff4c_0 conda-forge libwebp-base 1.4.0 hd590300_0 conda-forge libxcb 1.13 h7f98852_1004 conda-forge libxkbcommon 1.0.3 he3ba5ed_0 conda-forge libxml2 2.9.12 h72842e0_0 conda-forge libxslt 1.1.33 h15afd5d_2 conda-forge libzlib 1.2.13 h4ab18f5_6 conda-forge locket 1.0.0 pyhd8ed1ab_0 conda-forge markdown 3.3.6 pypi_0 pypi markdown-it-py 2.2.0 pyhd8ed1ab_0 conda-forge matplotlib-base 3.5.3 py37hf395dca_2 conda-forge mdurl 0.1.2 pyhd8ed1ab_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge mysql-common 8.0.32 h14678bc_0 conda-forge mysql-libs 8.0.32 h54cf53e_0 conda-forge ncurses 6.5 he02047a_1 conda-forge ndx-pose 0.1.1 pypi_0 pypi nettle 3.6 he412f7d_0 conda-forge networkx 2.7 pyhd8ed1ab_0 conda-forge nixio 1.5.3 pypi_0 pypi nspr 4.35 h27087fc_0 conda-forge nss 3.100 hca3bf56_0 conda-forge numpy 1.19.5 pypi_0 pypi oauthlib 3.1.1 pypi_0 pypi openblas 0.3.27 pthreads_h9eca1d5_1 conda-forge opencv 4.5.3 py37h89c1867_1 conda-forge opencv-python-headless 4.2.0.34 pypi_0 pypi openh264 2.1.1 h780b84a_0 conda-forge openjpeg 2.5.0 h7d73246_1 conda-forge openssl 1.1.1w hd590300_0 
conda-forge opt-einsum 3.3.0 pypi_0 pypi packaging 21.3 pypi_0 pypi pandas 1.3.5 py37he8f5f7f_0 conda-forge partd 1.4.1 pyhd8ed1ab_0 conda-forge patsy 0.5.6 pyhd8ed1ab_0 conda-forge pcre2 10.43 hcad00b1_0 conda-forge pillow 9.2.0 py37h850a105_2 conda-forge pip 24.0 pyhd8ed1ab_0 conda-forge pixman 0.43.2 h59595ed_0 conda-forge pkgutil-resolve-name 1.3.10 pypi_0 pypi protobuf 3.19.1 pypi_0 pypi psutil 5.9.3 py37h540881e_0 conda-forge pthread-stubs 0.4 hb9d3cd8_1002 conda-forge py-opencv 4.5.3 py37h6531663_1 conda-forge pyasn1 0.4.8 pypi_0 pypi pyasn1-modules 0.2.8 pypi_0 pypi pygments 2.17.2 pyhd8ed1ab_0 conda-forge pykalman 0.9.7 pyhd8ed1ab_0 conda-forge pynwb 2.3.3 pypi_0 pypi pyparsing 3.0.6 pypi_0 pypi pyrsistent 0.19.3 pypi_0 pypi pyside2 5.13.2 py37hfa98aef_7 conda-forge python 3.7.12 hb7a2778_100_cpython conda-forge python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge python-rapidjson 1.9 py37hd23a5d3_0 conda-forge python_abi 3.7 4_cp37m conda-forge pytz 2024.2 pyhd8ed1ab_0 conda-forge pywavelets 1.3.0 py37hda87dfa_1 conda-forge pyyaml 6.0 py37h540881e_4 conda-forge pyzmq 24.0.1 py37h0c0c2a8_0 conda-forge qimage2ndarray 1.10.0 pypi_0 pypi qt 5.12.9 hda022c4_4 conda-forge qtpy 2.4.1 pyhd8ed1ab_0 conda-forge readline 8.2 h8228510_1 conda-forge requests 2.26.0 pypi_0 pypi requests-oauthlib 1.3.0 pypi_0 pypi rich 13.8.1 pyhd8ed1ab_0 conda-forge ruamel-yaml 0.17.32 pypi_0 pypi ruamel-yaml-clib 0.2.7 pypi_0 pypi scikit-image 0.19.2 py37he8f5f7f_0 conda-forge scikit-learn 1.0 py37hf0f1638_1 conda-forge scikit-video 1.1.11 pyh24bf2e0_0 conda-forge scipy 1.7.3 py37hf838250_2 anaconda seaborn 0.12.2 hd8ed1ab_0 conda-forge seaborn-base 0.12.2 pyhd8ed1ab_0 conda-forge segmentation-models 1.0.1 pypi_0 pypi setuptools 59.8.0 py37h89c1867_1 conda-forge setuptools-scm 6.3.2 pypi_0 pypi shapely 1.8.5 py37ha4e3bd1_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge sleap 1.3.3 pypi_0 pypi sqlite 3.46.0 h6d4b2fc_0 conda-forge statsmodels 0.13.2 py37hda87dfa_0 conda-forge tensorboard 
2.7.0 pypi_0 pypi tensorboard-data-server 0.6.1 pypi_0 pypi tensorboard-plugin-wit 1.8.0 pypi_0 pypi tensorflow 2.7.0 pypi_0 pypi tensorflow-estimator 2.7.0 pypi_0 pypi tensorflow-hub 0.13.0 pyh56297ac_0 conda-forge tensorflow-io-gcs-filesystem 0.23.1 pypi_0 pypi termcolor 1.1.0 pypi_0 pypi threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge tifffile 2020.6.3 py_0 conda-forge tk 8.6.13 noxft_h4845f30_101 conda-forge tomli 2.0.0 pypi_0 pypi toolz 0.12.1 pyhd8ed1ab_0 conda-forge typing-extensions 4.0.1 pypi_0 pypi typing_extensions 4.7.1 pyha770c72_0 conda-forge tzlocal 5.0.1 pypi_0 pypi unicodedata2 14.0.0 py37h540881e_1 conda-forge urllib3 1.26.7 pypi_0 pypi werkzeug 2.0.2 pypi_0 pypi wheel 0.42.0 pyhd8ed1ab_0 conda-forge wrapt 1.13.3 pypi_0 pypi x264 1!161.3030 h7f98852_1 conda-forge xorg-kbproto 1.0.7 hb9d3cd8_1003 conda-forge xorg-libice 1.1.1 hb9d3cd8_1 conda-forge xorg-libsm 1.2.4 he73a12e_1 conda-forge xorg-libx11 1.8.4 h0b41bf4_0 conda-forge xorg-libxau 1.0.11 hb9d3cd8_1 conda-forge xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge xorg-libxext 1.3.4 h0b41bf4_2 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-renderproto 0.11.1 hb9d3cd8_1003 conda-forge xorg-xextproto 7.3.0 hb9d3cd8_1004 conda-forge xorg-xproto 7.0.31 hb9d3cd8_1008 conda-forge xz 5.2.6 h166bdaf_0 conda-forge yaml 0.2.5 h7f98852_2 conda-forge zeromq 4.3.5 h59595ed_1 conda-forge zipp 3.6.0 pypi_0 pypi zlib 1.2.13 h4ab18f5_6 conda-forge zstd 1.5.6 ha6fb4c9_0 conda-forge
Logs

Full text files of the Slurm output are attached below.

[sleap-train_output.txt](https://github.com/user-attachments/files/17397921/sleap-train_output.txt)
[sleap-track_output.txt](https://github.com/user-attachments/files/17397902/sleap-track_output.txt)

eberrigan commented 1 month ago

Hi @rikebuck,

How did you install SLEAP? It looks like you have tensorflow 2.12 but our conda package is tensorflow 2.7 for Windows and Linux.

Here is an image with SLEAP and its dependencies installed. This should work on your cluster. Please follow the directions in the README to use it. https://gitlab.com/salk-tm/sleap-train

Best,

Elizabeth

rikebuck commented 1 month ago

I installed SLEAP using the command "conda create -y -n sleap -c conda-forge -c nvidia -c sleap -c anaconda sleap=1.3.3".

Thank you for the image. The instructions say "Make sure to have Docker Desktop running first"; however, I am running sleap-train remotely. Should I install Docker using these instructions instead: https://docs.docker.com/engine/install/rhel/ ? It is a shared cluster and I do not have permission to run "sudo", but I could reach out to the managers of the cluster if installing Docker this way makes sense.

In the meantime, I can look into changing the TensorFlow version in my sleap conda environment.

Thank you

rikebuck commented 1 month ago

Wait, actually, I just ran "conda list" on my sleap conda environment, and it looks like my TensorFlow version is correct: it is 2.7.0 (please see below). Sorry for the earlier mistake; I edited my initial issue post above to reflect this. However, the issue remains, and sleap-train was run in this correct conda environment, using TensorFlow 2.7, as confirmed in the sleap-train output log above:

"INFO:sleap.nn.training:Versions: SLEAP: 1.3.3 TensorFlow: 2.7.0 Numpy: 1.19.5 Python: 3.7.12" ...

please advise. Thank you.

# packages in environment at /rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap:
#
# Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_gnu conda-forge absl-py 1.0.0 pypi_0 pypi alsa-lib 1.2.3.2 h166bdaf_0 conda-forge astunparse 1.6.3 pypi_0 pypi attrs 21.4.0 pyhd8ed1ab_0 conda-forge backports-zoneinfo 0.2.1 pypi_0 pypi blas 1.1 openblas conda-forge brotli 1.1.0 hb9d3cd8_2 conda-forge brotli-bin 1.1.0 hb9d3cd8_2 conda-forge bzip2 1.0.8 h4bc722e_7 conda-forge c-ares 1.32.3 h4bc722e_0 conda-forge ca-certificates 2024.8.30 hbcca054_0 conda-forge cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 4.2.4 pypi_0 pypi cairo 1.16.0 h6cf1ce9_1008 conda-forge cattrs 1.1.1 pyhd8ed1ab_0 conda-forge certifi 2024.8.30 pyhd8ed1ab_0 conda-forge charset-normalizer 2.0.9 pypi_0 pypi cloudpickle 2.2.1 pyhd8ed1ab_0 conda-forge cuda-nvcc 11.3.58 h2467b9f_0 nvidia cudatoolkit 11.3.1 hb98b00a_13 conda-forge cudnn 8.2.1.32 h86fa8c9_0 conda-forge cycler 0.11.0 pyhd8ed1ab_0 conda-forge cytoolz 0.12.0 py37h540881e_0 conda-forge dask-core 2022.2.0 pyhd8ed1ab_0 conda-forge dbus 1.13.6 h5008d03_3 conda-forge efficientnet 1.0.0 pypi_0 pypi expat 2.6.3 h5888daf_0 conda-forge ffmpeg 4.3.2 h37c90e5_3 conda-forge fftw 3.3.10 nompi_hf1063bd_110 conda-forge flatbuffers 2.0 pypi_0 pypi fontconfig 2.14.2 h14ed4e7_0 conda-forge fonttools 4.38.0 py37h540881e_0 conda-forge freetype 2.12.1 h267a509_2 conda-forge fsspec 2023.1.0 pyhd8ed1ab_0 conda-forge gast 0.4.0 pypi_0 pypi geos 3.11.0 h27087fc_0 conda-forge gettext 0.22.5 he02047a_3 conda-forge gettext-tools 0.22.5 he02047a_3 conda-forge gmp 6.3.0 hac33072_2 conda-forge gnutls 3.6.13 h85f3911_1 conda-forge google-auth 2.3.3 pypi_0 pypi google-auth-oauthlib 0.4.6 pypi_0 pypi google-pasta 0.2.0 pypi_0 pypi graphite2 1.3.13 h59595ed_1003 conda-forge grpcio 1.43.0 pypi_0 pypi gst-plugins-base 1.18.5 hf529b03_3 conda-forge gstreamer 1.18.5 h9f60fe5_3 conda-forge h5py 3.1.0 nompi_py37h1e651dc_100 conda-forge harfbuzz 2.9.1 h83ec7ef_1 conda-forge hdf5 1.10.6 nompi_h6a2412b_1114 
conda-forge icu 68.2 h9c3ff4c_0 conda-forge idna 3.3 pypi_0 pypi image-classifiers 1.0.0 pypi_0 pypi imagecodecs-lite 2019.12.3 py37hc105733_5 conda-forge imageio 2.35.1 pyh12aca89_0 conda-forge imgaug 0.4.0 pyhd8ed1ab_1 conda-forge imgstore 0.2.9 pypi_0 pypi importlib-metadata 4.2.0 pypi_0 pypi importlib-resources 5.12.0 pypi_0 pypi jasper 1.900.1 h07fcdf6_1006 conda-forge joblib 1.3.2 pyhd8ed1ab_0 conda-forge jpeg 9e h0b41bf4_3 conda-forge jsmin 3.0.1 pyhd8ed1ab_0 conda-forge jsonpickle 1.2 py_0 conda-forge jsonschema 4.17.3 pypi_0 pypi keras 2.7.0 pypi_0 pypi keras-applications 1.0.8 pypi_0 pypi keras-preprocessing 1.1.2 pypi_0 pypi keyutils 1.6.1 h166bdaf_0 conda-forge kiwisolver 1.4.4 py37h7cecad7_0 conda-forge krb5 1.19.3 h3790be6_0 conda-forge lame 3.100 h166bdaf_1003 conda-forge lcms2 2.14 h6ed2654_0 conda-forge ld_impl_linux-64 2.43 h712a8e2_1 conda-forge lerc 4.0.0 h27087fc_0 conda-forge libasprintf 0.22.5 he8f35ee_3 conda-forge libasprintf-devel 0.22.5 he8f35ee_3 conda-forge libblas 3.9.0 24_linux64_openblas conda-forge libbrotlicommon 1.1.0 hb9d3cd8_2 conda-forge libbrotlidec 1.1.0 hb9d3cd8_2 conda-forge libbrotlienc 1.1.0 hb9d3cd8_2 conda-forge libcblas 3.9.0 24_linux64_openblas conda-forge libclang 12.0.0 pypi_0 pypi libcurl 7.86.0 h7bff187_1 conda-forge libdeflate 1.14 h166bdaf_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 hd590300_2 conda-forge libevent 2.1.10 h9b69904_4 conda-forge libexpat 2.6.3 h5888daf_0 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc 14.1.0 h77fa898_1 conda-forge libgcc-ng 14.1.0 h69a702a_1 conda-forge libgettextpo 0.22.5 he02047a_3 conda-forge libgettextpo-devel 0.22.5 he02047a_3 conda-forge libgfortran 14.1.0 h69a702a_1 conda-forge libgfortran-ng 14.1.0 h69a702a_1 conda-forge libgfortran5 14.1.0 hc5f4f2c_1 conda-forge libglib 2.80.2 hf974151_0 conda-forge libgomp 14.1.0 h77fa898_1 conda-forge libiconv 1.17 hd590300_2 conda-forge liblapack 3.9.0 24_linux64_openblas conda-forge liblapacke 3.9.0 
24_linux64_openblas conda-forge libllvm11 11.1.0 he0ac6c6_5 conda-forge libnghttp2 1.51.0 hdcd2b5c_0 conda-forge libnsl 2.0.1 hd590300_0 conda-forge libogg 1.3.5 h4ab18f5_0 conda-forge libopenblas 0.3.27 pthreads_hac2b453_1 conda-forge libopencv 4.5.3 py37h25009ff_1 conda-forge libopus 1.3.1 h7f98852_1 conda-forge libpng 1.6.43 h2797004_0 conda-forge libpq 13.8 hd77ab85_0 conda-forge libprotobuf 3.16.0 h780b84a_0 conda-forge libsodium 1.0.18 h36c2ea0_1 conda-forge libsqlite 3.46.0 hde9e2c9_0 conda-forge libssh2 1.10.0 haa6b8db_3 conda-forge libstdcxx 14.1.0 hc0a3c3a_1 conda-forge libstdcxx-ng 14.1.0 h4852527_1 conda-forge libtiff 4.4.0 h82bc61c_5 conda-forge libuuid 2.38.1 h0b41bf4_0 conda-forge libvorbis 1.3.7 h9c3ff4c_0 conda-forge libwebp-base 1.4.0 hd590300_0 conda-forge libxcb 1.13 h7f98852_1004 conda-forge libxkbcommon 1.0.3 he3ba5ed_0 conda-forge libxml2 2.9.12 h72842e0_0 conda-forge libxslt 1.1.33 h15afd5d_2 conda-forge libzlib 1.2.13 h4ab18f5_6 conda-forge locket 1.0.0 pyhd8ed1ab_0 conda-forge markdown 3.3.6 pypi_0 pypi markdown-it-py 2.2.0 pyhd8ed1ab_0 conda-forge matplotlib-base 3.5.3 py37hf395dca_2 conda-forge mdurl 0.1.2 pyhd8ed1ab_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge mysql-common 8.0.32 h14678bc_0 conda-forge mysql-libs 8.0.32 h54cf53e_0 conda-forge ncurses 6.5 he02047a_1 conda-forge ndx-pose 0.1.1 pypi_0 pypi nettle 3.6 he412f7d_0 conda-forge networkx 2.7 pyhd8ed1ab_0 conda-forge nixio 1.5.3 pypi_0 pypi nspr 4.35 h27087fc_0 conda-forge nss 3.100 hca3bf56_0 conda-forge numpy 1.19.5 pypi_0 pypi oauthlib 3.1.1 pypi_0 pypi openblas 0.3.27 pthreads_h9eca1d5_1 conda-forge opencv 4.5.3 py37h89c1867_1 conda-forge opencv-python-headless 4.2.0.34 pypi_0 pypi openh264 2.1.1 h780b84a_0 conda-forge openjpeg 2.5.0 h7d73246_1 conda-forge openssl 1.1.1w hd590300_0 conda-forge opt-einsum 3.3.0 pypi_0 pypi packaging 21.3 pypi_0 pypi pandas 1.3.5 py37he8f5f7f_0 conda-forge partd 1.4.1 pyhd8ed1ab_0 conda-forge patsy 0.5.6 pyhd8ed1ab_0 conda-forge pcre2 
10.43 hcad00b1_0 conda-forge pillow 9.2.0 py37h850a105_2 conda-forge pip 24.0 pyhd8ed1ab_0 conda-forge pixman 0.43.2 h59595ed_0 conda-forge pkgutil-resolve-name 1.3.10 pypi_0 pypi protobuf 3.19.1 pypi_0 pypi psutil 5.9.3 py37h540881e_0 conda-forge pthread-stubs 0.4 hb9d3cd8_1002 conda-forge py-opencv 4.5.3 py37h6531663_1 conda-forge pyasn1 0.4.8 pypi_0 pypi pyasn1-modules 0.2.8 pypi_0 pypi pygments 2.17.2 pyhd8ed1ab_0 conda-forge pykalman 0.9.7 pyhd8ed1ab_0 conda-forge pynwb 2.3.3 pypi_0 pypi pyparsing 3.0.6 pypi_0 pypi pyrsistent 0.19.3 pypi_0 pypi pyside2 5.13.2 py37hfa98aef_7 conda-forge python 3.7.12 hb7a2778_100_cpython conda-forge python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge python-rapidjson 1.9 py37hd23a5d3_0 conda-forge python_abi 3.7 4_cp37m conda-forge pytz 2024.2 pyhd8ed1ab_0 conda-forge pywavelets 1.3.0 py37hda87dfa_1 conda-forge pyyaml 6.0 py37h540881e_4 conda-forge pyzmq 24.0.1 py37h0c0c2a8_0 conda-forge qimage2ndarray 1.10.0 pypi_0 pypi qt 5.12.9 hda022c4_4 conda-forge qtpy 2.4.1 pyhd8ed1ab_0 conda-forge readline 8.2 h8228510_1 conda-forge requests 2.26.0 pypi_0 pypi requests-oauthlib 1.3.0 pypi_0 pypi rich 13.8.1 pyhd8ed1ab_0 conda-forge ruamel-yaml 0.17.32 pypi_0 pypi ruamel-yaml-clib 0.2.7 pypi_0 pypi scikit-image 0.19.2 py37he8f5f7f_0 conda-forge scikit-learn 1.0 py37hf0f1638_1 conda-forge scikit-video 1.1.11 pyh24bf2e0_0 conda-forge scipy 1.7.3 py37hf838250_2 anaconda seaborn 0.12.2 hd8ed1ab_0 conda-forge seaborn-base 0.12.2 pyhd8ed1ab_0 conda-forge segmentation-models 1.0.1 pypi_0 pypi setuptools 59.8.0 py37h89c1867_1 conda-forge setuptools-scm 6.3.2 pypi_0 pypi shapely 1.8.5 py37ha4e3bd1_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge sleap 1.3.3 pypi_0 pypi sqlite 3.46.0 h6d4b2fc_0 conda-forge statsmodels 0.13.2 py37hda87dfa_0 conda-forge tensorboard 2.7.0 pypi_0 pypi tensorboard-data-server 0.6.1 pypi_0 pypi tensorboard-plugin-wit 1.8.0 pypi_0 pypi tensorflow 2.7.0 pypi_0 pypi tensorflow-estimator 2.7.0 pypi_0 pypi tensorflow-hub 0.13.0 
pyh56297ac_0 conda-forge tensorflow-io-gcs-filesystem 0.23.1 pypi_0 pypi termcolor 1.1.0 pypi_0 pypi threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge tifffile 2020.6.3 py_0 conda-forge tk 8.6.13 noxft_h4845f30_101 conda-forge tomli 2.0.0 pypi_0 pypi toolz 0.12.1 pyhd8ed1ab_0 conda-forge typing-extensions 4.0.1 pypi_0 pypi typing_extensions 4.7.1 pyha770c72_0 conda-forge tzlocal 5.0.1 pypi_0 pypi unicodedata2 14.0.0 py37h540881e_1 conda-forge urllib3 1.26.7 pypi_0 pypi werkzeug 2.0.2 pypi_0 pypi wheel 0.42.0 pyhd8ed1ab_0 conda-forge wrapt 1.13.3 pypi_0 pypi x264 1!161.3030 h7f98852_1 conda-forge xorg-kbproto 1.0.7 hb9d3cd8_1003 conda-forge xorg-libice 1.1.1 hb9d3cd8_1 conda-forge xorg-libsm 1.2.4 he73a12e_1 conda-forge xorg-libx11 1.8.4 h0b41bf4_0 conda-forge xorg-libxau 1.0.11 hb9d3cd8_1 conda-forge xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge xorg-libxext 1.3.4 h0b41bf4_2 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-renderproto 0.11.1 hb9d3cd8_1003 conda-forge xorg-xextproto 7.3.0 hb9d3cd8_1004 conda-forge xorg-xproto 7.0.31 hb9d3cd8_1008 conda-forge xz 5.2.6 h166bdaf_0 conda-forge yaml 0.2.5 h7f98852_2 conda-forge zeromq 4.3.5 h59595ed_1 conda-forge zipp 3.6.0 pypi_0 pypi zlib 1.2.13 h4ab18f5_6 conda-forge zstd 1.5.6 ha6fb4c9_0 conda-forge

eberrigan commented 1 month ago

Hi @rikebuck,

You should speak with your cluster managers about the best way to use a container on your cluster. Usually there is singularity or docker already installed. You might have a specific type of workload orchestrator.

Can you also provide the training hyperparameters used from the config file?

Is there a chance you are running out of memory while training?

Best,

Elizabeth

rikebuck commented 1 month ago

Okay I can reach out to them.

Attached please find the training JSONs I have tried. The sleap-train error only occurs at the 50th epoch. So if I take the model that was saved leading up to this point and run the 1-epoch JSON, there are no errors in the sleap-train step; however, I get the same error listed above in the sleap-track step.

I do not think I am running out of memory (but please let me know if you think otherwise, or if there are other ways to test).

thank you,

baseline_large_rf.topdown_1_epoch.json baseline_large_rf.topdown.json baseline_medium_rf.topdown.json

eberrigan commented 1 month ago

Inference shouldn't work at this point since you are not able to localize instances https://sleap.ai/tutorials/initial-training.html. It sounds like you are making it to the 50th epoch of the centroid model, which means you are not progressing to the centered instance model. Both models are required to run inference with the top-down model.
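As a rough illustration of the point above, the sketch below checks whether a set of trained model configs covers both stages of a top-down pipeline. It assumes each model folder holds a `training_config.json` whose `model.heads` section lists the head types, with the unused ones set to null. That layout, the `TOPDOWN_HEADS` tuple, and both helper functions are assumptions made for illustration, so please verify them against your own files.

```python
from typing import Dict, Iterable, List

# Head names assumed to make up a top-down pipeline (verify against your configs).
TOPDOWN_HEADS = ("centroid", "centered_instance")


def configured_heads(config: Dict) -> List[str]:
    """Return the head types that are set (non-null) in one training config."""
    heads = config.get("model", {}).get("heads", {}) or {}
    return sorted(name for name, spec in heads.items() if spec is not None)


def missing_topdown_heads(configs: Iterable[Dict]) -> List[str]:
    """Return the top-down stages not covered by any of the given configs."""
    found = set()
    for config in configs:
        found.update(configured_heads(config))
    return [name for name in TOPDOWN_HEADS if name not in found]


# Stand-ins for configs loaded with json.load() from two model folders.
centroid_only = {"model": {"heads": {"centroid": {"sigma": 2.5}, "centered_instance": None}}}
centered_only = {"model": {"heads": {"centroid": None, "centered_instance": {"sigma": 2.5}}}}

print(missing_topdown_heads([centroid_only]))                 # one stage is missing
print(missing_topdown_heads([centroid_only, centered_only]))  # both stages present
```

If a stage is reported as missing, inference with the remaining model alone would not be expected to work. Once both models exist, each model folder is typically passed to sleap-track with its own `-m` flag.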

Could you provide some screenshots of your data and labels? You can also send us your training package at this link to troubleshoot.

I think you might need to decrease the size of your input scaling in your centroid model.

rikebuck commented 1 month ago

Attached please find some example screenshots of my data. I have two different label packages, one with 100 pts/frame and another with 16 pts/frame. The error I reported in this issue was for the 100 pts/frame version, although I am realizing I never ran training on the 16-pt version for several epochs, and I will try this now.

I just uploaded an example labels package to the Google Form link provided. I uploaded the labels for the 16-pt version because the labels package for the 100-pt version is >10 GB (11.13 GB).

Some notes:

thank you

[4 images attached]

eberrigan commented 1 month ago

cool!

I see. You have a lot of nodes in your skeleton. Increasing the complexity of the skeleton makes pose estimation more challenging (https://sleap.ai/guides/skeletons.html#skeletons). You can experiment with the number of nodes in your skeleton.

You should try using the bottom-up approach which matches nodes using part-affinity fields. We have had good success with this approach in plants https://spj.science.org/doi/10.34133/plantphenomics.0175.

You will want to optimize your hyperparameters. You can try doing a hyperparameter search to find the hyperparameters that give you the highest accuracy.
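One simple way to approach such a search is a plain grid over a few settings, writing one training config per combination and training each separately. The sketch below is only an illustration under stated assumptions: the dotted key paths (`data.preprocessing.input_scaling`, `model.backbone.unet.filters`), the starting values in `base_config`, and the helper names `set_by_path` and `grid_variants` are not taken from this thread and should be checked against your own training JSON.

```python
import copy
import itertools
import json
from typing import Any, Dict, Iterator, List, Tuple


def set_by_path(config: Dict[str, Any], dotted_path: str, value: Any) -> None:
    """Set a nested dictionary entry addressed by a dotted key path."""
    keys = dotted_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def grid_variants(
    base: Dict[str, Any], grid: Dict[str, List[Any]]
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (name, config) pairs for every combination of values in the grid."""
    paths = list(grid)
    for values in itertools.product(*(grid[p] for p in paths)):
        config = copy.deepcopy(base)  # never mutate the baseline config
        for path, value in zip(paths, values):
            set_by_path(config, path, value)
        name = "_".join(f"{p.split('.')[-1]}-{v}" for p, v in zip(paths, values))
        yield name, config


# Stand-in for a config loaded with json.load(); keys and values are assumed.
base_config = {
    "data": {"preprocessing": {"input_scaling": 0.5}},
    "model": {"backbone": {"unet": {"filters": 16, "max_stride": 16}}},
}
search_grid = {
    "data.preprocessing.input_scaling": [0.5, 1.0],
    "model.backbone.unet.filters": [16, 32],
}

variants = dict(grid_variants(base_config, search_grid))
for name, config in variants.items():
    # Each variant could be written out with json.dump() and trained separately.
    print(name, json.dumps(config["model"]["backbone"]["unet"]))
```

Comparing the validation metrics of the resulting runs then shows which combination works best. A grid grows quickly with the number of settings, so it may be worth starting with two or three values per setting.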

In general, you want your receptive field size to be about the size of your animal. It looks like your features are pretty small, so you will probably just want to use the most computationally expensive hyperparameters. This means input scaling ~1.0 and increasing the number of filters. Please see the documentation https://sleap.ai/guides/choosing-models.html#choosing-models.
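To make the link between stride, input scaling, and receptive field more concrete, here is a minimal sketch using the standard recurrence for stacked convolution and pooling layers. The encoder layout in `encoder_layers` (two 3x3 convolutions followed by 2x2 pooling per down block, so that the maximum stride is 2 to the power of the number of blocks) is a hypothetical simplification and not SLEAP's exact architecture, so the numbers are indicative only.

```python
from typing import List, Tuple

Layer = Tuple[int, int]  # (kernel_size, stride)


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field, in input pixels, of one output unit of a layer stack."""
    rf = 1    # a single pixel before any layer is applied
    jump = 1  # spacing of adjacent feature-map pixels, measured in input pixels
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def encoder_layers(down_blocks: int, convs_per_block: int = 2, kernel: int = 3) -> List[Layer]:
    """Hypothetical encoder: stride-1 convolutions, then 2x2 pooling, per block."""
    layers: List[Layer] = []
    for _ in range(down_blocks):
        layers += [(kernel, 1)] * convs_per_block
        layers.append((2, 2))
    return layers


def rf_in_original_pixels(rf: int, input_scaling: float) -> float:
    """Convert a receptive field measured on the rescaled image to original pixels."""
    return rf / input_scaling


for down_blocks in (2, 3, 4):
    max_stride = 2 ** down_blocks
    rf = receptive_field(encoder_layers(down_blocks))
    for scale in (1.0, 0.5):
        print(f"max_stride={max_stride:2d} scale={scale}: "
              f"{rf_in_original_pixels(rf, scale):.0f} px in the original image")
```

In this simplified model, a larger maximum stride enlarges the receptive field, and a smaller input scaling makes the same receptive field cover more of the original image at the cost of fine detail.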

eberrigan commented 1 month ago

You can also take a look at the suggestions here #1977

rikebuck commented 1 month ago

OK! I can try the bottom-up approach. How would I perform a hyperparameter search? Are there specific tools?

Do you have a sense of a good place to start with the number of filters (or what counts as a low vs. high number of filters)? Is that part of the hyperparameter search?

How do stride and input scaling relate to receptive field size? I read https://distill.pub/2019/computing-receptive-fields/ as linked in the docs, but I'm not sure I understand how to compute the correct stride yet. I can use an input scaling of ~1.
The bounding box of my worms tends to range from 55x25 px to 30x30 px within a 120x120 px image, depending on the posture.

I can look into online mining as well, as you mentioned in the linked post.

I also wanted to mention: historically, the frames where it is most difficult to get the midline of the worm are ones where the worm is intersecting itself (please see frames below); this is much less frequent, but occurs regularly. Is there any reason to believe that this would be harder to infer?

thank you!

[3 images attached]