talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Predicting on the whole video after training drops labeled frames #357

Closed: talmo closed this issue 4 years ago

talmo commented 4 years ago

Steps to reproduce:

  1. Label N frames
  2. Run Training -> select Entire video as the prediction target
  3. Inference runs (with tracking) on every frame of the current video
  4. User-labeled instances on frames where inference (or tracking?) fails will be dropped, reducing the number of user-labeled frames in the video

Maybe this happens because of the tracking? Running inference with the same model(s) with tracking, but through the Run Inference... GUI instead of as part of training, does not result in this behavior.

ntabris commented 4 years ago

I'm unable to replicate. I tried with both bottom-up models and top-down models. In both cases, the user labeled instances are all there, even on frames where there were no predictions.

@talmo, you say inference runs with tracking. The "Run Training" GUI doesn't give you the option to run tracking, and I've never seen inference run with tracking after training. Can you give me more details about how to replicate?

sronilsson commented 4 years ago

Hi @talmo @ntabris - I came here checking for answers on this specific issue. My video is about 9k frames, and predicting on the entire video generates predictions for only about 8.7k frames, so the missing frames seem to match up with my labelled frames. It's accompanied by these terminal messages (I thought the indices might be frame numbers), and it happens regardless of the tracker (cross-frame identity method) when using bottom-up:

AssertionError

     [[{{node EagerPyFunc_11}}]]

INFO:sleap.nn.inference:ERROR in sample index 145
INFO:sleap.nn.inference:{{function_node __inference_Dataset_map_group_instances_6447}} AssertionError: Traceback (most recent call last):

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\tensorflow_core\python\ops\script_ops.py", line 234, in __call__
  return func(device, token, args)

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\tensorflow_core\python\ops\script_ops.py", line 123, in __call__
  ret = self._func(*args)

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\sleap\nn\paf_grouping.py", line 520, in match_instances
  peaks, peak_scores, connections, instance_assignments

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\sleap\nn\paf_grouping.py", line 203, in make_predicted_instances
  assert instance_ind == instance_assignments[dst_peak_id]

talmo commented 4 years ago

Hi @sronilsson: This is a bug that has to do with running inference on skeletons that are not trees.

This means that when setting up the edges between your nodes, you must ensure that no node (body part) has two parents. In other words, any given body part can appear as the "destination" node at most once in your Edges table.

The reason is that skeletons must form trees; otherwise, situations like yours arise where, in some frames, two different parent nodes coming from different instances both have the best matching score to the same destination node, creating an assignment conflict.

We've been meaning to make the GUI clearer about this (#354) -- sorry about that!

In the meantime, just adjust your edges (this won't require any new labeling) and retrain, and you should be good to go.
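For reference, a minimal sketch of that check, assuming the edges are written out as plain (source, destination) node-name pairs rather than going through SLEAP's own skeleton classes; the edge list here is hypothetical and only reuses the node names from this thread:

```python
from collections import Counter

# Hypothetical edge list, as it would appear in the GUI's Edges table (source -> destination).
edges = [
    ("Nose_1", "Ear_left_1"),
    ("Nose_1", "Ear_right_1"),
    ("Lateral_left_1", "Tail_base_1"),
    ("Lateral_right_1", "Tail_base_1"),  # Tail_base_1 now has two parents -> not a tree
]

# Each body part may appear as a destination at most once (i.e., at most one parent).
parent_counts = Counter(dst for _src, dst in edges)
dual_parent_nodes = [node for node, count in parent_counts.items() if count > 1]
if dual_parent_nodes:
    print("Nodes with more than one parent:", dual_parent_nodes)  # ['Tail_base_1']
```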

@ntabris I'll update this issue with some testing data to try to reproduce directly.

sronilsson commented 4 years ago

Thanks a lot! Very helpful, I do have one of those.

sronilsson commented 4 years ago

I gave it a go: removed the dual-parent nodes, retrained the model, and then started inference on the entire single video, which is the same video I labelled to train the model. I keep seeing the AssertionError though, which makes me think frames are being missed again -- any ideas? Thanks!

` gpu=0, labels=None, last_gpu=False, models=['C:/Users/NAPEadmin/Desktop/SLEAP_multi\models\200526_221318.multi_instance.8384\training_config.json'], only_labeled_frames=False, only_suggested_frames=False, output='C:/Users/NAPEadmin/Desktop/SLEAP_multi\predictions\Testing_Video_2.mp4.200527_091739.predictions.slp', test_pipeline=False, video_path='C:/Users/NAPEadmin/Desktop/Testing_Video_2.mp4', **{'single.peak_threshold': None, 'topdown.peak_threshold': None, 'tracking.clean_instance_count': 5, 'tracking.clean_iou_threshold': None, 'tracking.img_scale': None, 'tracking.match': None, 'tracking.min_match_points': None, 'tracking.min_new_track_points': None, 'tracking.of_max_levels': None, 'tracking.of_window_size': None, 'tracking.similarity': 'instance', 'tracking.track_window': None, 'tracking.tracker': 'flow', 'video.dataset': '', 'video.input_format': ''}) 2020-05-27 09:17:43.357639: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll 2020-05-27 09:17:43.434317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s 2020-05-27 09:17:43.439212: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll 2020-05-27 09:17:43.445304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll 2020-05-27 09:17:43.450557: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll 2020-05-27 09:17:43.453797: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll 2020-05-27 09:17:43.460074: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll 2020-05-27 09:17:43.464225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll 2020-05-27 09:17:43.473680: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll 2020-05-27 09:17:43.476382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0 System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True 2020-05-27 09:17:43.527877: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2 2020-05-27 09:17:43.534967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s 2020-05-27 09:17:43.540893: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll 2020-05-27 09:17:43.542630: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll 2020-05-27 09:17:43.544247: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll 2020-05-27 09:17:43.546796: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library 
curand64_10.dll 2020-05-27 09:17:43.548427: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll 2020-05-27 09:17:43.550130: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll 2020-05-27 09:17:43.551824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll 2020-05-27 09:17:43.553737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0 2020-05-27 09:17:44.219729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-05-27 09:17:44.222217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 2020-05-27 09:17:44.224190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N 2020-05-27 09:17:44.225859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8685 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5) 2020-05-27 09:17:51.630946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll 2020-05-27 09:17:53.495878: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation. This message will be only logged once. 2020-05-27 09:17:53.686904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll 2020-05-27 09:17:54.915792: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 130.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:54.921790: W tensorflow/core/kernels/gpu_utils.cc:48] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once. 2020-05-27 09:17:54.927562: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:54.932148: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:55.055107: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:55.060317: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 
2020-05-27 09:17:55.080477: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:55.086832: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:55.166040: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:55.172132: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:55.243575: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 130.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2020-05-27 09:17:57.802470: W tensorflow/core/common_runtime/bfc_allocator.cc:309] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. 2020-05-27 09:17:58.008906: W tensorflow/core/common_runtime/bfc_allocator.cc:309] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. 2020-05-27 09:18:08.343771: W tensorflow/core/framework/op_kernel.cc:1643] Unknown: AssertionError: Traceback (most recent call last):

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\tensorflow_core\python\ops\script_ops.py", line 234, in call return func(device, token, args)

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\tensorflow_core\python\ops\script_ops.py", line 123, in call ret = self._func(*args)

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\sleap\nn\paf_grouping.py", line 520, in match_instances peaks, peak_scores, connections, instance_assignments

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\sleap\nn\paf_grouping.py", line 203, in make_predicted_instances assert instance_ind == instance_assignments[dst_peak_id]

AssertionError

[The same AssertionError traceback repeats several more times in the log (timestamps 09:18:08.65 through 09:18:11.72); the repeats are omitted here.]

2020-05-27 09:18:21.644158: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at iterator_ops.cc:941 : Unknown: {{function_node __inference_Dataset_map_group_instances_5537}} AssertionError: Traceback (most recent call last):

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\tensorflow_core\python\ops\script_ops.py", line 234, in __call__
  return func(device, token, args)

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\tensorflow_core\python\ops\script_ops.py", line 123, in __call__
  ret = self._func(*args)

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\sleap\nn\paf_grouping.py", line 520, in match_instances
  peaks, peak_scores, connections, instance_assignments

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\sleap\nn\paf_grouping.py", line 203, in make_predicted_instances
  assert instance_ind == instance_assignments[dst_peak_id]

AssertionError

     [[{{node EagerPyFunc_9}}]]

INFO:sleap.nn.inference:ERROR in sample index 57
INFO:sleap.nn.inference:{{function_node __inference_Dataset_map_group_instances_5537}} AssertionError: Traceback (most recent call last):

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\tensorflow_core\python\ops\script_ops.py", line 234, in __call__
  return func(device, token, args)

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\tensorflow_core\python\ops\script_ops.py", line 123, in __call__
  ret = self._func(*args)

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\sleap\nn\paf_grouping.py", line 520, in match_instances
  peaks, peak_scores, connections, instance_assignments

File "C:\Users\NAPEadmin\anaconda3\envs\sleap_env102\lib\site-packages\sleap\nn\paf_grouping.py", line 203, in make_predicted_instances
  assert instance_ind == instance_assignments[dst_peak_id]

AssertionError`

sronilsson commented 4 years ago

And I did not end up with predictions for all frames:

image

talmo commented 4 years ago

Hi @sronilsson,

If you're getting that error, there's likely still something wrong with the skeleton. Do you mind sharing a screenshot of your skeleton window showing all the nodes/edges?

sronilsson commented 4 years ago

Yes, thank you - I will try again; maybe I missed something before training the model.

image

Here is a link to my skeleton json https://drive.google.com/file/d/1AqYCtw1NBRJVumMiGjQOU4l_wdl67vtk/view?usp=sharing

talmo commented 4 years ago

Hi @sronilsson,

Right, so one more detail I forgot: the skeleton should not form any cycles, and every node should be connected (if you want it to be part of the same instance).

The skeleton you have right now looks like this: image

As you can see, there's a cycle formed between the sides and the nose/tailbase, and the centroid is left out.

I'd recommend a skeleton that looks more like this for your case: image

Here's the skeleton JSON -- you can copy and paste into a new .json file and load it into the GUI:

{"directed": true, "graph": {"name": "Skeleton-2", "num_edges_inserted": 20}, "links": [{"edge_insert_idx": 18, "key": 0, "source": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["Tail_base_1", 1.0]}}, "target": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["Middle_tail_1", 1.0]}}, "type": {"py/reduce": [{"py/type": "sleap.skeleton.EdgeType"}, {"py/tuple": [1]}]}}, {"edge_insert_idx": 19, "key": 0, "source": {"py/id": 2}, "target": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["End_tail_1", 1.0]}}, "type": {"py/id": 3}}, {"edge_insert_idx": 12, "key": 0, "source": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["Centroid", 1.0]}}, "target": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["Nose_1", 1.0]}}, "type": {"py/id": 3}}, {"edge_insert_idx": 13, "key": 0, "source": {"py/id": 5}, "target": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["Ear_left_1", 1.0]}}, "type": {"py/id": 3}}, {"edge_insert_idx": 14, "key": 0, "source": {"py/id": 5}, "target": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["Ear_right_1", 1.0]}}, "type": {"py/id": 3}}, {"edge_insert_idx": 15, "key": 0, "source": {"py/id": 5}, "target": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["Lateral_left_1", 1.0]}}, "type": {"py/id": 3}}, {"edge_insert_idx": 16, "key": 0, "source": {"py/id": 5}, "target": {"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["Lateral_right_1", 1.0]}}, "type": {"py/id": 3}}, {"edge_insert_idx": 17, "key": 0, "source": {"py/id": 5}, "target": {"py/id": 1}, "type": {"py/id": 3}}], "multigraph": true, "nodes": [{"id": {"py/id": 6}}, {"id": {"py/id": 7}}, {"id": {"py/id": 9}}, {"id": {"py/id": 1}}, {"id": {"py/id": 2}}, {"id": {"py/id": 4}}, {"id": {"py/id": 10}}, {"id": {"py/id": 8}}, {"id": {"py/id": 5}}]}

The reasoning here is that we want as few steps as possible between the most reliable nodes (e.g., centroids) and every other node, since child nodes depend on their parent nodes in order to be grouped into instances.
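For anyone who wants to sanity-check a skeleton before retraining, here is a minimal sketch (not part of SLEAP itself) that verifies the connected, acyclic, single-parent structure with networkx; the edge list was transcribed by hand from the recommended skeleton JSON above, so double-check it against your own Edges table:

```python
import networkx as nx

# Edge list (source -> destination) read off the recommended skeleton above.
edges = [
    ("Centroid", "Nose_1"),
    ("Centroid", "Ear_left_1"),
    ("Centroid", "Ear_right_1"),
    ("Centroid", "Lateral_left_1"),
    ("Centroid", "Lateral_right_1"),
    ("Centroid", "Tail_base_1"),
    ("Tail_base_1", "Middle_tail_1"),
    ("Middle_tail_1", "End_tail_1"),
]

g = nx.DiGraph(edges)

# For bottom-up grouping the skeleton should be a rooted tree:
# connected, no cycles, every node with at most one parent, and a single root.
connected_and_acyclic = nx.is_tree(g.to_undirected())
single_parent = all(deg <= 1 for _node, deg in g.in_degree())
roots = [node for node, deg in g.in_degree() if deg == 0]

print(connected_and_acyclic and single_parent and len(roots) == 1)  # True for this skeleton
```

If any of these checks fail, the offending edges usually show up as a node with more than one parent, a cycle, or a disconnected node (like the centroid in the earlier skeleton).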

Apologies for not having these details more clearly laid out -- we'll make it part of the GUI (#354) so you won't run into this issue again.

Let me know if that still kicks up the error you were seeing, and if you run into other problems feel free to open a new issue.

sronilsson commented 4 years ago

That's very helpful @talmo. One final (and different) question: my videos are very similar to this one, where you have good tracking of two white mice - https://twitter.com/MurthyLab/status/1259948428949426177. Any chance you would be happy to share the weights you have for tracking the white mice? I'm thinking that could speed things up for me. Thanks again.

talmo commented 4 years ago

Hi @sronilsson,

We're not able to share the weights for that dataset just yet, but the configuration we used to get the results on that video is pretty close to the baseline profile for bottom-up: image The yellow settings are the important ones (note the lower-than-default initial learning rate), and the green ones are optional; you can try enabling them to improve generalization to new data if you need to.
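As an illustration of that learning-rate tweak, here is a small sketch that copies a baseline bottom-up training profile and lowers its initial learning rate. The file name, the key layout ("optimization" -> "initial_learning_rate"), and the value 1e-4 are assumptions based on the training_config.json files SLEAP writes, not the exact settings used for that video:

```python
import json

# Hypothetical: start from a copy of a baseline bottom-up training profile
# (file name and key names assumed; adjust to your own profile).
with open("baseline_medium_rf.bottomup.json") as f:
    cfg = json.load(f)

cfg["optimization"]["initial_learning_rate"] = 1e-4  # lower than the default

with open("bottomup_lower_lr.json", "w") as f:
    json.dump(cfg, f, indent=2)
```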

sronilsson commented 4 years ago

Thanks @talmo - no more AssertionErrors; all seems to be working. One note, though: the progress bar never updates on my Windows machine. I don't know if it's meant to count percent of frames or of videos; if it is videos, I guess that makes sense as I am only analysing one.
image

ntabris commented 4 years ago

@sronilsson: About the progress bar, yep, it doesn't currently update during inference for a single video (only updates after inference on each video).

ntabris commented 4 years ago

Re. the original issue of user-labeled frames disappearing after training/inference on the whole video, I'm still unable to reproduce. This was happening for a specific user; we met over Zoom and the data loss did not occur. I also got the training data from this user (including two of the videos) and tried training, and the data loss did not occur. Unless we hear reports of this happening again or to someone else, I'm going to say this was a fluke and/or has already been fixed.