talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Can't run the Inference in (topdown multianimal) flow/centroid/hungarian #1994

Open miteshkalathiya opened 1 month ago

miteshkalathiya commented 1 month ago

Bug description

Started inference at: 2024-10-10 11:50:57.966297
Args: {
'data_path': 'C:/Users/Mitesh Kalathiya/MITESH LABELS.slp',
'models': [ 'C:/Users/Mitesh Kalathiya\models\240830_165951.centroid.n=500\initial_config.json', 'C:/Users/Mitesh Kalathiya\models\240830_170733.centered_instance.n=500\initial_config.json' ],
'frames': '0,-91756',
'only_labeled_frames': False,
'only_suggested_frames': False,
'output': 'C:/Users/Mitesh Kalathiya\predictions\MITESH LABELS.slp.241010_115029.predictions.slp',
'no_empty_frames': True,
'verbosity': 'json',
'video.dataset': None,
'video.input_format': 'channels_last',
'video.index': '0',
'cpu': False,
'first_gpu': False,
'last_gpu': False,
'gpu': 'auto',
'max_edge_length_ratio': 0.25,
'dist_penalty_weight': 1.0,
'batch_size': 4,
'open_in_gui': False,
'peak_threshold': 0.2,
'max_instances': None,
'tracking.tracker': 'flow',
'tracking.max_tracking': None,
'tracking.max_tracks': None,
'tracking.target_instance_count': None,
'tracking.pre_cull_to_target': None,
'tracking.pre_cull_iou_threshold': None,
'tracking.post_connect_single_breaks': 0,
'tracking.clean_instance_count': None,
'tracking.clean_iou_threshold': None,
'tracking.similarity': 'centroid',
'tracking.match': 'hungarian',
'tracking.robust': None,
'tracking.track_window': 5,
'tracking.min_new_track_points': None,
'tracking.min_match_points': None,
'tracking.img_scale': None,
'tracking.of_window_size': None,
'tracking.of_max_levels': None,
'tracking.save_shifted_instances': None,
'tracking.kf_node_indices': None,
'tracking.kf_init_frame_count': None
}

INFO:sleap.nn.inference:Auto-selected GPU 0 with 4005 MiB of free memory.
2024-10-10 11:51:06.829957: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-10 11:51:09.177939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2766 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2024-10-10 11:51:18.827461: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8201
2024-10-10 11:51:25.830388: W tensorflow/core/common_runtime/bfc_allocator.cc:343] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.

Versions:
SLEAP: 1.3.3
TensorFlow: 2.7.0
Numpy: 1.21.6
Python: 3.7.12
OS: Windows-10-10.0.19041-SP0

System:
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True

2024-10-10 11:54:05.156574: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-10-10 11:54:05.156855: W .\tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 4294967296
2024-10-10 11:54:05.157414: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 3865470464 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-10-10 11:54:05.157567: W .\tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 3865470464
2024-10-10 11:54:05.157732: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 3478923264 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-10-10 11:54:05.157838: W .\tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 3478923264
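For scale, the three failed allocation sizes in the log above can be converted to GiB. The sketch below only does arithmetic on the numbers printed in the log. The helper names `to_gib` and `retry_ratios` are made up for illustration, and the roughly 0.9x step between attempts is an observation from these three values, not documented TensorFlow behaviour.

```python
# Sizes copied from the "failed to alloc ... bytes on host" lines in the log.
FAILED_ALLOCATIONS = [4294967296, 3865470464, 3478923264]

GIB = 2**30  # bytes per GiB


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


def retry_ratios(sizes):
    """Ratio of each requested size to the one requested before it."""
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    for size in FAILED_ALLOCATIONS:
        print(f"{size} bytes = {to_gib(size):.2f} GiB")
    print("ratios between attempts:",
          [round(r, 3) for r in retry_ratios(FAILED_ALLOCATIONS)])
```

The requests are about 4 GiB, 3.6 GiB and 3.24 GiB, and the log labels all of them as pinned host memory.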

Expected behaviour

I expected it to run the inference and give me predictions on the entire video.

Actual behaviour

Instead, it gives me the CUDA_ERROR_OUT_OF_MEMORY errors shown in the log above.

eberrigan commented 1 month ago

Hi @miteshkalathiya,

According to your logs, you do not have enough GPU memory to run inference with those parameters. Please try decreasing the batch size if you do not have access to a computer with more GPU memory.

Thank you!

Elizabeth
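A minimal sketch of the suggestion above: rerun inference with a smaller batch size, halving from the reported value of 4 down to 1. It assumes the `sleap-track` command accepts `-m`, `-o`, `--batch_size` and the `--tracking.*` options that mirror the Args keys in the log. The file paths are placeholders, and the helper names (`batch_sizes`, `build_sleap_track_cmd`, `run_with_fallback`) are hypothetical. It also assumes a failed run exits with a non-zero status, which this thread does not confirm.

```python
import subprocess


def batch_sizes(start):
    """Yield start, start // 2, ... down to 1."""
    size = start
    while size >= 1:
        yield size
        size //= 2


def build_sleap_track_cmd(data_path, model_paths, output_path, batch_size):
    """Build an argument list for sleap-track (flags assumed, see lead-in)."""
    cmd = ["sleap-track", data_path]
    for model_path in model_paths:
        cmd += ["-m", model_path]
    cmd += [
        "--batch_size", str(batch_size),
        "--tracking.tracker", "flow",
        "--tracking.similarity", "centroid",
        "--tracking.match", "hungarian",
        "-o", output_path,
    ]
    return cmd


def run_with_fallback(make_cmd, start=4, runner=subprocess.run):
    """Try each batch size in turn; return the first that exits with 0."""
    for size in batch_sizes(start):
        result = runner(make_cmd(size))
        if result.returncode == 0:
            return size
    return None  # every batch size failed


if __name__ == "__main__":
    # Placeholder paths: replace with the real labels file and model folders.
    models = ["path/to/centroid_model", "path/to/centered_instance_model"]
    ok = run_with_fallback(
        lambda size: build_sleap_track_cmd(
            "labels.slp", models, "predictions.slp", size
        )
    )
    print("succeeded with batch size:", ok)
```

If batch size 1 still fails, the remaining option mentioned in the reply is a machine with more GPU memory.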