talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Remote inference on suggested/labeled frames fails due to searching for original videos #1552

Closed olivier-cuttlefish closed 1 year ago

olivier-cuttlefish commented 1 year ago

Hello, first of all, thank you so much for your amazing tool and the support you provide. I am using v1.3.3. I ran my training on the cluster, and now I would like to run inference on the labeled frames contained in the training package. The command I am running is:

sleap-track -m "models/231013_164017.multi_instance" --only-labeled-frames -o "labels002_predictions.slp" "labels.v002.merged.pkg.slp"

I also tried using --only-suggested-frames.

In both cases, it seems that SLEAP tries to locate the original videos and pull the frames from there, whereas it should find them in the training package (as it did during training).
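For what it's worth, here is a quick sanity check I can run on the package (a rough sketch using the Python API; I am assuming sleap.load_file and the video backend attributes behave the way I think they do) to confirm the frames really are embedded:

import sleap

# Load the training package and inspect its videos.
labels = sleap.load_file("labels.v002.merged.pkg.slp")
print(f"{len(labels.labeled_frames)} labeled frames across {len(labels.videos)} videos")
for video in labels.videos:
    # For an exported package I would expect an HDF5-backed video that points
    # at the .pkg.slp itself, not a MediaVideo pointing at the original .mp4.
    print(type(video.backend).__name__, video.filename)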

Here is the traceback (/home/o/o-xxx/ is on the cluster, while /home/xxx/Documents/ is a local path):

$ sleap-track -m "models/231013_164017.multi_instance" --only-suggested-frames -o "labels002_predictions.slp" "labels.v002.merged.pkg.slp"
Started inference at: 2023-10-16 11:12:09.508513
Args: { /// }

INFO:sleap.nn.inference:Auto-selected GPU 0 with 81042 MiB of free memory.
Versions:
SLEAP: 1.3.3
TensorFlow: 2.7.0
Numpy: 1.19.5
Python: 3.7.12
OS: Linux-4.18.0-477.15.1.el8_8.x86_64-x86_64-with-centos-8.8-Green_Obsidian

System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True

2023-10-16 11:12:23.667987: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-16 11:12:24.115932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 78929 MB memory: -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:10:00.0, compute capability: 8.0
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?
Traceback (most recent call last):
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 241, in _try_frame_from_source_video
    return self.source_video.get_frame(idx)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 206, in source_video
    if self.source_video_available:
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 1082, in __len__
    return self.frames
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 1046, in __getattr__
    return getattr(self.backend, item)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 443, in frames
    return int(self.frames_float)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 404, in __frames_float
    return self.reader.get(cv2.CAP_PROP_FRAME_COUNT)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 385, in __reader
    f"Could not find filename video filename named {self.filename}"
FileNotFoundError: Could not find filename video filename named /home/xxx/Documents/DATASETS/video1/cam0_2023-01-30-17-24-27 (1).mp4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/o/o-xxx/miniconda3/envs/sleap133/bin/sleap-track", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-track')())
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/nn/inference.py", line 5424, in main
    labels_pr = predictor.predict(provider)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/nn/inference.py", line 526, in predict
    self._make_labeled_frames_from_generator(generator, data)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/nn/inference.py", line 3266, in _make_labeled_frames_from_generator
    for ex in generator:
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/nn/inference.py", line 435, in _predict_generator
    for ex in self.pipeline.make_dataset():
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/nn/data/pipelines.py", line 282, in make_dataset
    ds = self.providers[0].make_dataset()
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/nn/data/providers.py", line 187, in make_dataset
    first_image = tf.convert_to_tensor(self.labels[0].image)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/instance.py", line 1765, in image
    return self.video.get_frame(self.frame_idx)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 1104, in get_frame
    return self.backend.get_frame(idx)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 319, in get_frame
    return self._try_frame_from_source_video(idx)
  File "/home/o/o-xxx/miniconda3/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 243, in _try_frame_from_source_video
    raise IndexError(f"Frame index {idx} not in original index.")
IndexError: Frame index 0 not in original index.

roomrys commented 1 year ago

Still looking into this, but as a quick update: I ran a test in the GUI by creating a .pkg.slp and then running training/inference on it (mainly to double-check the command line call), which looks correct:

Command line call:
sleap-track /Users/liezlmaree/Projects/sleap-datasets/drosophila-melanogaster-courtship/courtship_labels.pkg.slp --only-suggested-frames -m /Users/liezlmaree/Projects/sleap-datasets/drosophila-melanogaster-courtship/models/231016_063130.centroid.n=149 -m /Users/liezlmaree/Projects/sleap-datasets/drosophila-melanogaster-courtship/models/231016_070843.centered_instance.n=149 -o /Users/liezlmaree/Projects/sleap-datasets/drosophila-melanogaster-courtship/predictions/courtship_labels.pkg.slp.231016_072541.predictions.slp --verbosity json --no-empty-frames

Started inference at: 2023-10-16 07:25:46.154717
Args:
{
│   'data_path': '/Users/liezlmaree/Projects/sleap-datasets/drosophila-melanogaster-courtship/courtship_labels.pkg.slp',
│   'models': [
│   │   '/Users/liezlmaree/Projects/sleap-datasets/drosophila-melanogaster-courtship/models/231016_063130.centroid.n=149',
│   │   '/Users/liezlmaree/Projects/sleap-datasets/drosophila-melanogaster-courtship/models/231016_070843.centered_instance.n=149'
│   ],
│   'frames': '',
│   'only_labeled_frames': False,
│   'only_suggested_frames': True,
2023-10-16 07:25:46.723577: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-10-16 07:25:46.723760: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
│   'output': '/Users/liezlmaree/Projects/sleap-datasets/drosophila-melanogaster-courtship/predictions/courtship_labels.pkg.slp.231016_072541.predictions.slp',
│   'no_empty_frames': True,
│   'verbosity': 'json',
│   'video.dataset': None,
│   'video.input_format': 'channels_last',
│   'video.index': '',
│   'cpu': False,
│   'first_gpu': False,
│   'last_gpu': False,
│   'gpu': 'auto',
│   'max_edge_length_ratio': 0.25,
│   'dist_penalty_weight': 1.0,
│   'batch_size': 4,
│   'open_in_gui': False,
│   'peak_threshold': 0.2,
│   'max_instances': None,
│   'tracking.tracker': None,
│   'tracking.max_tracking': None,
│   'tracking.max_tracks': None,
│   'tracking.target_instance_count': None,
2023-10-16 07:25:47.767238: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
│   'tracking.pre_cull_to_target': None,
│   'tracking.pre_cull_iou_threshold': None,
│   'tracking.post_connect_single_breaks': None,
│   'tracking.clean_instance_count': None,
│   'tracking.clean_iou_threshold': None,
│   'tracking.similarity': None,
│   'tracking.match': None,
│   'tracking.robust': None,
│   'tracking.track_window': None,
│   'tracking.min_new_track_points': None,
│   'tracking.min_match_points': None,
│   'tracking.img_scale': None,
│   'tracking.of_window_size': None,
│   'tracking.of_max_levels': None,
│   'tracking.save_shifted_instances': None,
│   'tracking.kf_node_indices': None,
│   'tracking.kf_init_frame_count': None
}

INFO:sleap.nn.inference:Failed to query GPU memory from nvidia-smi. Defaulting to first GPU.
Metal device set to: Apple M2 Pro
2023-10-16 07:25:49.822852: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2023-10-16 07:25:49.907230: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -45 } dim { size: -46 } dim { size: -47 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -15 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -15 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 10 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -15 } dim { size: -48 } dim { size: -49 } dim { size: 1 } } }
2023-10-16 07:25:49.907533: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1024 } dim { size: 1024 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -15 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -15 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 10 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -15 } dim { size: -56 } dim { size: -57 } dim { size: 3 } } }
2023-10-16 07:25:49.911132: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -91 } dim { size: -92 } dim { size: -93 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -20 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -20 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" model: "0" num_cores: 10 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -20 } dim { size: -95 } dim { size: -96 } dim { size: 1 } } }
Versions:
SLEAP: 1.3.3
TensorFlow: 2.9.2
Numpy: 1.22.3
Python: 3.9.15
OS: macOS-13.5-arm64-arm-64bit

System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True

Finished inference at: 2023-10-16 07:25:50.748722
Total runtime: 4.5940141677856445 secs
Predicted frames: 20/20
Process return code: 0
skipped 98 redundant instances
olivier-cuttlefish commented 1 year ago

Thank you very much for your quick response. I tried to run it locally and indeed managed to make it work:

 sleap-track -m "models/231013_164017.multi_instance" --only-labeled-frames -o "labels002_predictions.slp" "labels.v002.merged.pkg.slp"
Started inference at: 2023-10-17 10:09:25.229688
Args:
{
│   'data_path': 'labels.v002.merged.pkg.slp',
│   'models': ['models/231013_164017.multi_instance'],
│   'frames': '',
│   'only_labeled_frames': True,
│   'only_suggested_frames': False,
│   'output': 'labels002_predictions.slp',
│   'no_empty_frames': False,
│   'verbosity': 'rich',
│   'video.dataset': None,
│   'video.input_format': 'channels_last',
│   'video.index': '',
│   'cpu': False,
│   'first_gpu': False,
│   'last_gpu': False,
│   'gpu': 'auto',
│   'max_edge_length_ratio': 0.25,
│   'dist_penalty_weight': 1.0,
│   'batch_size': 4,
│   'open_in_gui': False,
│   'peak_threshold': 0.2,
│   'max_instances': None,
│   'tracking.tracker': None,
│   'tracking.max_tracking': None,
│   'tracking.max_tracks': None,
│   'tracking.target_instance_count': None,
│   'tracking.pre_cull_to_target': None,
│   'tracking.pre_cull_iou_threshold': None,
│   'tracking.post_connect_single_breaks': None,
│   'tracking.clean_instance_count': None,
│   'tracking.clean_iou_threshold': None,
│   'tracking.similarity': None,
│   'tracking.match': None,
│   'tracking.robust': None,
│   'tracking.track_window': None,
│   'tracking.min_new_track_points': None,
│   'tracking.min_match_points': None,
│   'tracking.img_scale': None,
│   'tracking.of_window_size': None,
│   'tracking.of_max_levels': None,
│   'tracking.save_shifted_instances': None,
│   'tracking.kf_node_indices': None,
│   'tracking.kf_init_frame_count': None
}

2023-10-17 10:09:25.261963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-17 10:09:25.265508: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-17 10:09:25.265614: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
INFO:sleap.nn.inference:Auto-selected GPU 0 with 10980 MiB of free memory.
Versions:
SLEAP: 1.3.3
TensorFlow: 2.7.0
Numpy: 1.19.5
Python: 3.7.12
OS: Linux-5.15.0-84-generic-x86_64-with-debian-bullseye-sid

System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True

2023-10-17 10:09:25.937969: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-17 10:09:25.939208: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-17 10:09:25.939381: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-17 10:09:25.939471: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-17 10:09:26.215092: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-17 10:09:26.215278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-17 10:09:26.215386: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-17 10:09:26.215463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9172 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?2023-10-17 10:09:38.259870: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8201
2023-10-17 10:09:39.178450: W tensorflow/core/common_runtime/bfc_allocator.cc:343] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2023-10-17 10:09:41.077182: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.95GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-17 10:09:41.077213: W tensorflow/core/kernels/gpu_utils.cc:49] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 ?
Finished inference at: 2023-10-17 10:10:57.183653
Total runtime: 91.95398092269897 secs
Predicted frames: 125/125
Provenance:
{
│   'model_paths': ['models/231013_164017.multi_instance/training_config.json'],
│   'predictor': 'BottomUpPredictor',
│   'sleap_version': '1.3.3',
│   'platform': 'Linux-5.15.0-84-generic-x86_64-with-debian-bullseye-sid',
│   'command': '/home/xxx/mambaforge/envs/sleap133/bin/sleap-track -m models/231013_164017.multi_instance --only-labeled-frames -o labels002_predictions.slp labels.v002.merged.pkg.slp',
│   'data_path': 'labels.v002.merged.pkg.slp',
│   'output_path': 'labels002_predictions.slp',
│   'total_elapsed': 91.95398092269897,
│   'start_timestamp': '2023-10-17 10:09:25.229688',
│   'finish_timestamp': '2023-10-17 10:10:57.183653'
}

Saved output: labels002_predictions.slp

However, the prediction file seems to be faulty and fails to open in the GUI. Whether I open it directly or merge it into the project, it throws an h5py-based error:

Traceback (most recent call last):
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 289, in openProject
    self.execute(OpenProject, filename=filename, first_open=first_open)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 242, in execute
    command().execute(context=self, params=kwargs)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 138, in execute
    self.do_with_signal(context, params)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 162, in do_with_signal
    cls.do_action(context, params)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 727, in do_action
    context.loadProjectFile(filename)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 274, in loadProjectFile
    self.execute(LoadProjectFile, filename=filename)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 242, in execute
    command().execute(context=self, params=kwargs)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 138, in execute
    self.do_with_signal(context, params)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 162, in do_with_signal
    cls.do_action(context, params)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/commands.py", line 675, in do_action
    context.app.on_data_update([UpdateTopic.project, UpdateTopic.all])
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/app.py", line 1166, in on_data_update
    self.videos_dock.table.model().items = self.labels.videos
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/dataviews.py", line 103, in items
    item_data = self.item_to_data(obj, item)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/dataviews.py", line 392, in item_to_data
    return {key: getattr(item, key) for key in self.properties}
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/gui/dataviews.py", line 392, in <dictcomp>
    return {key: getattr(item, key) for key in self.properties}
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 1046, in __getattr__
    return getattr(self.backend, item)
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 251, in frames
    return self.__dataset_h5.shape[0]
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 154, in __dataset_h5
    self._load()
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/sleap/io/video.py", line 131, in _load
    self.__dataset_h5 = self.__file_h5[self.dataset]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/xxx/mambaforge/envs/sleap133/lib/python3.7/site-packages/h5py/_hl/group.py", line 288, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'

I strongly suspect this comes from messy handling of my datasets: there were several rounds of using exported packages, merging other videos into them, exporting again, and so on, which may have introduced issues with path management.

Ideally, I would love to start clean with my videos and a new project, but still keep the hundreds of annotations I have already made for training, and get metrics from them after inference.
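Something like this is roughly what I have in mind (just a sketch; I am not certain Labels.save_file is the right call, and the carried-over frames would still point at whichever videos the old package references):

import sleap
from sleap import Labels

# Load the old package and keep only the frames with user-made annotations.
old = sleap.load_file("labels.v002.merged.pkg.slp")
user_frames = [lf for lf in old.labeled_frames if lf.has_user_instances]

# Build a fresh Labels object from those frames and save it as a new project.
new = Labels(labeled_frames=user_frames)
Labels.save_file(new, "clean_project.slp")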

roomrys commented 1 year ago

Hi @olivier-cuttlefish,

Are you able to use the trained model to predict on just a normal .slp file instead of the .pkg.slp? This will only work locally (since you probably don't have the videos uploaded to the drive), but it will at least allow you to use the model to make new labels fairly quickly.

Right now you have a .pkg.slp listed as the data_path argument, but I'd like to try it with a normal .slp as the data_path.
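For example, something along these lines (the plain labels.v002.slp filename here is just a guess at what your non-package project file is called):

sleap-track -m "models/231013_164017.multi_instance" --only-labeled-frames -o "labels002_predictions.slp" "labels.v002.slp"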

roomrys commented 1 year ago

It definitely seems like something has happened to the video paths upon merging. Packages export only the images needed for training (and for inference on suggestions), so they no longer reference the original videos and instead reference a table in the h5 file (the .pkg.slp itself).

The .pkg.slp files are intended only for exporting for remote training; they shouldn't really have new labels added to them. This discussion deals with a similar situation and might be helpful as well.
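If you want to check what a given file actually contains, a quick look with h5py can help (a rough sketch; the exact dataset layout, e.g. a video0 group holding embedded frames in packages, is my assumption here):

import h5py

# Print every group/dataset name in the file. A .pkg.slp should show image
# data (e.g. something like video0/video) alongside the usual annotation
# tables, while a plain .slp will only list annotations and video metadata.
with h5py.File("labels.v002.merged.pkg.slp", "r") as f:
    f.visit(print)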

olivier-cuttlefish commented 1 year ago

Hi @roomrys, sorry for the late reply, I was very busy over the past few days. I have indeed been able to predict on videos using the trained model. I have done more rounds of labeling and training, and my project file now seems to be more stable in terms of path resolution. Not sure how I did that, though... Also, thank you for the discussion thread you shared; the script there seems like it will be useful if I ever run into path issues with my projects again. Thank you very much for your help! :)
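For future reference, my understanding of the path-fixing approach is roughly this (just a sketch; I am assuming the backend filename attribute can simply be reassigned before re-saving, and the prefixes are only examples):

import sleap
from sleap import Labels

# Load a project whose video paths have gone stale, repoint them, and re-save.
labels = sleap.load_file("labels.v002.slp")  # hypothetical project file
for video in labels.videos:
    old_path = video.filename
    if old_path.startswith("/home/o/o-xxx/"):
        # Swap the cluster prefix for the local one.
        video.backend.filename = old_path.replace("/home/o/o-xxx/", "/home/xxx/Documents/")
Labels.save_file(labels, "labels.v002.fixed.slp")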