Notebook 8: IndexError when processing frames for machine learning

Before submitting a bug report, please be aware that your issue must be reproducible with all of the following, otherwise it is non-actionable, and we can not help you:

Current repo: run git fetch && git status -uno to check and git pull to update repo
Common dataset: coco.yaml or coco128.yaml
Common environment: Colab, Google Cloud, or Docker image. See https://github.com/ultralytics/yolov5#environments

If this is a custom dataset/training question you must include your train*.jpg, test*.jpg and results.png figures, or we can not help you. You can generate these with utils.plot_results().

🐛 Bug

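Running mlp.prepare_dataset in Notebook 8 fails with an IndexError raised in frame_aggregation (kso_utils/yolo_utils.py, line 459) while the species_id column is being added to train_rows.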

To Reproduce (REQUIRED)

Input: Test proportion: 0.2

# Run the preparation script
mlp.prepare_dataset(
    agg_df=pp.aggregated_zoo_classifications,
    out_path=output_folder.selected,
    img_size=(720, 540),
    perc_test=percentage_test.value,
)

Species chosen: Protanthea simplex (497 annotations)

Output:

IndexError                                Traceback (most recent call last)
File /usr/src/app/kso-dev/kso_utils/project.py:1137, in MLProjectProcessor.prepare_dataset.<locals>.on_button_clicked(b)
   1135 self.species_of_interest = species_list.value
   1136 # code for prepare dataset for machine learning
-> 1137 self.modules["yolo_utils"].frame_aggregation(
   1138     project=self.project,
   1139     server_connection=self.server_connection,
   1140     db_connection=self.db_connection,
   1141     out_path=out_path,
   1142     perc_test=perc_test,
   1143     class_list=self.species_of_interest,
   1144     img_size=img_size,
   1145     remove_nulls=remove_nulls,
   1146     track_frames=track_frames,
   1147     n_tracked_frames=n_tracked_frames,
   1148     agg_df=agg_df,
   1149 )

File /usr/src/app/kso-dev/kso_utils/yolo_utils.py:458, in frame_aggregation(project, server_connection, db_connection, out_path, perc_test, class_list, img_size, out_format, remove_nulls, track_frames, n_tracked_frames, agg_df)
    456 # Add species_id to train_rows
    457 if "species_id" not in train_rows.columns:
--> 458     train_rows["species_id"] = train_rows["label"].apply(
    459         lambda x: species_df[species_df.commonName == x].id.values[0]
    460         if x != "empty"
    461         else "empty",
    462         1,
    463     )
    464     train_rows.drop(columns=["label"], axis=1, inplace=True)
    466 sp_id2mod_id = {
    467     species_df[species_df.clean_label == species_list[i]].id.values[0]: i
    468     for i in range(len(species_list))
    469 }

File /usr/local/lib/python3.8/dist-packages/pandas/core/series.py:4430, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4320 def apply(
   4321     self,
   4322     func: AggFuncType,
   (...)
   4325     **kwargs,
   4326 ) -> DataFrame | Series:
   4327     """
   4328     Invoke function on values of Series.
   4329 
   (...)
   4428     dtype: float64
   4429     """
-> 4430     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File /usr/local/lib/python3.8/dist-packages/pandas/core/apply.py:1082, in SeriesApply.apply(self)
   1078 if isinstance(self.f, str):
   1079     # if we are a string, try to dispatch
   1080     return self.apply_str()
-> 1082 return self.apply_standard()

File /usr/local/lib/python3.8/dist-packages/pandas/core/apply.py:1137, in SeriesApply.apply_standard(self)
   1131         values = obj.astype(object)._values
   1132         # error: Argument 2 to "map_infer" has incompatible type
   1133         # "Union[Callable[..., Any], str, List[Union[Callable[..., Any], str]],
   1134         # Dict[Hashable, Union[Union[Callable[..., Any], str],
   1135         # List[Union[Callable[..., Any], str]]]]]"; expected
   1136         # "Callable[[Any], Any]"
-> 1137         mapped = lib.map_infer(
   1138             values,
   1139             f,  # type: ignore[arg-type]
   1140             convert=self.convert_dtype,
   1141         )
   1143 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1144     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1145     #  See also GH#25959 regarding EA support
   1146     return obj._constructor_expanddim(list(mapped), index=obj.index)

File /usr/local/lib/python3.8/dist-packages/pandas/_libs/lib.pyx:2870, in pandas._libs.lib.map_infer()

File /usr/src/app/kso-dev/kso_utils/yolo_utils.py:459, in frame_aggregation.<locals>.<lambda>(x)
    456 # Add species_id to train_rows
    457 if "species_id" not in train_rows.columns:
    458     train_rows["species_id"] = train_rows["label"].apply(
--> 459         lambda x: species_df[species_df.commonName == x].id.values[0]
    460         if x != "empty"
    461         else "empty",
    462         1,
    463     )
    464     train_rows.drop(columns=["label"], axis=1, inplace=True)
    466 sp_id2mod_id = {
    467     species_df[species_df.clean_label == species_list[i]].id.values[0]: i
    468     for i in range(len(species_list))
    469 }

IndexError: index 0 is out of bounds for axis 0 with size 0

Expected behavior

Additional context

Could the lookup on line 459 be the cause? It takes .values[0] from species_df filtered by commonName, which fails with exactly this IndexError when no row matches the label:

File /usr/src/app/kso-dev/kso_utils/yolo_utils.py:459, in frame_aggregation.<locals>.<lambda>(x)
    456 # Add species_id to train_rows
    457 if "species_id" not in train_rows.columns:
    458     train_rows["species_id"] = train_rows["label"].apply(
--> 459         lambda x: species_df[species_df.commonName == x].id.values[0]
    460         if x != "empty"
    461         else "empty",
    462         1,
    463     )
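
One way to check this would be to list the labels that have no match before doing the lookup. The following is only a sketch under assumptions, not the project's code: find_unmatched_labels and add_species_id are hypothetical helpers, and the only things taken from the traceback are the column names label, commonName, id and species_id.

```python
import pandas as pd


def find_unmatched_labels(train_rows, species_df):
    """Return labels that are neither "empty" nor present in species_df.commonName."""
    known = set(species_df["commonName"]) | {"empty"}
    mask = ~train_rows["label"].isin(known)
    return train_rows.loc[mask, "label"].unique().tolist()


def add_species_id(train_rows, species_df):
    """Map label to species_id, failing with a readable message on unknown labels."""
    unmatched = find_unmatched_labels(train_rows, species_df)
    if unmatched:
        raise ValueError(f"Labels missing from species_df.commonName: {unmatched}")
    # Build the name -> id mapping once instead of filtering the frame per row.
    name_to_id = (
        species_df.drop_duplicates("commonName").set_index("commonName")["id"].to_dict()
    )
    out = train_rows.copy()
    out["species_id"] = out["label"].map(
        lambda x: "empty" if x == "empty" else name_to_id[x]
    )
    return out.drop(columns=["label"])


# Toy data for illustration only; ids and names are placeholders.
toy_species = pd.DataFrame({"id": [10, 20], "commonName": ["Cod", "Crab"]})
toy_rows = pd.DataFrame({"label": ["Cod", "empty", "Crab"], "frame": [1, 2, 3]})
toy_bad_rows = pd.DataFrame({"label": ["Cod", "Protanthea simplex"]})

print(add_species_id(toy_rows, toy_species))
print(find_unmatched_labels(toy_bad_rows, toy_species))
```

Running find_unmatched_labels on the real train_rows and species_df would show whether the selected species appears under a different name in commonName than in the label column.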

ocean-data-factory-sweden / kso