ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0

Publish_observations.ipynb pp.process_detections errors #420

Open KalindiFonda opened 2 months ago

KalindiFonda commented 2 months ago

🐛 Bug

Hello, I am running Publish_observations.ipynb via the Docker image and am running into issues with this cell:

dets_df = pp.process_detections(
    project=pp.project,
    db_connection=pp.db_connection,
    csv_paths=pp.csv_paths,
    annotations_csv_path=mlp.eval_dir,
    model_registry=mlp.registry,
    model=model.value,
    team_name=mlp.team_name,
    project_name=mlp.project_name,
)

When I run it and annotations.csv contains annotations, it logs "Registry invalid" (https://github.com/ocean-data-factory-sweden/kso/blob/4c1b6daddbfbdf7f8fbd70c80702ac75dbc10520/kso_utils/yolo_utils.py#L2071) and then fails at line 2144 because species_mapping is undefined.

If annotations.csv contains no annotations, the error is easy to miss among all the other output. The code then fails on the next line with an unclear message (I think something along the lines of "columns not found").
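A minimal sketch of how an empty or malformed annotations file could fail early with a readable message instead of a downstream "columns not found". It uses only the standard `csv` module. The helper `load_annotations` and its `required_columns` parameter are hypothetical, and the column names in the usage example (`class_id`, `frame_no`) are placeholders rather than the schema kso_utils actually expects.

```python
import csv
from typing import Iterable


def load_annotations(lines: Iterable[str], required_columns: set[str]) -> list[dict[str, str]]:
    """Parse annotation CSV lines and fail early with a clear message.

    Hypothetical helper: the real column names expected by kso_utils are
    not listed in this issue, so they are passed in by the caller.
    """
    reader = csv.DictReader(lines)
    header = set(reader.fieldnames or [])
    # Report missing columns up front instead of failing later on a lookup.
    missing = required_columns - header
    if missing:
        raise ValueError(f"Annotations CSV is missing columns: {sorted(missing)}")
    rows = list(reader)
    # A header-only file means the model produced no detections.
    if not rows:
        raise ValueError("Annotations CSV has no detections; nothing to publish.")
    return rows
```

Usage: `load_annotations(open(path), {"class_id", "frame_no"})` raises a `ValueError` that names the actual problem whenever the file is header-only or lacks a column.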

Also, is the folder selection meant to default to the current folder if the user does not select anything?

I haven't double-checked this, but if I use "Restart Kernel and Run All", I have to set the folders manually (the same applies to the movie selection, but that might be intended?).
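The default behaviour asked about above could look like the following minimal sketch. `resolve_folder` is a hypothetical helper, and it assumes an untouched folder widget yields `None` or an empty string. That assumption may not match how the kso_utils widgets actually report "no selection".

```python
from pathlib import Path
from typing import Optional


def resolve_folder(selected: Optional[str], default: Optional[Path] = None) -> Path:
    """Return the user's folder choice, or a default when nothing was picked.

    Hypothetical helper: assumes an untouched selector gives None or "".
    """
    if selected:  # a non-empty string means the user picked a folder
        return Path(selected)
    # Fall back to an explicit default, else the current working directory.
    return default if default is not None else Path.cwd()
```

With a fallback like this, "Restart Kernel and Run All" would proceed with the current directory instead of waiting for a manual selection.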

Am I correct in assuming that the Publish_observations notebook can be run independently of the other notebooks?

Thanks

To Reproduce with annotations:

Select movie_4.mp4 with a low confidence threshold (0.2).

Output:

ERROR:root:Registry invalid.
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In[22], line 1
----> 1 dets_df = pp.process_detections(
      2     project=pp.project,
      3     db_connection=pp.db_connection,
      4     csv_paths=pp.csv_paths,
      5     annotations_csv_path=mlp.eval_dir,
      6     model_registry=mlp.registry,
      7     model=model.value,
      8     team_name=mlp.team_name,
      9     project_name=mlp.project_name,
     10 )

File ~/Desktop/wildlifeai/kso/kso_utils/project.py:1182, in ProjectProcessor.process_detections(self, project, db_connection, csv_paths, annotations_csv_path, model_registry, model, project_name, team_name)
   1179 out_list = []
   1180 for movie_path in self.selected_movies_paths:
   1181     out_list.append(
-> 1182         yolo_utils.process_detections(
   1183             project=project,
   1184             db_connection=db_connection,
   1185             csv_paths=csv_paths,
   1186             annotations_csv_path=annotations_csv_path,
   1187             model_registry=model_registry,
   1188             selected_movies_id=self.selected_movies_ids,
   1189             model=model,
   1190             project_name=project_name,
   1191             team_name=team_name,
   1192             source_movies=movie_path,
   1193         )
   1194     )
   1195 df_concat = pd.concat(out_list, axis=1)
   1196 return df_concat

File ~/Desktop/wildlifeai/kso/kso_utils/yolo_utils.py:2220, in process_detections(project, db_connection, csv_paths, annotations_csv_path, model_registry, selected_movies_id, model, project_name, team_name, source_movies)
   2217         project_name = "spyfish_aotearoa"
   2219 # Obtain a dictionary with the mapping between the class ids and the species names
-> 2220 species_mapping = get_species_mapping(
   2221     model, project_name, team_name, model_registry
   2222 )
   2224 # Add a column with the species name corresponding to each class id
   2225 df["commonName"] = df["class_id"].astype(str).map(species_mapping)

File ~/Desktop/wildlifeai/kso/kso_utils/yolo_utils.py:2144, in get_species_mapping(model, project_name, team_name, registry)
   2141 else:
   2142     logging.error("Registry invalid.")
-> 2144 return species_mapping

UnboundLocalError: local variable 'species_mapping' referenced before assignment
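The traceback shows a common Python failure mode: `species_mapping` is only assigned inside the recognised registry branches, so the `else` branch logs "Registry invalid." and then returns an unassigned local. The sketch below reproduces that pattern in isolation and shows one way to surface a clear error instead. Both functions are hypothetical stand-ins for `get_species_mapping`. The registry value `"example_registry"` and the `{"0": "fish"}` mapping are placeholders.

```python
import logging


def get_mapping_buggy(registry: str) -> dict:
    # Mirrors the failure mode in the traceback: the variable is only
    # assigned inside the recognised branch.
    if registry == "example_registry":
        species_mapping = {"0": "fish"}
    else:
        logging.error("Registry invalid.")
    return species_mapping  # raises UnboundLocalError for any other registry


def get_mapping_checked(registry: str) -> dict:
    # Same logic, but an unknown registry raises a descriptive error
    # instead of falling through to an unassigned variable.
    if registry == "example_registry":
        return {"0": "fish"}
    raise ValueError(f"Registry invalid: {registry!r}")
```

Raising at the point of failure puts the real cause ("Registry invalid") in the exception itself, so it cannot be lost among the other notebook output.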
victor-wildlife commented 1 month ago

@jannesgg thoughts on this?

jannesgg commented 1 month ago

@victor-wildlife @KalindiFonda Indeed, I have now encountered the same errors. I believe this is because this part of the code relies on an active registry, and therefore on a set of metadata from which to produce the species_mapping. This means that it does not currently work for local models, since the .pt files themselves do not contain the class names. I will try to set up a workaround so that the species names are not required in this special case; the rest should then work as expected.
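One way such a workaround could behave is shown below. This is a sketch only, not the fix that landed in the dev version. When no registry metadata is available, each class id is used as its own label, so the later `df["class_id"].astype(str).map(species_mapping)` step still yields values instead of failing. The helper name and its parameters are hypothetical.

```python
from typing import Optional


def species_mapping_or_fallback(
    class_ids: list[int], registry_names: Optional[dict[str, str]] = None
) -> dict[str, str]:
    """Map YOLO class ids (as strings) to display names.

    Hypothetical helper: without registry metadata (e.g. a local .pt model),
    each class id becomes its own label instead of raising an error.
    """
    if registry_names:
        # Prefer registry names; keep the raw id for any class it lacks.
        return {str(i): registry_names.get(str(i), str(i)) for i in class_ids}
    return {str(i): str(i) for i in class_ids}
```

Keys are strings because the traceback shows the class ids being cast with `astype(str)` before the mapping is applied.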

jannesgg commented 1 month ago

@KalindiFonda This should now be fixed in the latest dev version. Feel free to test this and let me know if it is now working for you.

KalindiFonda commented 1 day ago

Hello @jannesgg, and thanks for checking.

Can you check this cell?

model = mlp.choose_model(publish=True)

The drop-down shows no options.

It seems that the model_dict returned by zenodo_utils.download_and_extract_models_from_zenodo inside choose_model comes back empty.
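A minimal sketch of a guard that turns an empty model dictionary into an explicit error instead of a silently empty drop-down. It assumes `model_dict` maps a model name to a local path. That shape is a guess, and `model_options` is a hypothetical helper. The returned `(label, value)` pairs are a form that `ipywidgets.Dropdown(options=...)` accepts.

```python
def model_options(model_dict: dict[str, str]) -> list[tuple[str, str]]:
    """Turn a {model_name: path} dict into (label, value) drop-down options.

    Hypothetical helper: the real structure of model_dict may differ.
    """
    if not model_dict:
        # Fail loudly so the user knows the download step returned nothing.
        raise RuntimeError("No models were returned from Zenodo; cannot build the model drop-down.")
    # Sort by model name so the drop-down order is stable.
    return sorted(model_dict.items())
```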

Also, there is a hard-coded API key here; I'm not sure that's intended: https://github.com/ocean-data-factory-sweden/kso/blob/70bfea13e606c45ae6073ad387ed413c87423665/kso_utils/project.py#L1856
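A common alternative to a hard-coded key is to read it from an environment variable at runtime, for example one passed into the Docker container with `docker run -e`. The sketch below shows that pattern. The variable name `KSO_API_TOKEN` and the helper `get_api_token` are hypothetical placeholders and are not defined by kso_utils.

```python
import os


def get_api_token(env_var: str = "KSO_API_TOKEN") -> str:
    """Read an API token from the environment instead of the source code.

    The default variable name is a hypothetical placeholder.
    """
    token = os.environ.get(env_var)
    if not token:
        # Tell the user how to supply the token rather than failing later.
        raise RuntimeError(f"Set the {env_var} environment variable to your API token.")
    return token
```

Keeping the key out of the repository also means it can be rotated without a code change.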

(I can also open a new issue for this.)