Issues regarding initialising the project processor and receiving the annotations from Zooniverse

donkyjohn commented 2 months ago

🐛 Bug

If Koster_Seafloor_Obs is selected as the project, initialising the project processor fails with the following error: ERROR:root:UNIQUE constraint failed: photos.filename

latest_zoo_info = No
connect zoo project
Choose_zoo_workflows() => 1, KSO_Saga_Frames, frame, min workflow 1.0
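For context, here is a minimal sketch of how this SQLite error arises. It assumes, hypothetically, that the photos table declares filename as UNIQUE and that the same filename appears twice in the input listing. The table layout and the helpers find_duplicate_filenames and insert_photos are illustrative only, not KSO's actual schema or code.

```python
import sqlite3
from collections import Counter

# Hypothetical minimal "photos" table; the real KSO schema may differ.
SCHEMA = "CREATE TABLE photos (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE)"


def find_duplicate_filenames(filenames):
    """Return every filename that occurs more than once, in first-seen order."""
    counts = Counter(filenames)
    return [name for name, n in counts.items() if n > 1]


def insert_photos(conn, filenames):
    """Insert all filenames in one transaction.

    Returns None on success, or the SQLite error message if a constraint
    fails, in which case the whole batch is rolled back.
    """
    try:
        conn.executemany(
            "INSERT INTO photos (filename) VALUES (?)", [(name,) for name in filenames]
        )
        conn.commit()
        return None
    except sqlite3.IntegrityError as err:
        conn.rollback()
        return str(err)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Hypothetical listing in which one filename appears twice.
    filenames = ["frame_001.JPG", "frame_002.JPG", "frame_001.JPG"]
    print(find_duplicate_filenames(filenames))  # ['frame_001.JPG']
    print(insert_photos(conn, filenames))  # UNIQUE constraint failed: photos.filename
    unique = list(dict.fromkeys(filenames))  # de-duplicate, keeping order
    print(insert_photos(conn, unique))  # None
```

Listing the duplicates before inserting makes it easier to see which filenames collide, rather than only getting the first constraint failure from the database.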

The Zooniverse classifications are retrieved (INFO:root:127 Zooniverse classifications have been retrieved from 125 subjects), but processing them then fails with the following error:

KeyError                                  Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
   3620 try:
-> 3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:

File ~/.local/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~/.local/lib/python3.10/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'subject_type'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[8], line 2
      1 # Get the classifications that were added manually
----> 2 pp.process_zoo_classifications()

File ~/kso/kso_utils/project.py:787, in ProjectProcessor.process_zoo_classifications(self, test)
    783 workflow_checks = self.workflow_widget.checks
    785 # Retrieve a subset of the subjects from the workflows of interest and
    786 # populate the sql subjects table and flatten the classifications provided the cit. scientists
--> 787 self.processed_zoo_classifications = zoo_utils.process_zoo_classifications(
    788     project=self.project,
    789     server_connection=self.server_connection,
    790     db_connection=self.db_connection,
    791     workflow_widget_checks=workflow_checks,
    792     workflows_df=self.zoo_info["workflows"],
    793     subjects_df=self.zoo_info["subjects"],
    794     csv_paths=self.csv_paths,
    795     classifications_data=self.zoo_info["classifications"],
    796     subject_type=workflow_checks["Subject type: #0"],
    797 )

File ~/kso/kso_utils/zooniverse_utils.py:431, in process_zoo_classifications(project, server_connection, db_connection, workflow_widget_checks, workflows_df, subjects_df, csv_paths, classifications_data, subject_type)
    427 drop_table(conn=db_connection, table_name="subjects")
    429 if len(subjects_series) > 0:
    430     # Fill or re-fill subjects table
--> 431     populate_subjects(project, server_connection, db_connection, subjects_series)
    432 else:
    433     logging.error("No subjects to populate database from the workflows selected.")

File ~/kso/kso_utils/zooniverse_utils.py:1143, in populate_subjects(project, server_connection, db_connection, subjects)
   1140 # Rename columns to match the db format
   1141 subjects = subjects.rename(columns=rename_cols)
-> 1143 if hasattr(subjects["subject_type"], "columns"):
   1144     # Avoid having two subject_type columns (one from Zoo one from the db)
   1145     subjects["subject_type0"] = subjects["subject_type"].iloc[:, 0]
   1146     subjects["subject_type1"] = subjects["subject_type"].iloc[:, 1]

File ~/.local/lib/python3.10/site-packages/pandas/core/frame.py:3506, in DataFrame.__getitem__(self, key)
   3504 if self.columns.nlevels > 1:
   3505     return self._getitem_multilevel(key)
-> 3506 indexer = self.columns.get_loc(key)
   3507 if is_integer(indexer):
   3508     indexer = [indexer]

File ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py:3623, in Index.get_loc(self, key, method, tolerance)
   3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:
-> 3623     raise KeyError(key) from err
   3624 except TypeError:
   3625     # If we have a listlike key, _check_indexing_error will raise
   3626     # InvalidIndexError. Otherwise we fall through and re-raise
   3627     # the TypeError.
   3628     self._check_indexing_error(key)

KeyError: 'subject_type'
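The traceback shows that the check in populate_subjects (zooniverse_utils.py:1143) indexes subjects["subject_type"] directly. That line handles one column with that label (a Series) or two duplicate columns (a DataFrame), but raises KeyError when the renamed subjects frame has no such column at all. Below is a sketch of a hypothetical guarded variant that counts the matching columns first and reports the available columns in the error. The function names are illustrative and not part of kso_utils.

```python
import pandas as pd


def count_subject_type_columns(subjects: pd.DataFrame) -> int:
    """Number of columns labelled 'subject_type' (0, 1 or 2 in this scenario)."""
    return int((subjects.columns == "subject_type").sum())


def split_subject_type(subjects: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical guarded variant of the check at zooniverse_utils.py:1143."""
    n = count_subject_type_columns(subjects)
    if n == 0:
        # The case behind "KeyError: 'subject_type'" above, with a clearer message.
        raise KeyError(
            f"no 'subject_type' column after renaming; columns are {list(subjects.columns)}"
        )
    if n > 1:
        # Duplicate labels: subjects["subject_type"] returns a DataFrame.
        both = subjects["subject_type"]
        subjects = subjects.drop(columns="subject_type")  # drops every copy
        subjects["subject_type0"] = both.iloc[:, 0]
        subjects["subject_type1"] = both.iloc[:, 1]
    return subjects
```

Raising KeyError with the list of available columns keeps the exception type the same for existing callers while showing which columns the Zooniverse export actually provided.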

To Reproduce

The notebook was run via cloudina.

donkyjohn commented 2 months ago

Manually removing the duplicates from the CSV file doesn't help; it only changes the error to ERROR:root:NOT NULL constraint failed: photos.filename when I try to initialise the processor. The files are stored in Zooniverse with a .JPG extension; could that be causing the issue?
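Whether the .JPG extension is actually the cause is not established in this thread. One hypothetical way it could be: if filenames are matched by exact string between two sources, "frame_001.JPG" and "frame_001.jpg" never match, the lookup returns None, and inserting None into a NOT NULL column produces this SQLite error. The sketch below assumes such a matching step exists; match_filenames, normalise_suffix and try_insert are illustrative names, not KSO functions, and the table layout is assumed.

```python
import sqlite3
from pathlib import PurePosixPath

# Hypothetical minimal "photos" table; the real KSO schema may differ.
SCHEMA = "CREATE TABLE photos (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE)"


def normalise_suffix(filename: str) -> str:
    """Lower-case only the extension, e.g. 'frame_001.JPG' -> 'frame_001.jpg'."""
    path = PurePosixPath(filename)
    return str(path.with_suffix(path.suffix.lower()))


def match_filenames(zoo_names, local_names, normalise=True):
    """Map each Zooniverse filename to a local filename, or None if unmatched."""
    key = normalise_suffix if normalise else (lambda name: name)
    local_by_key = {key(name): name for name in local_names}
    return [local_by_key.get(key(name)) for name in zoo_names]


def try_insert(conn, filenames):
    """Insert filenames atomically; return None or the SQLite error message."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO photos (filename) VALUES (?)", [(n,) for n in filenames]
            )
        return None
    except sqlite3.IntegrityError as err:
        return str(err)
```

With exact matching, an unmatched name becomes None and the insert fails with "NOT NULL constraint failed: photos.filename"; normalising the extension case on both sides lets the names match.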

jannesgg commented 2 months ago

@donkyjohn Please check whether the latest changes to dev address these issues.

ocean-data-factory-sweden / kso