Closed donkyjohn closed 1 month ago
Manually removing the duplicates from the CSV file doesn't help. It changes the error to ERROR:root:NOT NULL constraint failed: photos.filename when I try to initialise the processor. The files are stored in Zooniverse with a .JPG extension; could that be causing the issue?
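One way to narrow down the NOT NULL error is to look for rows in the photos CSV whose filename is empty, and to compare extensions case-insensitively so that .JPG and .jpg are treated alike. This is only a minimal sketch: the column name `filename` is taken from the error message, while the helper names and the sample data are hypothetical and not part of kso_utils.

```python
import pandas as pd


def missing_filename_rows(photos_df, column="filename"):
    """Return the rows whose filename is null or only whitespace."""
    values = photos_df[column]
    blank = values.isna() | (values.astype(str).str.strip() == "")
    return photos_df[blank]


def has_jpg_extension(name):
    """Case-insensitive check, so 'a.JPG' and 'a.jpg' both pass."""
    return str(name).lower().endswith((".jpg", ".jpeg"))


# Hypothetical sample data standing in for the real photos CSV
photos = pd.DataFrame({"filename": ["a.JPG", None, "   ", "b.jpg"]})

missing = missing_filename_rows(photos)
print(missing)
```

If `missing` is non-empty, those rows would be expected to violate a NOT NULL constraint on photos.filename when inserted.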
@donkyjohn Please check whether the latest changes to dev address these issues.
🐛 Bug
If Koster_Seafloor_Obs is selected as the project, initialising the project processor gives the following error: ERROR:root:UNIQUE constraint failed: photos.filename.
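A UNIQUE constraint failure on photos.filename suggests that the same filename is being inserted more than once. A minimal sketch for listing repeated filenames in a photos CSV with pandas follows; the column name `filename` comes from the error message, and the helper name and CSV contents are hypothetical.

```python
import io

import pandas as pd


def find_duplicate_filenames(photos_df, column="filename"):
    """Return every row whose filename occurs more than once."""
    dup_mask = photos_df[column].duplicated(keep=False)
    return photos_df[dup_mask].sort_values(column)


# Hypothetical CSV contents standing in for the project's photos file
csv_text = """filename,site
img_001.JPG,A
img_002.JPG,A
img_001.JPG,B
"""

photos = pd.read_csv(io.StringIO(csv_text))
duplicates = find_duplicate_filenames(photos)
print(duplicates)
```

Using `keep=False` marks all copies of a repeated filename, not only the later ones, which makes it easier to decide which row to keep.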
Next: the Zooniverse classifications are retrieved, but I also receive the following error after the log line INFO:root:127 Zooniverse classifications have been retrieved from 125 subjects:
KeyError                                  Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
   3620 try:
-> 3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:

File ~/.local/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~/.local/lib/python3.10/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'subject_type'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[8], line 2
      1 # Get the classifications that were added manually
----> 2 pp.process_zoo_classifications()

File ~/kso/kso_utils/project.py:787, in ProjectProcessor.process_zoo_classifications(self, test)
    783 workflow_checks = self.workflow_widget.checks
    785 # Retrieve a subset of the subjects from the workflows of interest and
    786 # populate the sql subjects table and flatten the classifications provided the cit. scientists
--> 787 self.processed_zoo_classifications = zoo_utils.process_zoo_classifications(
    788     project=self.project,
    789     server_connection=self.server_connection,
    790     db_connection=self.db_connection,
    791     workflow_widget_checks=workflow_checks,
    792     workflows_df=self.zoo_info["workflows"],
    793     subjects_df=self.zoo_info["subjects"],
    794     csv_paths=self.csv_paths,
    795     classifications_data=self.zoo_info["classifications"],
    796     subject_type=workflow_checks["Subject type: #0"],
    797 )

File ~/kso/kso_utils/zooniverse_utils.py:431, in process_zoo_classifications(project, server_connection, db_connection, workflow_widget_checks, workflows_df, subjects_df, csv_paths, classifications_data, subject_type)
    427 drop_table(conn=db_connection, table_name="subjects")
    429 if len(subjects_series) > 0:
    430     # Fill or re-fill subjects table
--> 431     populate_subjects(project, server_connection, db_connection, subjects_series)
    432 else:
    433     logging.error("No subjects to populate database from the workflows selected.")

File ~/kso/kso_utils/zooniverse_utils.py:1143, in populate_subjects(project, server_connection, db_connection, subjects)
   1140 # Rename columns to match the db format
   1141 subjects = subjects.rename(columns=rename_cols)
-> 1143 if hasattr(subjects["subject_type"], "columns"):
   1144     # Avoid having two subject_type columns (one from Zoo one from the db)
   1145     subjects["subject_type0"] = subjects["subject_type"].iloc[:, 0]
   1146     subjects["subject_type1"] = subjects["subject_type"].iloc[:, 1]

File ~/.local/lib/python3.10/site-packages/pandas/core/frame.py:3506, in DataFrame.__getitem__(self, key)
   3504 if self.columns.nlevels > 1:
   3505     return self._getitem_multilevel(key)
-> 3506 indexer = self.columns.get_loc(key)
   3507 if is_integer(indexer):
   3508     indexer = [indexer]

File ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py:3623, in Index.get_loc(self, key, method, tolerance)
   3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:
-> 3623     raise KeyError(key) from err
   3624 except TypeError:
   3625     # If we have a listlike key, _check_indexing_error will raise
   3626     # InvalidIndexError. Otherwise we fall through and re-raise
   3627     # the TypeError.
   3628     self._check_indexing_error(key)

KeyError: 'subject_type'
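The traceback shows `subjects["subject_type"]` being indexed at line 1143 of zooniverse_utils.py when the DataFrame has no such column, so the lookup fails before the `hasattr` check can run. The sketch below reproduces that situation on toy data and shows one possible guard. It is an illustration only, not the fix applied in kso_utils; the function name `split_subject_type` and the sample frames are hypothetical.

```python
import pandas as pd


def split_subject_type(subjects):
    """Split duplicated 'subject_type' columns, tolerating a missing column."""
    if "subject_type" not in subjects.columns:
        # The column is absent (the case in the traceback): nothing to split.
        return subjects

    selected = subjects["subject_type"]
    if hasattr(selected, "columns"):
        # Duplicate column labels make pandas return a DataFrame here.
        subjects = subjects.copy()
        subjects["subject_type0"] = selected.iloc[:, 0]
        subjects["subject_type1"] = selected.iloc[:, 1]
    return subjects


# Hypothetical frame without the column: plain indexing raises KeyError
no_column = pd.DataFrame({"id": [1, 2]})
try:
    no_column["subject_type"]
    raised = False
except KeyError:
    raised = True

# Hypothetical frame with two 'subject_type' columns
two_columns = pd.DataFrame(
    [[1, "frame", "clip"]],
    columns=["id", "subject_type", "subject_type"],
)
result = split_subject_type(two_columns)
print(result)
```

Checking `"subject_type" in subjects.columns` first avoids the exception, but it does not explain why the column is missing from the retrieved subjects; that would still need to be traced back to the rename step or the Zooniverse export.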
To Reproduce (REQUIRED)
The notebook was run via cloudina. Input:
Output: