ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0

Issue in Notebook 8: IndexingError with older Koster workflows #399

Closed Bergylta closed 1 month ago

Bergylta commented 1 month ago

🐛 Bug

Retrieving the information from the older workflows fails with an IndexingError after connecting to Zooniverse. I have tried both types (Clip and Frame).

To Reproduce (REQUIRED)

Input:
Project: Koster Seafloor Observatory
Workflow: KSO_SpeciesID(Advanced) / KSO_Tagging_new_species(Hardbottom)
Version: 1
Type: Clip/Frame

pp.choose_zoo_workflows()

Output:

INFO:root:3,547 Zooniverse classifications have been retrieved from 3,543 subjects

---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
Cell In[22], line 1
----> 1 pp.process_zoo_classifications()

File ~/kso/kso_utils/project.py:787, in ProjectProcessor.process_zoo_classifications(self, test)
    783     workflow_checks = self.workflow_widget.checks
    785 # Retrieve a subset of the subjects from the workflows of interest and
    786 # populate the sql subjects table and flatten the classifications provided the cit. scientists
--> 787 self.processed_zoo_classifications = zoo_utils.process_zoo_classifications(
    788     project=self.project,
    789     server_connection=self.server_connection,
    790     db_connection=self.db_connection,
    791     workflow_widget_checks=workflow_checks,
    792     workflows_df=self.zoo_info["workflows"],
    793     subjects_df=self.zoo_info["subjects"],
    794     csv_paths=self.csv_paths,
    795     classifications_data=self.zoo_info["classifications"],
    796     subject_type=workflow_checks["Subject type: #0"],
    797 )

File ~/kso/kso_utils/zooniverse_utils.py:431, in process_zoo_classifications(project, server_connection, db_connection, workflow_widget_checks, workflows_df, subjects_df, csv_paths, classifications_data, subject_type)
    427 drop_table(conn=db_connection, table_name="subjects")
    429 if len(subjects_series) > 0:
    430     # Fill or re-fill subjects table
--> 431     populate_subjects(project, server_connection, db_connection, subjects_series)
    432 else:
    433     logging.error("No subjects to populate database from the workflows selected.")

File ~/kso/kso_utils/zooniverse_utils.py:1166, in populate_subjects(project, server_connection, db_connection, subjects)
   1162 subjects = subjects.rename(columns=rename_cols)
   1164 if "subject_type" in subjects.columns:
   1165     # Avoid having two subject_type columns (one from Zoo one from the db)
-> 1166     subjects["subject_type0"] = subjects["subject_type"].iloc[:, 0]
   1167     subjects["subject_type1"] = subjects["subject_type"].iloc[:, 1]
   1169     # Update with non-empty values

File ~/.local/lib/python3.10/site-packages/pandas/core/indexing.py:961, in _LocationIndexer.__getitem__(self, key)
    959     if self._is_scalar_access(key):
    960         return self.obj._get_value(*key, takeable=self._takeable)
--> 961     return self._getitem_tuple(key)
    962 else:
    963     # we by definition only have the 0th axis
    964     axis = self.axis or 0

File ~/.local/lib/python3.10/site-packages/pandas/core/indexing.py:1458, in _iLocIndexer._getitem_tuple(self, tup)
   1456 def _getitem_tuple(self, tup: tuple):
-> 1458     tup = self._validate_tuple_indexer(tup)
   1459     with suppress(IndexingError):
   1460         return self._getitem_lowerdim(tup)

File ~/.local/lib/python3.10/site-packages/pandas/core/indexing.py:765, in _LocationIndexer._validate_tuple_indexer(self, key)
    761 def _validate_tuple_indexer(self, key: tuple) -> tuple:
    762     """
    763     Check the key for valid keys across my indexer.
    764     """
--> 765     key = self._validate_key_length(key)
    766     key = self._expand_ellipsis(key)
    767     for i, k in enumerate(key):

File ~/.local/lib/python3.10/site-packages/pandas/core/indexing.py:812, in _LocationIndexer._validate_key_length(self, key)
    810             raise IndexingError(_one_ellipsis_message)
    811         return self._validate_key_length(key)
--> 812     raise IndexingError("Too many indexers")
    813 return key

IndexingError: Too many indexers
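
A note on what the traceback seems to show: `subjects["subject_type"].iloc[:, 0]` only works when `subjects` has two columns named `subject_type`, so that the selection returns a DataFrame. If only one such column exists, the selection returns a Series, and a two-axis `.iloc[:, 0]` on a Series raises "Too many indexers". Whether the older workflows really end up with a single `subject_type` column is an assumption, not something confirmed here. The sketch below reproduces the error on toy data and shows one possible guard. `split_subject_type` is a hypothetical helper, not part of kso_utils.

```python
import pandas as pd


def split_subject_type(subjects: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical guard (not kso_utils code): handle one or two subject_type columns."""
    subjects = subjects.copy()
    col = subjects["subject_type"]
    if isinstance(col, pd.DataFrame):
        # Duplicate column labels: the selection is a DataFrame, so 2-D iloc is valid
        subjects["subject_type0"] = col.iloc[:, 0]
        subjects["subject_type1"] = col.iloc[:, 1]
    else:
        # Single column: the selection is a Series, so there is no second axis to index
        subjects["subject_type0"] = col
    return subjects


# Toy data with a single subject_type column (assumed shape for the older workflows)
single = pd.DataFrame({"id": [1, 2], "subject_type": ["clip", "frame"]})

# Toy data with two subject_type columns (the shape the existing code expects)
dup = pd.DataFrame(
    [[1, "clip", "frame"]],
    columns=["id", "subject_type", "subject_type"],
)

# Reproduce the failure: 2-D positional indexing on a Series
error_name = None
try:
    single["subject_type"].iloc[:, 0]
except Exception as err:  # IndexingError; its import path differs between pandas versions
    error_name = type(err).__name__
    print(error_name, err)

fixed_single = split_subject_type(single)
fixed_dup = split_subject_type(dup)
print(fixed_single.columns.tolist())
print(fixed_dup.columns.tolist())
```

Checking the type of the selection keeps the existing two-column behaviour and only adds a fallback for the one-column case. The actual fix in `populate_subjects` may need something different, depending on how the renamed columns are used afterwards.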
