ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0

Issue with uploading frames in tut 4, same error type as issue 284 #346

Closed Bergylta closed 5 months ago

Bergylta commented 5 months ago

šŸ› Bug

Uploading frames in tutorial 4 with pp.upload_zoo_subjects("frame") fails with an IntCastingNaNError.

To Reproduce (REQUIRED)

Project: GU

Frames location: /mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/GU_gobies_round2_testing/baitboxframes

Input:

pp.upload_zoo_subjects("frame")

Output:

---------------------------------------------------------------------------
IntCastingNaNError                        Traceback (most recent call last)
Cell In[18], line 1
----> 1 pp.upload_zoo_subjects("frame")

File /usr/src/app/kso-dev/kso_utils/project.py:794, in ProjectProcessor.upload_zoo_subjects(self, subject_type)
    791     logging.info(f"Clips temporarily stored locally has been removed")
    793 elif subject_type == "frame":
--> 794     upload_df = zoo_utils.set_zoo_frame_metadata(
    795         project=self.project,
    796         db_connection=self.db_connection,
    797         df=self.generated_frames,
    798         species_list=self.species_of_interest,
    799         csv_paths=self.csv_paths,
    800     )
    801     zoo_utils.upload_frames_to_zooniverse(
    802         project=self.project,
    803         upload_to_zoo=upload_df,
    804         species_list=self.species_of_interest,
    805     )
    807 else:

File /usr/src/app/kso-dev/kso_utils/zooniverse_utils.py:1838, in set_zoo_frame_metadata(project, db_connection, df, species_list, csv_paths)
   1836 # Set project-specific metadata
   1837 if project.Zooniverse_number == 9747:
-> 1838     df = add_db_info_to_df(
   1839         project, db_connection, csv_paths, df, "sites", "id, siteName"
   1840     )
   1841     upload_to_zoo = df[
   1842         [
   1843             "frame_path",
   (...)
   1849         ]
   1850     ]
   1852 elif project_name == "SGU":

File /usr/src/app/kso-dev/kso_utils/db_utils.py:516, in add_db_info_to_df(project, conn, csv_paths, df, table_name, cols_interest)
    514 # Ensure id columns that are going to be used to merge are int
    515 if "id" in left_on_col:
--> 516     df[left_on_col] = df[left_on_col].astype(float).astype(int)
    518 # Combine the original and sqldf dfs
    519 comb_df = pd.merge(
    520     df, sql_df, how="left", left_on=left_on_col, right_on=right_on_col
    521 )

File /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:5920, in NDFrame.astype(self, dtype, copy, errors)
   5913     results = [
   5914         self.iloc[:, i].astype(dtype, copy=copy)
   5915         for i in range(len(self.columns))
   5916     ]
   5918 else:
   5919     # else, only a single dtype is given
-> 5920     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5921     return self._constructor(new_data).__finalize__(self, method="astype")
   5923 # GH 33113: handle empty frame or series

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:419, in BaseBlockManager.astype(self, dtype, copy, errors)
    418 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 419     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:304, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
    302         applied = b.apply(f, **kwargs)
    303     else:
--> 304         applied = getattr(b, f)(**kwargs)
    305 except (TypeError, NotImplementedError):
    306     if not ignore_failures:

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py:582, in Block.astype(self, dtype, copy, errors)
    564 """
    565 Coerce to the new dtype.
    566 
   (...)
    578 Block
    579 """
    580 values = self.values
--> 582 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    584 new_values = maybe_coerce_values(new_values)
    585 newb = self.make_block(new_values)

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1292, in astype_array_safe(values, dtype, copy, errors)
   1289     dtype = dtype.numpy_dtype
   1291 try:
-> 1292     new_values = astype_array(values, dtype, copy=copy)
   1293 except (ValueError, TypeError):
   1294     # e.g. astype_nansafe can fail on object-dtype of strings
   1295     #  trying to convert to float
   1296     if errors == "ignore":

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1237, in astype_array(values, dtype, copy)
   1234     values = values.astype(dtype, copy=copy)
   1236 else:
-> 1237     values = astype_nansafe(values, dtype, copy=copy)
   1239 # in pandas we don't store numpy str dtypes, so convert to object
   1240 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1148, in astype_nansafe(arr, dtype, copy, skipna)
   1145     raise TypeError(f"cannot astype a timedelta from [{arr.dtype}] to [{dtype}]")
   1147 elif np.issubdtype(arr.dtype, np.floating) and np.issubdtype(dtype, np.integer):
-> 1148     return astype_float_to_int_nansafe(arr, dtype, copy)
   1150 elif is_object_dtype(arr.dtype):
   1151 
   1152     # work around NumPy brokenness, #1987
   1153     if np.issubdtype(dtype.type, np.integer):

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1193, in astype_float_to_int_nansafe(values, dtype, copy)
   1189 """
   1190 astype with a check preventing converting NaN to an meaningless integer value.
   1191 """
   1192 if not np.isfinite(values).all():
-> 1193     raise IntCastingNaNError(
   1194         "Cannot convert non-finite values (NA or inf) to integer"
   1195     )
   1196 return values.astype(dtype, copy=copy)

IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer

Additional context

A similar issue with the same error type was encountered in tutorial 4 in August last year (issue #284).

jannesgg commented 5 months ago

@Bergylta The issue probably relates to movies that have not been added to movies.csv for the given project, which leaves certain columns in pp.generated_frames as NaN.
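
To check whether this is the cause, one option is to list the rows whose id column is missing before the upload is attempted. The snippet below is only a sketch on a toy DataFrame: the helper `rows_with_missing_ids` and the column names `frame_path` and `site_id` are assumptions for illustration, not confirmed names from pp.generated_frames or kso_utils.

```python
import pandas as pd


def rows_with_missing_ids(df: pd.DataFrame, id_col: str) -> pd.DataFrame:
    """Return the rows whose id column is missing or non-numeric."""
    # Values that cannot be parsed as numbers become NaN instead of raising.
    numeric_ids = pd.to_numeric(df[id_col], errors="coerce")
    return df[numeric_ids.isna()]


# Toy stand-in for pp.generated_frames; the column names are hypothetical.
frames = pd.DataFrame(
    {
        "frame_path": ["a.jpg", "b.jpg", "c.jpg"],
        "site_id": [1.0, None, 3.0],
    }
)

missing = rows_with_missing_ids(frames, "site_id")
print(missing)
```

Any rows returned this way would point to frames whose source movie has no matching entry, which is the situation suspected here.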

In future, a better error message should be added to avoid confusion. We believe it is similar to #284.
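
As a rough idea of what a clearer message could look like, the float-to-int cast could be preceded by an explicit check. This is a sketch only, not the actual kso_utils code: the function name `cast_id_column_to_int`, the column name `site_id`, and the wording of the message are all hypothetical.

```python
import numpy as np
import pandas as pd


def cast_id_column_to_int(df: pd.DataFrame, id_col: str) -> pd.DataFrame:
    """Cast an id column to int, failing early with a readable message."""
    ids = pd.to_numeric(df[id_col], errors="coerce")
    # NaN and +/-inf cannot be converted to int, so flag them up front.
    bad = ~np.isfinite(ids)
    if bad.any():
        raise ValueError(
            f"{int(bad.sum())} row(s) have no valid '{id_col}'. "
            "Check that every movie is listed in movies.csv for this project."
        )
    out = df.copy()
    out[id_col] = ids.astype(int)
    return out


# Hypothetical example data: the second frame has no id.
frames = pd.DataFrame({"site_id": [1.0, None]})
try:
    cast_id_column_to_int(frames, "site_id")
except ValueError as err:
    message = str(err)
    print(message)
```

Raising before the cast means the user sees which column is affected and where to look, rather than the pandas-internal IntCastingNaNError.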