ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0

Issue extracting frames from .mpg file in tutorial 4 #330

Closed ShrimpFather7 closed 7 months ago

ShrimpFather7 commented 7 months ago


To Reproduce (REQUIRED)

Input:

# Generate suitable frames for upload by modifying initial frames
pp.generate_custom_frames(
    input_path=input_folder.selected,
    output_path=output_folder.selected,
    skip_start=1,
    skip_end=40,
    num_frames=50,
    frames_skip=None,
)

Output:

AttributeError                            Traceback (most recent call last)
File /usr/src/app/kso-dev/kso_utils/project.py:973, in ProjectProcessor.generate_custom_frames.<locals>.on_button_clicked(b)
    969 def on_button_clicked(b):
    970     movie_files = sorted(
    971         [
    972             f
--> 973             for f in input_path.iterdir()
    974             if f.is_file()
    975             and f.suffix.lower() in [".mov", ".mp4", ".avi", ".mkv", ".mpg"]
    976         ]
    977     )
    979     results = g_utils.parallel_map(
    980         kso_widgets.extract_custom_frames,
    981         movie_files,
   (...)
    989         ),
    990     )
    991     if len(results) > 0:

AttributeError: 'str' object has no attribute 'iterdir'


ShrimpFather7 commented 7 months ago

@jannesgg

jannesgg commented 7 months ago

@ShrimpFather7 On my side, this was fixed 2 days ago as part of #327. Could you make sure you are on the dev branch and also do a git pull just to be safe?
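For anyone hitting the same `AttributeError` on an older checkout: `.iterdir()` exists on `pathlib.Path` objects, not on plain strings, so the traceback above suggests a string path reached the file listing. The following is only a minimal sketch of a defensive conversion, not the actual change made in #327; the helper name `list_movie_files` and the constant `MOVIE_SUFFIXES` are hypothetical, and the suffix list is copied from the traceback.

```python
from pathlib import Path

# Suffixes taken from the traceback above (project.py line 975).
MOVIE_SUFFIXES = {".mov", ".mp4", ".avi", ".mkv", ".mpg"}


def list_movie_files(input_path):
    """Return the sorted movie files in a folder given as str or Path.

    Hypothetical helper: wrapping the argument in Path() makes the call
    work whether the widget hands over a string or a Path object.
    """
    folder = Path(input_path)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    return sorted(
        f
        for f in folder.iterdir()
        if f.is_file() and f.suffix.lower() in MOVIE_SUFFIXES
    )
```

Calling `list_movie_files("/some/folder")` and `list_movie_files(Path("/some/folder"))` then behave the same way, because `Path()` accepts both.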

ShrimpFather7 commented 7 months ago

Git pull removed that error. However, now I'm getting this one:

KeyError                                  Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
   3620 try:
-> 3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:

File /usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File /usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'frame_path'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File /usr/src/app/kso-dev/kso_utils/project.py:1022, in ProjectProcessor.generate_custom_frames.<locals>.on_button_clicked(b)
   1020 self.frames_to_upload_df = pd.DataFrame()
   1021 self.project.output_path = output_path
-> 1022 self.generated_frames = zoo_utils.modify_frames(
   1023     project=self.project,
   1024     frames_to_upload_df=self.frames_to_upload_df.reset_index(drop=True),
   1025     species_i=species_list.value,
   1026     modification_details=frame_modification.checks,
   1027 )
   1028 self.modified_frames = self.generated_frames

File /usr/src/app/kso-dev/kso_utils/zooniverse_utils.py:1724, in modify_frames(project, frames_to_upload_df, species_i, modification_details)
   1721 mod_frames_folder = project.output_path + mod_frames_folder
   1723 # Specify the path of the modified frames
-> 1724 frames_to_upload_df["modif_frame_path"] = frames_to_upload_df["frame_path"].apply(
   1725     lambda x: str(Path(mod_frames_folder, Path(x).name)), 1
   1726 )
   1728 # Remove existing modified clips
   1729 if os.path.exists(mod_frames_folder):

File /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:3506, in DataFrame.__getitem__(self, key)
   3504 if self.columns.nlevels > 1:
   3505     return self._getitem_multilevel(key)
-> 3506 indexer = self.columns.get_loc(key)
   3507 if is_integer(indexer):
   3508     indexer = [indexer]

File /usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py:3623, in Index.get_loc(self, key, method, tolerance)
   3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:
-> 3623     raise KeyError(key) from err
   3624 except TypeError:
   3625     # If we have a listlike key, _check_indexing_error will raise
   3626     # InvalidIndexError. Otherwise we fall through and re-raise
   3627     # the TypeError.
   3628     self._check_indexing_error(key)

KeyError: 'frame_path'
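A note on reading this traceback: line 1020 of project.py assigns an empty `pd.DataFrame()` right before `modify_frames` is called, which suggests (but does not prove) that no frames had been extracted, so the `frame_path` column never existed. The sketch below shows one way such a situation could be reported more clearly; `require_columns` is a hypothetical helper and is not part of kso_utils.

```python
import pandas as pd


def require_columns(df, required, context):
    """Raise a descriptive error if any required column is missing.

    Hypothetical helper: it turns a bare KeyError such as 'frame_path'
    into a message that also reports how many rows the DataFrame holds.
    """
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(
            f"{context}: missing column(s) {missing}; "
            f"the DataFrame has {len(df)} row(s). "
            "Were any frames extracted in the previous step?"
        )
    return df
```

With this guard, `require_columns(pd.DataFrame(), ["frame_path"], "modify_frames")` fails immediately with a message that points at the empty input rather than at pandas internals.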

jannesgg commented 7 months ago

@ShrimpFather7 Could you try running it again? I think the files should end up with an .mp4 extension, but for some reason they keep the original .mpg extension. I have renamed them manually to test this. Please have a look and see if this helps.

ShrimpFather7 commented 7 months ago

@jannesgg When running through the tutorial again, I've been running into issues with compressing the frames. I get the following error:

File <string>:1
    ffmpeg.input('/mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/KSO_shallow_frames_CN/video-HD_CAM_1-2022-07-15T08-54-30-401_frame_7923.jpg').output('/mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/frames/modified_Dead man's fingers_frames/video-HD_CAM_1-2022-07-15T08-54-30-401_frame_7923.jpg', q=25)
    ^
SyntaxError: invalid syntax

I'm not sure if this is related to the issue you were talking about or if it's a new one, but it has kept me from reaching the "Upload frames to zooniverse" cell. When I run that cell anyway, I get this error:

IntCastingNaNError                        Traceback (most recent call last)
Cell In[18], line 1
----> 1 pp.upload_zoo_subjects("frame")

File /usr/src/app/kso-dev/kso_utils/project.py:781, in ProjectProcessor.upload_zoo_subjects(self, subject_type)
    778     logging.info(f"Clips temporarily stored locally has been removed")
    780 elif subject_type == "frame":
--> 781     upload_df = zoo_utils.set_zoo_frame_metadata(
    782         project=self.project,
    783         db_connection=self.db_connection,
    784         df=self.generated_frames,
    785         species_list=self.species_of_interest,
    786         csv_paths=self.csv_paths,
    787     )
    788     zoo_utils.upload_frames_to_zooniverse(
    789         project=self.project,
    790         upload_to_zoo=upload_df,
    791         species_list=self.species_of_interest,
    792     )
    794 else:

File /usr/src/app/kso-dev/kso_utils/zooniverse_utils.py:1826, in set_zoo_frame_metadata(project, db_connection, df, species_list, csv_paths)
   1824 # Set project-specific metadata
   1825 if project.Zooniverse_number == 9747:
-> 1826     df = add_db_info_to_df(
   1827         project, db_connection, csv_paths, df, "sites", "id, siteName"
   1828     )
   1829     upload_to_zoo = df[
   1830         [
   1831             "frame_path",
   (...)
   1837         ]
   1838     ]
   1840 elif project_name == "SGU":

File /usr/src/app/kso-dev/kso_utils/db_utils.py:515, in add_db_info_to_df(project, conn, csv_paths, df, table_name, cols_interest)
    513 # Ensure id columns that are going to be used to merge are int
    514 if "id" in left_on_col:
--> 515     df[left_on_col] = df[left_on_col].astype(float).astype(int)
    517 # Combine the original and sqldf dfs
    518 comb_df = pd.merge(
    519     df, sql_df, how="left", left_on=left_on_col, right_on=right_on_col
    520 )

File /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:5920, in NDFrame.astype(self, dtype, copy, errors)
   5913     results = [
   5914         self.iloc[:, i].astype(dtype, copy=copy)
   5915         for i in range(len(self.columns))
   5916     ]
   5918 else:
   5919     # else, only a single dtype is given
-> 5920     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5921     return self._constructor(new_data).__finalize__(self, method="astype")
   5923 # GH 33113: handle empty frame or series

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:419, in BaseBlockManager.astype(self, dtype, copy, errors)
    418 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 419     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:304, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
    302         applied = b.apply(f, **kwargs)
    303     else:
--> 304         applied = getattr(b, f)(**kwargs)
    305 except (TypeError, NotImplementedError):
    306     if not ignore_failures:

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py:582, in Block.astype(self, dtype, copy, errors)
    564 """
    565 Coerce to the new dtype.
    566
   (...)
    578 Block
    579 """
    580 values = self.values
--> 582 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    584 new_values = maybe_coerce_values(new_values)
    585 newb = self.make_block(new_values)

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1292, in astype_array_safe(values, dtype, copy, errors)
   1289     dtype = dtype.numpy_dtype
   1291 try:
-> 1292     new_values = astype_array(values, dtype, copy=copy)
   1293 except (ValueError, TypeError):
   1294     # e.g. astype_nansafe can fail on object-dtype of strings
   1295     # trying to convert to float
   1296     if errors == "ignore":

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1237, in astype_array(values, dtype, copy)
   1234     values = values.astype(dtype, copy=copy)
   1236 else:
-> 1237     values = astype_nansafe(values, dtype, copy=copy)
   1239 # in pandas we don't store numpy str dtypes, so convert to object
   1240 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1148, in astype_nansafe(arr, dtype, copy, skipna)
   1145     raise TypeError(f"cannot astype a timedelta from [{arr.dtype}] to [{dtype}]")
   1147 elif np.issubdtype(arr.dtype, np.floating) and np.issubdtype(dtype, np.integer):
-> 1148     return astype_float_to_int_nansafe(arr, dtype, copy)
   1150 elif is_object_dtype(arr.dtype):
   1151
   1152     # work around NumPy brokenness, #1987
   1153     if np.issubdtype(dtype.type, np.integer):

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1193, in astype_float_to_int_nansafe(values, dtype, copy)
   1189 """
   1190 astype with a check preventing converting NaN to an meaningless integer value.
   1191 """
   1192 if not np.isfinite(values).all():
-> 1193     raise IntCastingNaNError(
   1194         "Cannot convert non-finite values (NA or inf) to integer"
   1195     )
   1196 return values.astype(dtype, copy=copy)

IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer

jannesgg commented 7 months ago

@ShrimpFather7 This issue is probably caused by the quote (apostrophe) in the folder name "modified_Dead man's", which breaks the string that gets built. I will see if I can solve this somehow and update here if I find a solution.
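To illustrate the quoting problem described here: the `File <string>:1` line in the traceback suggests the ffmpeg call was assembled as a Python source string and then evaluated, so an apostrophe in the species name ends the string literal early. The sketch below reproduces that effect with the built-in `compile()` only (nothing is executed, and ffmpeg is not required). All function names are hypothetical, and neither workaround is claimed to be the fix that was applied in the repository.

```python
import re


def build_call_naive(src, dst):
    """Build the call text with hand-written single quotes (breaks on ')."""
    return f"ffmpeg.input('{src}').output('{dst}', q=25)"


def build_call_quoted(src, dst):
    """Build the call text with repr(), which picks safe quoting itself."""
    return f"ffmpeg.input({src!r}).output({dst!r}, q=25)"


def is_valid_python(expr):
    """Return True if the text parses as a Python expression."""
    try:
        compile(expr, "<string>", "eval")  # parse only, never run
        return True
    except SyntaxError:
        return False


def safe_folder_name(species):
    """Replace every run of characters outside [A-Za-z0-9_-] with '_'."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", species).strip("_")
```

With `dst = "frames/modified_Dead man's fingers_frames/x.jpg"`, `is_valid_python(build_call_naive("a.jpg", dst))` is `False`, while the `repr()`-based version parses. Passing the paths as ordinary function arguments instead of building source text avoids the problem altogether; sanitizing the folder name, as in `safe_folder_name`, is the kind of workaround that renaming the species amounts to.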

ShrimpFather7 commented 7 months ago

@jannesgg The first error was resolved by changing the species name, thanks! Hope it goes well with the other issue :)

jannesgg commented 7 months ago

@ShrimpFather7 I have made a quick fix now, please try git pull and let me know how it goes.

ShrimpFather7 commented 7 months ago

Tried a git pull and ran in kso-dev. Still seems like the same error.

IntCastingNaNError                        Traceback (most recent call last)
Cell In[13], line 1
----> 1 pp.upload_zoo_subjects("frame")

File /usr/src/app/kso-dev/kso_utils/project.py:781, in ProjectProcessor.upload_zoo_subjects(self, subject_type)
    778     logging.info(f"Clips temporarily stored locally has been removed")
    780 elif subject_type == "frame":
--> 781     upload_df = zoo_utils.set_zoo_frame_metadata(
    782         project=self.project,
    783         db_connection=self.db_connection,
    784         df=self.generated_frames,
    785         species_list=self.species_of_interest,
    786         csv_paths=self.csv_paths,
    787     )
    788     zoo_utils.upload_frames_to_zooniverse(
    789         project=self.project,
    790         upload_to_zoo=upload_df,
    791         species_list=self.species_of_interest,
    792     )
    794 else:

File /usr/src/app/kso-dev/kso_utils/zooniverse_utils.py:1826, in set_zoo_frame_metadata(project, db_connection, df, species_list, csv_paths)
   1824 # Set project-specific metadata
   1825 if project.Zooniverse_number == 9747:
-> 1826     df = add_db_info_to_df(
   1827         project, db_connection, csv_paths, df, "sites", "id, siteName"
   1828     )
   1829     upload_to_zoo = df[
   1830         [
   1831             "frame_path",
   (...)
   1837         ]
   1838     ]
   1840 elif project_name == "SGU":

File /usr/src/app/kso-dev/kso_utils/db_utils.py:515, in add_db_info_to_df(project, conn, csv_paths, df, table_name, cols_interest)
    513 # Ensure id columns that are going to be used to merge are int
    514 if "id" in left_on_col:
--> 515     df[left_on_col] = df[left_on_col].astype(float).astype(int)
    517 # Combine the original and sqldf dfs
    518 comb_df = pd.merge(
    519     df, sql_df, how="left", left_on=left_on_col, right_on=right_on_col
    520 )

File /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:5920, in NDFrame.astype(self, dtype, copy, errors)
   5913     results = [
   5914         self.iloc[:, i].astype(dtype, copy=copy)
   5915         for i in range(len(self.columns))
   5916     ]
   5918 else:
   5919     # else, only a single dtype is given
-> 5920     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5921     return self._constructor(new_data).__finalize__(self, method="astype")
   5923 # GH 33113: handle empty frame or series

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:419, in BaseBlockManager.astype(self, dtype, copy, errors)
    418 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 419     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:304, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
    302         applied = b.apply(f, **kwargs)
    303     else:
--> 304         applied = getattr(b, f)(**kwargs)
    305 except (TypeError, NotImplementedError):
    306     if not ignore_failures:

File /usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py:582, in Block.astype(self, dtype, copy, errors)
    564 """
    565 Coerce to the new dtype.
    566
   (...)
    578 Block
    579 """
    580 values = self.values
--> 582 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    584 new_values = maybe_coerce_values(new_values)
    585 newb = self.make_block(new_values)

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1292, in astype_array_safe(values, dtype, copy, errors)
   1289     dtype = dtype.numpy_dtype
   1291 try:
-> 1292     new_values = astype_array(values, dtype, copy=copy)
   1293 except (ValueError, TypeError):
   1294     # e.g. astype_nansafe can fail on object-dtype of strings
   1295     # trying to convert to float
   1296     if errors == "ignore":

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1237, in astype_array(values, dtype, copy)
   1234     values = values.astype(dtype, copy=copy)
   1236 else:
-> 1237     values = astype_nansafe(values, dtype, copy=copy)
   1239 # in pandas we don't store numpy str dtypes, so convert to object
   1240 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1148, in astype_nansafe(arr, dtype, copy, skipna)
   1145     raise TypeError(f"cannot astype a timedelta from [{arr.dtype}] to [{dtype}]")
   1147 elif np.issubdtype(arr.dtype, np.floating) and np.issubdtype(dtype, np.integer):
-> 1148     return astype_float_to_int_nansafe(arr, dtype, copy)
   1150 elif is_object_dtype(arr.dtype):
   1151
   1152     # work around NumPy brokenness, #1987
   1153     if np.issubdtype(dtype.type, np.integer):

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/cast.py:1193, in astype_float_to_int_nansafe(values, dtype, copy)
   1189 """
   1190 astype with a check preventing converting NaN to an meaningless integer value.
   1191 """
   1192 if not np.isfinite(values).all():
-> 1193     raise IntCastingNaNError(
   1194         "Cannot convert non-finite values (NA or inf) to integer"
   1195     )
   1196 return values.astype(dtype, copy=copy)

IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer
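For context on this error: line 515 of db_utils.py casts an id column with `.astype(float).astype(int)`, and pandas refuses the second cast as soon as the column contains a missing value, because NumPy integers cannot represent NaN. The sketch below shows one way to find the offending rows and to cast without failing, using the nullable `Int64` dtype. The helper `cast_id_column` and the column name `site_id` are assumptions for illustration; the thread does not say which id column is empty or how the problem was eventually resolved.

```python
import pandas as pd


def cast_id_column(df, col):
    """Cast an id column to nullable Int64 and report rows lacking an id.

    Hypothetical helper: returns (converted_copy, rows_with_missing_id).
    Values that cannot be parsed as numbers are treated as missing.
    """
    numeric = pd.to_numeric(df[col], errors="coerce")
    rows_with_missing_id = df[numeric.isna()]
    converted = df.copy()
    # "Int64" (capital I) is pandas' nullable integer dtype; it stores
    # missing values as <NA> instead of raising on NaN.
    converted[col] = numeric.astype("Int64")
    return converted, rows_with_missing_id
```

Inspecting the second return value, for example `cast_id_column(frames_df, "site_id")[1]` with a hypothetical `frames_df`, shows which frames have no matching id. That is usually the more useful finding, since a missing id tends to point at a metadata row that was not filled in rather than at the cast itself.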

jannesgg commented 7 months ago

See e-mail. Closing issue for now. Re-open if necessary.