kitzeslab / opensoundscape

Open source, scalable software for the analysis of bioacoustic recordings
http://opensoundscape.org
MIT License

generate_clip_times_df produces clip_df that does not match our standard format #1012

Closed (louisfh closed 2 weeks ago)

louisfh commented 3 months ago

The function generate_clip_times_df produces a dataframe that does not match our standard format of a multi-index (file, start, end).

clip_df = generate_clip_times_df(3, clip_duration=1.0, clip_overlap=0.5)
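To illustrate the mismatch, here is a minimal sketch in plain pandas. The frame is a hand-built stand-in for the kind of output described here (clip start and end times, no file information), not the actual return value of generate_clip_times_df. The helper name `to_standard_clip_index`, the path "example.wav", and the index level names ('file', 'start_time', 'end_time') are assumptions made for the example.

```python
import pandas as pd

# Stand-in for a clip table with no file information: one row per clip.
# Values assume 1.0 s clips with 0.5 s overlap across a 3 s duration.
clip_df = pd.DataFrame(
    {
        "start_time": [0.0, 0.5, 1.0, 1.5, 2.0],
        "end_time": [1.0, 1.5, 2.0, 2.5, 3.0],
    }
)


def to_standard_clip_index(clip_df, file):
    """Hypothetical helper: attach a file path and move the three
    identifying fields into a (file, start_time, end_time) multi-index."""
    out = clip_df.copy()
    out.insert(0, "file", file)
    return out.set_index(["file", "start_time", "end_time"])


standard_df = to_standard_clip_index(clip_df, "example.wav")
print(list(clip_df.columns))          # plain columns, default integer index
print(list(standard_df.index.names))  # three-level multi-index
```

The conversion needs a file path, which a table of clip times alone does not carry.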

More generally, I think we should consider a class that wraps the dataframes we use and enforces our standard format. It might be more opaque than a plain pandas.DataFrame, but it would avoid inconsistencies like this one.
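As a rough sketch of what such a wrapper could look like (not a proposal for the actual API): the class name `ClipTable`, its methods, and the level names in `STANDARD_INDEX` are all hypothetical, and only plain pandas is used.

```python
import pandas as pd

# Assumed names for the three index levels of the standard format.
STANDARD_INDEX = ["file", "start_time", "end_time"]


class ClipTable:
    """Hypothetical thin wrapper that only accepts frames whose index is a
    (file, start_time, end_time) multi-index."""

    def __init__(self, df):
        is_multi = isinstance(df.index, pd.MultiIndex)
        if not is_multi or list(df.index.names) != STANDARD_INDEX:
            raise ValueError(
                f"expected a multi-index with levels {STANDARD_INDEX}, "
                f"got index names {list(df.index.names)}"
            )
        self.df = df

    @classmethod
    def from_clip_times(cls, clip_df, file):
        """Build a ClipTable from a frame with 'start_time' and 'end_time'
        columns plus a single file path."""
        out = clip_df.copy()
        out.insert(0, "file", file)
        return cls(out.set_index(STANDARD_INDEX))


# A frame without file information is rejected at construction time.
times = pd.DataFrame({"start_time": [0.0, 0.5], "end_time": [1.0, 1.5]})
try:
    ClipTable(times)
    rejected = False
except ValueError:
    rejected = True

table = ClipTable.from_clip_times(times, "example.wav")
```

The trade-off is the one noted above: callers reach the underlying frame through `table.df` rather than working with a DataFrame directly, in exchange for the format being checked once, at construction time.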

sammlapp commented 2 weeks ago

The CategoricalLabels class (new in upcoming 0.11.0 release, currently in develop branch) fulfills this need to some extent, though we'll still sometimes use / allow dataframes with multi-index or with a specific set of columns.

The utility function generate_clip_times_df specifically says that it will produce "clip_df: DataFrame with columns for 'start_time' and 'end_time' of each clip", and it can be used in contexts without an associated file path, so I think its current behavior is correct.