orcasound / ambient-sound-analysis

This repository aims to hold code for the UW MSDS capstone project analyzing ambient sounds in Orcasound hydrophone data.
MIT License

Duplicate Indices #53

Closed: zprice12 closed this issue 5 months ago

zprice12 commented 6 months ago

When producing parquet files for 3/22/23 11am-12pm, I noticed duplicate index entries for 11:50, even though the data in the two rows doesn't match. This happens because the data for 11:40-11:49 extends into 11:50 instead of stopping right at the 11:50 mark. I implemented a potential solution in the generate_psds function here: https://github.com/orcasound/ambient-sound-analysis/blob/test_psd/src/orcasound_noise/pipeline/pipeline_3.py. It removes the last row of the previous dataframe when that row's index matches the first index of the incoming dataframe. This is meant as a temporary fix while we try to understand the underlying source of the problem.
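A minimal sketch of the boundary check described above, assuming each chunk is a pandas DataFrame indexed by timestamp. The helper name `append_without_boundary_duplicate` and the column `band_63hz` are hypothetical and are not taken from `pipeline_3.py`; the example values are made up for illustration.

```python
import pandas as pd


def append_without_boundary_duplicate(prev: pd.DataFrame, incoming: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: concatenate two PSD chunks, dropping the previous
    chunk's last row when its timestamp equals the incoming chunk's first one."""
    if len(prev) and len(incoming) and prev.index[-1] == incoming.index[0]:
        # The previous chunk spilled into the next chunk's first time bin;
        # keep the incoming chunk's row and discard the spillover row.
        prev = prev.iloc[:-1]
    return pd.concat([prev, incoming])


# Illustrative data: the 11:40-11:49 chunk ends with a spillover row at 11:50.
prev = pd.DataFrame(
    {"band_63hz": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2023-03-22 11:48", "2023-03-22 11:49", "2023-03-22 11:50"]),
)
incoming = pd.DataFrame(
    {"band_63hz": [9.0, 10.0]},
    index=pd.to_datetime(["2023-03-22 11:50", "2023-03-22 11:51"]),
)

combined = append_without_boundary_duplicate(prev, incoming)
print(combined)
```

This only compares the two rows that meet at a chunk boundary, so it will not catch duplicates that appear anywhere else in the index.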

zprice12 commented 5 months ago

Added a simpler method for removing duplicate rows.
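The comment doesn't say what the simpler method is. One common pandas idiom for this, shown here as an assumption and not as the method actually committed, is to concatenate the chunks first and then keep only the last row for each repeated index value with `Index.duplicated`. The helper name `drop_duplicate_indices` and the column `band_63hz` are hypothetical.

```python
import pandas as pd


def drop_duplicate_indices(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep one row per index value.

    keep="last" retains the later chunk's row for any repeated timestamp,
    which matches dropping the earlier chunk's spillover row.
    """
    return df[~df.index.duplicated(keep="last")]


# Illustrative data with a repeated 11:50 timestamp after concatenation.
prev = pd.DataFrame(
    {"band_63hz": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2023-03-22 11:48", "2023-03-22 11:49", "2023-03-22 11:50"]),
)
incoming = pd.DataFrame(
    {"band_63hz": [9.0, 10.0]},
    index=pd.to_datetime(["2023-03-22 11:50", "2023-03-22 11:51"]),
)

deduped = drop_duplicate_indices(pd.concat([prev, incoming]))
print(deduped)
```

Unlike the boundary check, this removes repeated timestamps anywhere in the frame. It still discards one of two non-matching rows, so it hides the underlying spillover rather than explaining it.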