lifewatch / pypam

Python Passive Acoustic Analysis tool for Passive Acoustic Monitoring (PAM)
GNU General Public License v3.0
41 stars 8 forks source link

Issues using non-zero bin-overlaps #127

Open caplinje-NOAA opened 5 months ago

caplinje-NOAA commented 5 months ago

I've run into an issue using bin overlaps in file class methods which call _bins, for example if attempting to do this:

file = pypam.AcuFile(sfile, AMAR, 1.)
rms1s = file.rms(binsize=1.0,bin_overlap=0.1)

In this case (i.e. using float bin_overlaps between 0 and 1), soundfile.blocks throws a type error:

TypeError: slice indices must be integers or None or have an __index__ method

I believe that maybe in the sf.blocks call in the _bins method (below), overlap should be set to 'noverlap', rather than 'bin_overlap', as the latter is the integer/sample representation of the fractional overlap.

for i, block in tqdm(enumerate(sf.blocks(self.file_path, blocksize=blocksize, start=self._start_frame,
                                         overlap=bin_overlap, always_2d=True, fill_value=0.0)),
                     total=n_blocks, leave=False, position=0):

Doing this solves the error (or at least the file.rms call doesn't error), however, the resulting dataset does not appear to show start_sample and end_sample values which I would expect, if there were some overlap. This is the resulting dataset if I replace bin_overlap with noverlap:

<xarray.Dataset>
Dimensions:       (id: 67, band: 1)
Coordinates:
  * id            (id) int32 0 1 2 3 4 5 6 7 8 9 ... 58 59 60 61 62 63 64 65 66
    datetime      (id) datetime64[ns] 2024-07-02T14:48:30.706957 ... 2024-07-...
    start_sample  (id) int32 0 64000 128000 192000 ... 4096000 4160000 4224000
    end_sample    (id) int32 64000 128000 192000 ... 4160000 4224000 4288000
  * band          (band) int32 0
    low_freq      (band) int32 0
    high_freq     (band) float64 3.2e+04
Data variables:
    rms           (id, band) float64 135.8 114.8 135.3 ... 137.0 113.5 137.3
Attributes: (12/14)
    file_path:               data/exfile.wav
    timezone:                UTC
    datetime_timezone:       UTC
    p_ref:                   1.0
    channel:                 0
    _start_frame:            0
                     ...
    fs:                      64000
    hydrophone_name:         AMAR 1
    hydrophone_model:        Geospectrum
    hydrophone_sensitivity:  -166.6
    hydrophone_preamp_gain:  0
    hydrophone_Vpp:          1

I appreciate any help here, and have really been enjoying learning this package.

cparcerisas commented 2 months ago

Hello! This should be fixed in the new branch feature/gridded_data I am running some extra testing and will merge the pull request as soon as I am happy with it. If you want to test that branch already that would be helpful!

ps. sorry for the delay in fixing this, I was taking a long break after my phd

caplinje-NOAA commented 2 months ago

Thank you and no worry about the delay. I will test this out and compare it to my fix.

cparcerisas commented 2 months ago

Yeah I changed more things than that, I hope that is not a problem. It was a long time that I wanted to offer support for "gridded" data (each bin starts at second 0). But there was indeed a mistake in the code for non-zero windows. Now the _n_blocks _bins function have been changed, and should be correct. If you could check it I would really appreciate!

caplinje-NOAA commented 3 weeks ago

I have committed my solution to a clean fork of your project here. I realize that you are addressing this in a different way (and entirely refactoring the _bins method), so I thought I would ask before I submitted a pull request. Also, apologies for not testing that branch yet, I am in the middle of some analysis and want to stick with what is working.

The fix is small but mostly is just replacing bin_overlap with noverlap in the sf.blocks call, but also corrects the time keeping.