Open scottveirs opened 3 years ago
Do you recall the rationale (2019, Pod.Cast?), @akashmjn?
Perhaps I can weigh in. In the call data I have annotated, the 66th percentile of call duration is about 1.7 seconds, so 2.5 seconds would probably be enough to capture most calls in their entirety. Also, the model used here is a ResNet, which requires squishing the spectrogram down into a square image (usually 224 by 224). With the default NFFT window in the dataloader, that is about the amount of time you can get for ~200 frequency bands.
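To make the time/frequency trade-off concrete, here is a rough sketch of the arithmetic. The sample rate, FFT size, and hop length below are placeholder values picked for illustration, not the actual dataloader settings (I have not checked those), and both function names are hypothetical.

```python
def stft_shape(duration_s, sample_rate, n_fft, hop_length):
    """Return (n_freq_bins, n_frames) for an un-padded STFT of a clip.

    Assumes no centering/padding: a frame is only counted if the full
    n_fft-sample window fits inside the clip.
    """
    n_samples = int(round(duration_s * sample_rate))
    if n_samples < n_fft:
        raise ValueError("clip is shorter than one FFT window")
    n_frames = 1 + (n_samples - n_fft) // hop_length
    n_freq_bins = n_fft // 2 + 1  # one-sided spectrum of a real signal
    return n_freq_bins, n_frames


def duration_for_frames(n_frames, sample_rate, n_fft, hop_length):
    """Shortest clip duration (seconds) that yields n_frames STFT frames."""
    return (n_fft + (n_frames - 1) * hop_length) / sample_rate


# Placeholder settings, NOT the real OrcaHello / Pod.Cast configuration.
SAMPLE_RATE = 20_000  # Hz (assumed)
N_FFT = 2_048         # samples per FFT window (assumed)
HOP_LENGTH = 256      # samples between successive frames (assumed)

print(stft_shape(2.45, SAMPLE_RATE, N_FFT, HOP_LENGTH))
print(duration_for_frames(224, SAMPLE_RATE, N_FFT, HOP_LENGTH))
```

Plugging in the real dataloader parameters would show how many time frames a 2.45 second clip produces before it gets resized to the square input, and conversely how long a clip would need to be to fill 224 frames without stretching.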
A good question was raised on a call today with Canadian open source collaborators (HALLO project, #ai4orcas-hallo in Orcasound Slack). Some of them have been experimenting with different window durations while developing a binary classifier for the SRKW, Bigg's, NRKW, and offshore ecotypes of killer whales in the NE Pacific (whose habitat includes the coastal waters of BC, Canada):
Why did Pod.Cast and OrcaHello elect to use a 2.45 second window?
It would be ideal to recall the rationale and add it to the README.MD file.
On the call, I said I thought it was due to the statistics of SRKW call duration, but I'm not seeing the 2.45 second (or 2450 millisecond) value in Orcasound's shared spreadsheet of SRKW.