yardencsGitHub / tweetynet

Hybrid convolutional-recurrent neural networks for segmentation of birdsong and classification of elements
BSD 3-Clause "New" or "Revised" License

1.4 discuss choice of bin size #72

Closed NickleDave closed 2 years ago

NickleDave commented 3 years ago

"1.4. Relative influence of sequential versus local features. As I understand it, TN takes as input spectrograms in which each column of the spectrogram (corresponding to some small period of time) is a "bin", and a set of labels for each bin that are derived from how the corresponding segments of song were labeled by manual annotation. The network is then trained on "windows" That contain some large number (hundreds) of contiguous bins and their labels corresponding to many hundreds of milliseconds of song. I would appreciate some guidance regarding how performance depends on the relative influence of local acoustic structure (within a single bin, or within a single syllable) and more global sequential structure that depends on the specific sequence of syllables within-training windows. I assume that both of these will affect the labels that are applied to bins, but I have little intuition about their relative influences. Will this tend to work against the correct labeling of rare variants? For example if the sequence 'abc' is very prevalent in a song, and the sequence 'abb' is very rare, will the labeling of 'abb' be biased towards the labels 'abc'? More generally, it would be helpful to have a more pointed discussion of the degree to which TN performance depends on choices for parameters such as bin size and window size - is this something that a user might want to differently adjust for BF versus canary song versus other types of vocalizations?"

The key point for this issue is:

it would be helpful to have a more pointed discussion of the degree to which TN performance depends on choices for parameters such as bin size and window size

window size will be addressed by follow-up experiments, e.g. #69

but for bin size, as discussed with @yardencsGitHub, we will want to make clear that we are thinking of the model as implicitly learning to segment, and so bin size will impact its ability to segment well

heuristically: going below 1 ms probably won't give any additional benefit, and going above 5 ms will probably be too noisy
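For reference, the bin size is set by the STFT hop (window length minus overlap) divided by the sampling rate, so landing in the 1-5 ms range is just a matter of choosing the hop. A sketch with assumed parameter values (not the paper's actual settings):

```python
import numpy as np
from scipy.signal import spectrogram

# The duration of one spectrogram time bin is hop / fs, where
# hop = nperseg - noverlap. Values below are illustrative assumptions.
fs = 32000                           # sampling rate, Hz
target_bin_s = 0.002                 # aim for ~2 ms bins
hop = int(round(target_bin_s * fs))  # 64 samples
nperseg = 512                        # window length in samples
noverlap = nperseg - hop

audio = np.random.randn(fs)          # one second of noise as a stand-in
freqs, times, spect = spectrogram(audio, fs=fs,
                                  nperseg=nperseg, noverlap=noverlap)
bin_dur = times[1] - times[0]        # ≈ 0.002 s
```

Note that bin duration depends only on the hop, not on `nperseg`; a longer window trades temporal smearing for frequency resolution independently of the label-vector granularity discussed here.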

NickleDave commented 3 years ago

@yardencsGitHub maybe this is something we can better address with examples from canary song

Will this tend to work against the correct labeling of rare variants? For example if the sequence 'abc' is very prevalent in a song, and the sequence 'abb' is very rare, will the labeling of 'abb' be biased towards the labels 'abc'?

NickleDave commented 3 years ago

Based on most recent results and follow-up discussions with @yardencsGitHub we will want to say something like:
"generally speaking, we chose a bin size that was just smaller than the shortest duration silent gaps between syllables, because a larger bin size would have prevented our model from producing correct segments in cases where the true gaps were shorter than our bin size. In initial studies we experimented with even smaller bin sizes but found that the network tended to over-segment. We address possible reasons for this in the discussion."

NickleDave commented 2 years ago

I went ahead and added this sentence pretty much verbatim to "generating spectrograms" in Methods.

We should make sure to point this out in response to reviewers.

Closing as done.