yardencsGitHub / tweetynet

Hybrid convolutional-recurrent neural networks for segmentation of birdsong and classification of elements

1.4 vary size of hidden state in LSTM for BFs #68

Closed: NickleDave closed this issue 3 years ago

NickleDave commented 3 years ago

1.4. Relative influence of sequential versus local features. As I understand it, TN takes as input spectrograms in which each column (corresponding to some small period of time) is a "bin", along with a label for each bin derived from how the corresponding segments of song were labeled by manual annotation. The network is then trained on "windows" that contain some large number (hundreds) of contiguous bins and their labels, corresponding to many hundreds of milliseconds of song.

I would appreciate some guidance on how performance depends on the relative influence of local acoustic structure (within a single bin, or within a single syllable) versus more global sequential structure that depends on the specific sequence of syllables within training windows. I assume that both will affect the labels applied to bins, but I have little intuition about their relative influences. Will this tend to work against the correct labeling of rare variants? For example, if the sequence 'abc' is very prevalent in a song and the sequence 'abb' is very rare, will the labeling of 'abb' be biased towards the labels 'abc'?

More generally, it would be helpful to have a more pointed discussion of the degree to which TN performance depends on choices of parameters such as bin size and window size. Is this something a user might want to adjust differently for BF song versus canary song versus other types of vocalizations?
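To make the bin/window setup concrete, here is a minimal sketch of slicing a spectrogram and its per-bin labels into fixed-width training windows. The function name, window size, and stride are illustrative assumptions, not TweetyNet's actual API:

```python
import numpy as np

def make_windows(spect, bin_labels, window_size=370, stride=370):
    """Slice a spectrogram and its per-bin labels into training windows.

    spect: array of shape (n_freq, n_time_bins); each column is one time bin.
    bin_labels: array of shape (n_time_bins,); one integer class label per bin,
        derived from manual annotation of song segments.
    """
    windows, labels = [], []
    for start in range(0, spect.shape[1] - window_size + 1, stride):
        windows.append(spect[:, start:start + window_size])
        labels.append(bin_labels[start:start + window_size])
    return np.stack(windows), np.stack(labels)

# hypothetical example: 3700 time bins, 10 syllable classes
spect = np.random.rand(257, 3700)
bin_labels = np.random.randint(0, 10, size=3700)
X, y = make_windows(spect, bin_labels)
print(X.shape, y.shape)  # (10, 257, 370) and (10, 370)
```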

NickleDave commented 3 years ago

instead of "ablating" the LSTM, run experiments where we vary the size of the hidden state, i.e., the number of hidden units, after fixing #70

this will be one way of getting at the "relative influence of sequential versus local features" described above in 1.4, although the reviewer is also asking about bin size versus window size
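A sketch of what such a sweep might look like in PyTorch. The grid of hidden sizes, the per-bin feature dimension, and the window length are assumptions for illustration:

```python
import torch

# hypothetical grid of hidden sizes to sweep
HIDDEN_SIZES = [16, 64, 256, 1024]

n_features = 512   # assumed size of the per-bin feature vector out of the conv layers
window_size = 370  # assumed number of time bins per window

for hidden_size in HIDDEN_SIZES:
    lstm = torch.nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                         batch_first=True, bidirectional=True)
    x = torch.randn(8, window_size, n_features)  # (batch, time bins, features)
    out, _ = lstm(x)
    print(hidden_size, tuple(out.shape))  # (8, 370, 2 * hidden_size)
```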

NickleDave commented 3 years ago

based on email discussions with @yardencsGitHub after further analysis of the results in the initial submission:

yardencsGitHub commented 3 years ago

Depending on the results, we might want to run ablation experiments too
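For reference, one way to "ablate" the recurrent layer is to replace it with an identity map, so per-bin predictions depend only on local (convolutional) features. All dimensions below are hypothetical, and this is a sketch rather than TweetyNet's actual model code:

```python
import torch

n_features, hidden_size, n_classes = 512, 256, 10  # all hypothetical

# recurrent variant: per-bin predictions can use sequential context
rnn = torch.nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                    batch_first=True, bidirectional=True)
head = torch.nn.Linear(2 * hidden_size, n_classes)

# ablated variant: identity in place of the LSTM, so each bin is
# classified from its local features alone
rnn_ablated = torch.nn.Identity()
head_ablated = torch.nn.Linear(n_features, n_classes)
```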

NickleDave commented 3 years ago

correction: after #75 and #76, I realize we used a hidden size of 256 for the initial submission. 1024 is "hidden size * 4", one dimension of the learnable input-hidden weight matrix `weight_ih_l0` -- see the LSTM variables in the PyTorch docs: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
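This is easy to confirm directly: per the PyTorch docs, `weight_ih_l0` has shape `(4 * hidden_size, input_size)`, the four gates' weights stacked along the first dimension. The `input_size` here is an assumption for illustration:

```python
import torch

lstm = torch.nn.LSTM(input_size=512, hidden_size=256)  # input_size is hypothetical
print(lstm.weight_ih_l0.shape)  # torch.Size([1024, 512]); 1024 = 4 * 256
```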

NickleDave commented 3 years ago

Will need to re-run this