"1.4. Relative influence of sequential versus local features. As I understand it, TN takes as input spectrograms in which each column of the spectrogram (corresponding to some small period of time) is a "bin", and a set of labels for each bin that are derived from how the corresponding segments of song were labeled by manual annotation. The network is then trained on "windows" that contain some large number (hundreds) of contiguous bins and their labels, corresponding to many hundreds of milliseconds of song. I would appreciate some guidance regarding how performance depends on the relative influence of local acoustic structure (within a single bin, or within a single syllable) and more global sequential structure that depends on the specific sequence of syllables within training windows. I assume that both of these will affect the labels that are applied to bins, but I have little intuition about their relative influences. Will this tend to work against the correct labeling of rare variants? For example, if the sequence 'abc' is very prevalent in a song, and the sequence 'abb' is very rare, will the labeling of 'abb' be biased towards the labels 'abc'? More generally, it would be helpful to have a more pointed discussion of the degree to which TN performance depends on choices for parameters such as bin size and window size: is this something that a user might want to adjust differently for BF versus canary song versus other types of vocalizations?"
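A minimal sketch of the windowing the reviewer describes, assuming the spectrogram is stored as a list of time-bin columns with exactly one integer label per bin. The helper names `make_windows` and `window_duration_ms`, the non-overlapping default stride, and the 2 ms bin duration and window sizes in the usage lines are hypothetical illustrations, not TweetyNet's actual data pipeline or defaults.

```python
from typing import List, Optional, Sequence, Tuple

# One training example: the spectrogram columns in a window plus their per-bin labels.
Window = Tuple[List[Sequence[float]], List[int]]


def make_windows(
    columns: Sequence[Sequence[float]],
    labels: Sequence[int],
    window_size: int,
    stride: Optional[int] = None,
) -> List[Window]:
    """Split a labeled spectrogram into windows of `window_size` contiguous bins.

    Hypothetical helper: trailing bins that do not fill a whole window are dropped.
    """
    if len(columns) != len(labels):
        raise ValueError("expected exactly one label per time bin")
    if stride is None:
        stride = window_size  # assumption: non-overlapping windows by default
    windows: List[Window] = []
    for start in range(0, len(columns) - window_size + 1, stride):
        stop = start + window_size
        windows.append((list(columns[start:stop]), list(labels[start:stop])))
    return windows


def window_duration_ms(window_size: int, bin_duration_ms: float) -> float:
    """Amount of song one window spans: number of bins times bin duration."""
    return window_size * bin_duration_ms


# Usage with made-up numbers: 1000 bins of 8 frequency values each, 2 ms per bin.
columns = [[0.0] * 8 for _ in range(1000)]
labels = [0] * 1000
windows = make_windows(columns, labels, window_size=300)
print(len(windows))  # 3
for size in (150, 300, 600):  # half, "default", double (hypothetical sizes)
    print(size, window_duration_ms(size, 2.0))
```

Since a window spans `window_size` times the bin duration, halving or doubling the window size, or changing the bin size, changes how much sequential context the network sees in each training example.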
[ ] we can vary the hidden size to get at the importance of local features, and the window size to get at the importance of global sequential structure within training windows
vary hidden size: GitHub issue #68
follow-up experiments on window size, at half and at double the current size: GitHub issue #69
results so far suggest that window size strongly affects segment error rate but not frame error rate, which implies that what really matters for segment-level performance is the network seeing global sequential structure
[ ] re: rare variants
@yardencsGitHub will show examples of rare variants from canary song, with the network's output probabilities and loss,
and show how these vary with larger windows
[ ] in discussion: future work could incorporate segment error rate into the loss term, possibly reducing the dependence of performance on window size
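To make the frame error rate versus segment error rate distinction in the results notes concrete, here is a minimal sketch assuming the common definitions: frame error rate is the fraction of time bins whose predicted label differs from the ground truth, and segment error rate is the Levenshtein edit distance between the predicted and ground-truth sequences of segment labels, divided by the length of the ground-truth sequence. The helper names and the use of label 0 for background (silence) are assumptions for illustration, not TweetyNet's or vak's actual code.

```python
from itertools import groupby
from typing import List, Sequence


def frame_error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of time bins whose predicted label differs from the ground truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same number of bins")
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return errors / len(y_true)


def to_segments(frame_labels: Sequence[int], background: int = 0) -> List[int]:
    """Collapse runs of identical consecutive bin labels into one segment label,
    then drop the background (assumed silence) label."""
    return [label for label, _ in groupby(frame_labels) if label != background]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance (unit-cost insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def segment_error_rate(y_true: Sequence[int], y_pred: Sequence[int],
                       background: int = 0) -> float:
    """Edit distance between segment label sequences, normalized by reference length."""
    ref = to_segments(y_true, background)
    hyp = to_segments(y_pred, background)
    if not ref:
        raise ValueError("ground truth contains no segments")
    return edit_distance(hyp, ref) / len(ref)


# Three syllables (labels 1, 2, 3) separated by silence (label 0).
y_true = [0, 1, 1, 1, 1, 0, 2, 2, 2, 2, 0, 3, 3, 3, 3, 0]
y_pred = list(y_true)
y_pred[8] = 3  # mislabel a single bin in the middle of syllable 2
print(frame_error_rate(y_true, y_pred))    # 0.0625
print(segment_error_rate(y_true, y_pred))  # 2 / 3: two spurious segments inserted
```

One mislabeled bin barely moves the frame error rate but splits a syllable into spurious segments, so the two metrics can diverge sharply; this is consistent with window size affecting segment error rate while leaving frame error rate largely unchanged.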