yardencsGitHub / tweetynet

Hybrid convolutional-recurrent neural networks for segmentation of birdsong and classification of elements
BSD 3-Clause "New" or "Revised" License

1.2a add details regarding choice of pooling operation #65

Closed. NickleDave closed this issue 2 years ago.

NickleDave commented 3 years ago

1.2. Given what we know from ASR, the pooling step is likely very important for reducing unwanted input variance. Thus, more details are important here regarding the choice of the max operation, and a performance comparison to other potential types of pooling that justifies the choice. In particular, it is known that max pooling is particularly susceptible to overfitting (Goodfellow et al., 2013), and so some discussion of whether the likely gains that are provided by max pooling are worth the potential costs is warranted. Relatedly, the dimensions of the pooling operation are not clear. In speech recognition it is common to see pooling in frequency (but not time), whereas in image recognition pooling in both spatial dimensions is common. The exact form should be clarified, and (if possible) justified.

Check the literature for max pooling vs. global + average pooling.
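For concreteness, a minimal, illustrative sketch of the kind of comparison the reviewer asks about: max vs. average pooling applied along the frequency axis only. This is not TweetyNet's actual code; the input shape and layer choices here are assumptions.

```python
import tensorflow as tf

# Illustrative input: a batch of spectrogram windows,
# shaped (batch, frequency_bins, time_bins, channels).
spects = tf.random.normal((4, 256, 1000, 1))

# Pooling along frequency only: window and stride of (8, 1).
freq_max = tf.keras.layers.MaxPool2D(pool_size=(8, 1), strides=(8, 1))(spects)
freq_avg = tf.keras.layers.AveragePooling2D(pool_size=(8, 1), strides=(8, 1))(spects)

print(freq_max.shape)  # (4, 32, 1000, 1) -- frequency downsampled, time preserved
print(freq_avg.shape)  # (4, 32, 1000, 1) -- same shape, different pooling statistic
```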

NickleDave commented 3 years ago

Should check the ASR literature specifically.

NickleDave commented 3 years ago

@yardencsGitHub looking more at the relevant literature.

This paper from MARL describes adaptive pooling operators for weakly-supervised sound event detection: https://arxiv.org/pdf/1804.10070.pdf. We are not in a weakly-supervised setting, but their "related work" section seems like a good place to start.

Noticing they cite another paper from Parascandolo (we cite their BiLSTM paper): https://arxiv.org/pdf/1702.06286.pdf. We should definitely cite this, and you and I should discuss the experiments in it together. Note they propose a "frequency max pooling" layer. I think we have something similar in effect, since our pooling size is (8, 1) and so is our stride.

So I think a simple experiment would be to change the pooling size so it includes the time domain, e.g., use pooling windows of size (8, 8); see the sketch at the end of this comment. I predict this would impair accuracy and thus demonstrate (in a post-hoc way :innocent: :grimacing:) why we chose this pooling operation.

If you agree, I can add "change pooling size" issues with experiment labels.
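A minimal sketch of the comparison proposed above, assuming Keras-style pooling layers (the input shape is only illustrative): pooling over frequency only versus pooling over frequency and time, with stride equal to the window in both cases.

```python
import tensorflow as tf

spects = tf.random.normal((4, 256, 1000, 1))  # (batch, freq_bins, time_bins, channels)

# Current configuration: pool over 8 frequency bins, 1 time bin.
current = tf.keras.layers.MaxPool2D(pool_size=(8, 1), strides=(8, 1))(spects)

# Proposed experiment: include the time domain in the pooling window.
experiment = tf.keras.layers.MaxPool2D(pool_size=(8, 8), strides=(8, 8))(spects)

print(current.shape)     # (4, 32, 1000, 1) -- one output per input time bin
print(experiment.shape)  # (4, 32, 125, 1)  -- time axis downsampled 8x
```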

yardencsGitHub commented 3 years ago

The pooling step, as implemented in TensorFlow, reduces the chosen dimensions, effectively losing resolution. So a window of 1 temporal bin was chosen to avoid losing temporal resolution. This probably can be changed by setting the stride; see the sketch below.

This adds to the set of experiments we want to run. We should first run a pilot with BF to make sure it is warranted.
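A minimal sketch of the stride point above, with Keras-style layers and illustrative shapes: with a time stride of 1 and `'same'` padding, the pooling window can extend over time bins without reducing the number of output time bins.

```python
import tensorflow as tf

spects = tf.random.normal((4, 256, 1000, 1))  # (batch, freq_bins, time_bins, channels)

# The window covers 8 time bins, but a time stride of 1 plus 'same' padding
# keeps one output per input time bin, preserving temporal resolution.
pooled = tf.keras.layers.MaxPool2D(
    pool_size=(8, 8), strides=(8, 1), padding="same"
)(spects)

print(pooled.shape)  # (4, 32, 1000, 1)
```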

NickleDave commented 3 years ago

@yardencsGitHub I have added language in the "proposed method" section of the introduction, as well as in the Methods section, that provides details about the pooling operation and cites the relevant literature.

I'm leaving this open for now because I think we could modify the diagram, if possible, to make the pooling shape / stride explicit.

NickleDave commented 2 years ago

Revised the language about this and summed it up in the response letter. I still think we could revise the figure to better show this, but am closing this issue for now.