Past Competition Discussion Survey

1st Place Solution

Cornell Birdcall Identification
https://www.kaggle.com/c/birdsong-recognition/discussion/183208

Notes

Data Augmentation
- Pink noise
- Gaussian noise
- Gaussian SNR
- Gain (Volume Adjustment)
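
The four augmentations above can be sketched on a raw waveform as follows. This is a minimal NumPy sketch, not the solution's actual code: the function names, the SNR and gain values, and the FFT-based way of generating pink noise are all assumptions made for illustration.

```python
import numpy as np


def gaussian_noise(wave, std, rng):
    """Add white Gaussian noise with a fixed standard deviation."""
    return wave + rng.normal(0.0, std, size=wave.shape)


def scale_noise_to_snr(wave, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals snr_db (in dB)."""
    signal_power = np.mean(wave ** 2)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    current_noise_power = np.mean(noise ** 2)
    return noise * np.sqrt(target_noise_power / current_noise_power)


def gaussian_snr(wave, snr_db, rng):
    """Add white Gaussian noise at a requested signal-to-noise ratio."""
    noise = rng.normal(size=wave.shape)
    return wave + scale_noise_to_snr(wave, noise, snr_db)


def pink_noise(n_samples, rng):
    """Shape white noise so that its power falls off roughly as 1/f."""
    spectrum = np.fft.rfft(rng.normal(size=n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid dividing by zero at DC
    spectrum /= np.sqrt(freqs)  # power ~ 1/f means amplitude ~ 1/sqrt(f)
    return np.fft.irfft(spectrum, n=n_samples)


def add_pink_noise(wave, snr_db, rng):
    """Add pink noise at a requested signal-to-noise ratio."""
    noise = pink_noise(len(wave), rng)
    return wave + scale_noise_to_snr(wave, noise, snr_db)


def gain(wave, gain_db):
    """Volume adjustment: scale the amplitude by a gain given in dB."""
    return wave * (10.0 ** (gain_db / 20.0))
```

In practice each transform would be applied with some probability and with randomly drawn parameters per training sample; those ranges are not given in the notes.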
Models
- The SED model had over 80 million parameters, so I switched all my models to a pretrained densenet121 as the CNN feature extractor and reduced the attention block size to 1024
- I also replaced the clamp on the attention with tanh
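
To illustrate the clamp-to-tanh change, here is a rough NumPy sketch of an attention pooling step of the kind used in SED models. It assumes the attention logits are squashed and then softmax-normalised over time before weighting the framewise probabilities; the clamp bounds of ±10, the tensor layout, and all names are assumptions, not taken from the solution.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_pool(att_logits, cla_logits, squash="tanh"):
    """Pool framewise predictions into clipwise ones.

    att_logits, cla_logits: arrays of shape (n_classes, n_frames).
    """
    if squash == "tanh":
        bounded = np.tanh(att_logits)  # logits limited to (-1, 1)
    else:
        bounded = np.clip(att_logits, -10.0, 10.0)  # assumed clamp bounds
    norm_att = softmax(bounded, axis=-1)  # attention weights over time
    framewise = sigmoid(cla_logits)  # per-frame class probabilities
    clipwise = (norm_att * framewise).sum(axis=-1)  # weighted average over time
    return clipwise, framewise, norm_att
```

With tanh the logits stay within (-1, 1), so no single frame can take almost all of the attention weight, whereas a wide clamp lets the softmax become nearly one-hot.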
Training
- Cosine Annealing Scheduler with warmup
- Mixup
- 50 epochs for non-mixup models and 100 epochs for mixup models
- AdamW with weight_decay 0.01
- SpecAugmentation enabled
- 30-second audio clips during training; evaluation on two 30-second clips per audio file
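
Two items from the training list, the warmup-plus-cosine schedule and mixup, can be sketched as below. The linear warmup shape, the step counts, the learning rates, and the Beta parameter `alpha` are hypothetical; the notes only name the techniques.

```python
import math

import numpy as np


def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def mixup(batch, targets, alpha, rng):
    """Blend each sample (and its label vector) with another sample of the batch."""
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    perm = rng.permutation(len(batch))  # partner index for every sample
    mixed_x = lam * batch + (1.0 - lam) * batch[perm]
    mixed_y = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_x, mixed_y, lam
```

Because mixup produces soft labels, the loss has to accept non-binary targets; binary cross-entropy on the mixed label vector is one common choice.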
Inference
- Thresholds of 0.3 on the framewise_output and 0.3 on the clipwise_output to reduce the impact of false positives
- 10x TTA by adding the same audio sample 10 times to the batch with SpecAugmentation enabled
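
A possible way to combine TTA averaging with the two thresholds is sketched below. The notes do not say how the framewise and clipwise decisions are combined, so the rule used here (a class is kept only if it passes both thresholds) is an assumption, as are the array shapes and the function name.

```python
import numpy as np


def predict_with_thresholds(framewise_tta, clipwise_tta, frame_thr=0.3, clip_thr=0.3):
    """Average TTA predictions, then threshold them.

    framewise_tta: (n_tta, n_frames, n_classes) probabilities
    clipwise_tta:  (n_tta, n_classes) probabilities
    """
    framewise = framewise_tta.mean(axis=0)  # (n_frames, n_classes)
    clipwise = clipwise_tta.mean(axis=0)  # (n_classes,)
    frame_hits = framewise >= frame_thr
    clip_hits = clipwise >= clip_thr
    # Assumption: a frame-level detection counts only if the clip-level
    # score for the same class also passes its threshold.
    events = frame_hits & clip_hits[None, :]
    predicted_classes = np.flatnonzero(events.any(axis=0))
    return events, predicted_classes
```

Requiring both scores to pass is one way to suppress isolated framewise spikes that the clip-level output does not support.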
Inference implementation
repo

Comments in the thread

Ideas and comments based on the above

I want to try the SED model with a densenet121 backbone
I will also try TTA

# TTA implementation

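    import torch

    # Assumes `tensors`, `TTA`, `device`, and `model` are defined by the surrounding script.
    for image in tensors:
        # Add two leading axes, then repeat the clip TTA times along the second axis.
        image = image.unsqueeze(0).unsqueeze(0)
        image = image.expand(image.shape[0], TTA, image.shape[2])
        image = image.to(device)

        with torch.no_grad():
            prediction = model((image, None))
            # Take the first batch entry and average over its leading axis.
            framewise_outputs = (
                prediction["framewise_output"].detach().cpu().numpy()[0].mean(axis=0)
            )
            clipwise_outputs = (
                prediction["clipwise_output"].detach().cpu().numpy()[0].mean(axis=0)
            )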

osuossu8 / KaggleRFCX

Past Competition Discussion Survey #2

Thread title

Notes

Comments in the thread

Ideas and comments based on the above

1st Place Solution

Notes

Comments in the thread

Ideas and comments based on the above