osuossu8 / kaggle-solution


[2020] Cornell Birdcall Identification #2

Open osuossu8 opened 3 years ago

osuossu8 commented 3 years ago

Competition link

https://www.kaggle.com/c/birdsong-recognition

Evaluation

row-wise micro averaged F1 score
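As a rough illustration of the metric, the sketch below computes an F1 score per row from the sets of true and predicted labels and then averages over rows. This reading of "row-wise micro averaged" is an assumption, not Kaggle's reference implementation, and the function names `row_f1` and `mean_row_f1` are hypothetical.

```python
def row_f1(true_labels, pred_labels):
    """F1 for one row, treating labels (including 'nocall') as sets."""
    true_set, pred_set = set(true_labels), set(pred_labels)
    tp = len(true_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)


def mean_row_f1(true_rows, pred_rows):
    """Average the per-row F1 over all rows."""
    scores = [row_f1(t, p) for t, p in zip(true_rows, pred_rows)]
    return sum(scores) / len(scores)


# Toy example: one perfect row, one partially correct row, one wrong row.
truth = [["amecro"], ["amecro", "amerob"], ["nocall"]]
preds = [["amecro"], ["amecro"], ["amecro"]]
score = mean_row_f1(truth, preds)
```

Under this reading, predicting a bird on a row whose truth is nocall scores zero for that row, which is why threshold choice matters so much in this competition.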

top10 solutions

1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th

Other (if any)

submission format

row_id,birds
site_1_0a997dff022e3ad9744d4e7bbf923288_5,amecro
site_1_0a997dff022e3ad9744d4e7bbf923288_10,amecro amerob
site_1_0a997dff022e3ad9744d4e7bbf923288_15,nocall
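
A minimal sketch of writing rows in this format, using only the standard library. The helper name `format_row` is hypothetical, and reading the trailing number of `row_id` as the segment's end time in seconds is an assumption based on the sample rows above.

```python
import csv
import io


def format_row(site, audio_id, end_second, birds):
    """Build one submission row; an empty prediction becomes 'nocall'."""
    row_id = f"{site}_{audio_id}_{end_second}"
    label = " ".join(birds) if birds else "nocall"
    return row_id, label


audio_id = "0a997dff022e3ad9744d4e7bbf923288"
predictions = {5: ["amecro"], 10: ["amecro", "amerob"], 15: []}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["row_id", "birds"])
for end_second, birds in predictions.items():
    writer.writerow(format_row("site_1", audio_id, end_second, birds))
submission_text = buffer.getvalue()
```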
osuossu8 commented 3 years ago

1st place

Data Augmentation

No external data.

Models

4 fold models without mixup
4 fold models with mixup
5 fold models without mixup

Training

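import torch
import torch.nn as nn

# Clamp margin that keeps log() finite in BCELoss.
# The value is not given in the source; 1e-5 is an assumption.
EPSILON_FP16 = 1e-5

class SedScaledPosNegFocalLoss(nn.Module):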
    def __init__(self, gamma=0.0, alpha_1=1.0, alpha_0=1.0, secondary_factor=1.0):
        super().__init__()

        self.loss_fn = nn.BCELoss(reduction='none')
        self.secondary_factor = secondary_factor
        self.gamma = gamma
        self.alpha_1 = alpha_1
        self.alpha_0 = alpha_0
        self.loss_keys = ["bce_loss", "F_loss", "FScaled_loss", "F_loss_0", "F_loss_1"]

    def forward(self, y_pred, y_target):
        y_true = y_target["all_labels"]
        y_sec_true = y_target["secondary_labels"]
        bs, s, o = y_true.shape

        # Sigmoid has already been applied in the model
        y_pred = torch.clamp(y_pred, min=EPSILON_FP16, max=1.0-EPSILON_FP16)
        y_pred = y_pred.reshape(bs*s,o)
        y_true = y_true.reshape(bs*s,o)
        y_sec_true = y_sec_true.reshape(bs*s,o)

        with torch.no_grad():
            # 1 where all_labels marks the class as present, 0 elsewhere
            y_all_mask = (y_true > 0.0).to(y_true.dtype)
            # Loss weight: secondary_factor on secondary-label positions, 1 elsewhere
            y_secondary_mask = torch.where(
                y_sec_true > 0.0,
                torch.full_like(y_sec_true, self.secondary_factor),
                torch.ones_like(y_sec_true),
            )
        bce_loss = self.loss_fn(y_pred, y_true)
        pt = torch.exp(-bce_loss)
        F_loss_0 = (self.alpha_0*(1-y_all_mask)) * (1-pt)**self.gamma * bce_loss
        F_loss_1 = (self.alpha_1*y_all_mask) * (1-pt)**self.gamma * bce_loss

        F_loss = F_loss_0 + F_loss_1

        FScaled_loss = y_secondary_mask*F_loss
        FScaled_loss = FScaled_loss.mean()

        return FScaled_loss, {"bce_loss": bce_loss.mean(), "F_loss_1": F_loss_1.mean(), "F_loss_0": F_loss_0.mean(), "F_loss": F_loss.mean(), "FScaled_loss": FScaled_loss }
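
Read off the code above, the quantity being returned can be written out as follows. This is only a restatement of the forward pass, with N = bs * s * o the number of (batch, segment, class) elements, p_i the clamped prediction, y_i the entry of all_labels and y^sec_i the entry of secondary_labels; the symbols m_i, w_i and s_i are introduced here for illustration.

```latex
\begin{aligned}
\mathrm{BCE}_i &= -\bigl[y_i \log p_i + (1 - y_i)\log(1 - p_i)\bigr],
\qquad p_{t,i} = e^{-\mathrm{BCE}_i},\\
m_i &= \mathbb{1}[y_i > 0],
\qquad w_i = \alpha_1\, m_i + \alpha_0\,(1 - m_i),\\
s_i &= \begin{cases}
\texttt{secondary\_factor} & \text{if } y^{\mathrm{sec}}_i > 0,\\
1 & \text{otherwise,}
\end{cases}\\
\mathcal{L} &= \frac{1}{N}\sum_{i=1}^{N} s_i\, w_i\,(1 - p_{t,i})^{\gamma}\,\mathrm{BCE}_i .
\end{aligned}
```

With the default arguments (gamma = 0, alpha_0 = alpha_1 = 1, secondary_factor = 1) every weight equals 1 and the loss reduces to plain mean binary cross-entropy.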

Thresholds

CV vs LB

Ensemble

osuossu8 commented 3 years ago

2nd place

nb https://www.kaggle.com/vlomme/surfin-bird-2nd-place

git https://github.com/vlomme/Birdcall-Identification-competition/blob/master/train.py

osuossu8 commented 3 years ago

3rd place

https://www.kaggle.com/c/birdsong-recognition/discussion/183199

nb : https://www.kaggle.com/theoviel/training-a-winning-model?scriptVersionId=42814701

git : https://github.com/TheoViel/kaggle_birdcall_identification

Data augmentation is key to reducing the discrepancy between train and test. We start by randomly cropping 5 seconds of audio and then add aggressive noise augmentations:

Gaussian noise
With a signal-to-noise ratio up to 0.5
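
A minimal sketch of this kind of augmentation with NumPy. It assumes the "ratio up to 0.5" means the noise standard deviation is a random fraction, between 0 and 0.5, of the clip's own standard deviation; the write-up does not spell this out, and the function name `add_gaussian_noise` is hypothetical.

```python
import numpy as np


def add_gaussian_noise(wave, max_ratio=0.5, rng=None):
    """Add white noise whose std is a random fraction of the signal std.

    Assumption: the fraction is drawn uniformly from [0, max_ratio].
    """
    if rng is None:
        rng = np.random.default_rng()
    ratio = rng.uniform(0.0, max_ratio)
    noise = rng.normal(0.0, 1.0, size=wave.shape) * ratio * wave.std()
    return wave + noise


rng = np.random.default_rng(0)
wave = np.sin(np.linspace(0.0, 100.0, 16000))  # stand-in for an audio clip
noisy = add_gaussian_noise(wave, max_ratio=0.5, rng=rng)
```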

Background noise
We randomly choose 5 seconds of a sample from a background dataset shared with the write-up. This dataset contains samples without birdcalls taken from the example test audios in the competition data, plus some manually selected samples from the freesound bird detection challenge.
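
The random crop and background mix described above can be sketched as follows. The helper names and the `gain` parameter are hypothetical: the write-up does not say how loud the background is relative to the clip, so the mixing level here is purely an assumption.

```python
import numpy as np


def random_crop(wave, crop_len, rng):
    """Return a random window of crop_len samples, zero-padding short clips."""
    if len(wave) <= crop_len:
        return np.pad(wave, (0, crop_len - len(wave)))
    start = rng.integers(0, len(wave) - crop_len + 1)
    return wave[start:start + crop_len]


def add_background(wave, background, gain, rng):
    """Mix a random background crop of the same length into wave."""
    noise = random_crop(background, len(wave), rng)
    return wave + gain * noise


rng = np.random.default_rng(0)
clip = np.zeros(5 * 100)          # stand-in for a 5 second training crop
background = np.ones(30 * 100)    # stand-in for a longer background recording
mixed = add_background(clip, background, gain=0.3, rng=rng)
```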

Modified Mixup
Mixup creates a combination of a batch x1 and its shuffled version x2: x = a * x1 + (1 - a) * x2, where a is sampled from a beta distribution.
Then, instead of using the classical mixup objective, we define the target associated with x as the union of the original targets.
This forces the model to correctly predict both labels.
Mixup is applied with probability 0.5, and I used 5 as the parameter of the beta distribution, which forces a to be close to 0.5.
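
A minimal NumPy sketch of the modified mixup described above. It assumes multi-hot targets and a single mixing weight per batch; both are assumptions, as is the function name `union_mixup`. The union of targets is taken as an element-wise maximum.

```python
import numpy as np


def union_mixup(x, y, alpha=5.0, p=0.5, rng=None):
    """Mix a batch with a shuffled copy of itself; targets become the union.

    x: array of shape (batch, ...), y: multi-hot array of shape (batch, classes).
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() >= p:
        return x, y  # mixup skipped for this batch
    perm = rng.permutation(len(x))
    a = rng.beta(alpha, alpha)  # Beta(5, 5) concentrates around 0.5
    x_mixed = a * x + (1.0 - a) * x[perm]
    y_union = np.maximum(y, y[perm])  # label present if present in either clip
    return x_mixed, y_union


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
y = np.eye(4)
x_mixed, y_union = union_mixup(x, y, p=1.0, rng=rng)
```

Because the target is a union rather than a weighted average, a value of a near 0 or 1 would ask the model to detect a nearly inaudible clip, which is presumably why a beta parameter that keeps a near 0.5 is used.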

Improved cropping
Instead of selecting crops at random, we also selected them based on out-of-fold confidence. The confidence at time t is the probability of the ground truth class predicted on the 5 second crop starting at t.
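
One possible way to turn such confidences into crops is to sample the start time with probability proportional to its confidence. The write-up does not say how the selection was actually done, so the sampling rule and the function name `sample_crop_start` below are assumptions for illustration only.

```python
import numpy as np


def sample_crop_start(confidence, rng=None):
    """Pick a crop start index with probability proportional to confidence.

    confidence[t] is the out-of-fold probability of the true class for the
    crop starting at index t. Falls back to uniform sampling if all are zero.
    """
    if rng is None:
        rng = np.random.default_rng()
    conf = np.asarray(confidence, dtype=float)
    total = conf.sum()
    if total <= 0:
        return int(rng.integers(len(conf)))
    return int(rng.choice(len(conf), p=conf / total))


rng = np.random.default_rng(0)
start = sample_crop_start([0.0, 0.0, 1.0, 0.0], rng=rng)
fallback = sample_crop_start([0.0, 0.0, 0.0], rng=rng)
```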