No external data.
I noticed that the default SED model had over 80 million parameters, so I switched all my models to use a pretrained densenet121 model as the CNN feature extractor and reduced the attention block size to 1024.
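Roughly, that change might look like the sketch below: a timm densenet121 backbone (1024 output channels) feeding a PANNs-style attention head. This is my reconstruction under those assumptions, not the author's exact code.

import timm
import torch
import torch.nn as nn

class AttBlock(nn.Module):
    # PANNs-style attention pooling over the time axis.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.att = nn.Conv1d(in_features, out_features, kernel_size=1)
        self.cla = nn.Conv1d(in_features, out_features, kernel_size=1)

    def forward(self, x):  # x: (batch, channels, time)
        att = torch.softmax(torch.tanh(self.att(x)), dim=-1)
        cla = torch.sigmoid(self.cla(x))
        return (att * cla).sum(dim=-1), cla  # clipwise, framewise

class SedDenseNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # densenet121 as the CNN feature extractor, classifier head removed;
        # its final feature map has 1024 channels, matching the smaller
        # attention block size mentioned above.
        self.encoder = timm.create_model(
            "densenet121", pretrained=True, num_classes=0, global_pool="")
        self.fc = nn.Linear(1024, 1024)
        self.att_block = AttBlock(1024, num_classes)

    def forward(self, mel):  # mel: (batch, 3, n_mels, time)
        x = self.encoder(mel)                # (batch, 1024, f, t)
        x = x.mean(dim=2)                    # average over frequency -> (batch, 1024, t)
        x = torch.relu(self.fc(x.transpose(1, 2))).transpose(1, 2)
        clipwise, framewise = self.att_block(x)
        return clipwise, framewise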
4-fold models without mixup
4-fold models with mixup
5-fold models without mixup
import torch
import torch.nn as nn

EPSILON_FP16 = 1e-5  # clamp bound for numerical stability in fp16 (assumed value)

class SedScaledPosNegFocalLoss(nn.Module):
    def __init__(self, gamma=0.0, alpha_1=1.0, alpha_0=1.0, secondary_factor=1.0):
        super().__init__()
        self.loss_fn = nn.BCELoss(reduction='none')
        self.secondary_factor = secondary_factor
        self.gamma = gamma
        self.alpha_1 = alpha_1  # weight on positive positions
        self.alpha_0 = alpha_0  # weight on negative positions
        self.loss_keys = ["bce_loss", "F_loss", "FScaled_loss", "F_loss_0", "F_loss_1"]

    def forward(self, y_pred, y_target):
        y_true = y_target["all_labels"]
        y_sec_true = y_target["secondary_labels"]
        bs, s, o = y_true.shape

        # Sigmoid has already been applied in the model
        y_pred = torch.clamp(y_pred, min=EPSILON_FP16, max=1.0 - EPSILON_FP16)
        y_pred = y_pred.reshape(bs * s, o)
        y_true = y_true.reshape(bs * s, o)
        y_sec_true = y_sec_true.reshape(bs * s, o)

        with torch.no_grad():
            # 1 where any label (primary or secondary) is present, 0 elsewhere
            y_all_ones_mask = torch.ones_like(y_true, requires_grad=False)
            y_all_zeros_mask = torch.zeros_like(y_true, requires_grad=False)
            y_all_mask = torch.where(y_true > 0.0, y_all_ones_mask, y_all_zeros_mask)
            # scale positions carrying a secondary label by secondary_factor
            y_ones_mask = torch.ones_like(y_sec_true, requires_grad=False)
            y_sec_scale_mask = torch.ones_like(y_sec_true, requires_grad=False) * self.secondary_factor
            y_secondary_mask = torch.where(y_sec_true > 0.0, y_sec_scale_mask, y_ones_mask)

        bce_loss = self.loss_fn(y_pred, y_true)
        pt = torch.exp(-bce_loss)
        F_loss_0 = (self.alpha_0 * (1 - y_all_mask)) * (1 - pt) ** self.gamma * bce_loss
        F_loss_1 = (self.alpha_1 * y_all_mask) * (1 - pt) ** self.gamma * bce_loss
        F_loss = F_loss_0 + F_loss_1
        FScaled_loss = (y_secondary_mask * F_loss).mean()
        return FScaled_loss, {"bce_loss": bce_loss.mean(),
                              "F_loss_1": F_loss_1.mean(),
                              "F_loss_0": F_loss_0.mean(),
                              "F_loss": F_loss.mean(),
                              "FScaled_loss": FScaled_loss}
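For reference, a hypothetical call showing the expected shapes (the class count and hyperparameters here are illustrative, not from the original notebook):

criterion = SedScaledPosNegFocalLoss(gamma=2.0, secondary_factor=0.5)
y_pred = torch.rand(8, 4, 264, requires_grad=True)  # post-sigmoid framewise predictions
y_all = torch.randint(0, 2, (8, 4, 264)).float()    # primary + secondary multi-hot labels
y_sec = torch.randint(0, 2, (8, 4, 264)).float()    # secondary-only multi-hot labels
loss, logs = criterion(y_pred, {"all_labels": y_all, "secondary_labels": y_sec})
loss.backward()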
nb: https://www.kaggle.com/vlomme/surfin-bird-2nd-place
git: https://github.com/vlomme/Birdcall-Identification-competition/blob/master/train.py
discussion: https://www.kaggle.com/c/birdsong-recognition/discussion/183199
nb: https://www.kaggle.com/theoviel/training-a-winning-model?scriptVersionId=42814701
git: https://github.com/TheoViel/kaggle_birdcall_identification
Data augmentation is the key to reducing the discrepancy between train and test. We start by randomly cropping 5 seconds of the audio and then add aggressive noise augmentations:
Gaussian noise
With a signal-to-noise ratio of up to 0.5.
Background noise
We randomly choose 5 seconds of a sample from the background dataset available here. This dataset contains samples without birdcalls taken from the example test audios in the competition data, plus some manually selected samples from the freesound bird detection challenge.
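A minimal sketch of both noise augmentations, assuming raw waveforms as NumPy arrays; the interpretation of the 0.5 ratio and the background mixing level are my assumptions:

import numpy as np

def add_gaussian_noise(audio, max_snr=0.5):
    # One reading of "ratio up to 0.5": noise std is a random fraction
    # (up to max_snr) of the signal std.
    ratio = np.random.uniform(0.0, max_snr)
    return audio + np.random.randn(len(audio)) * audio.std() * ratio

def add_background_noise(audio, bg_pool, level=0.5):
    # bg_pool: list of background clips (nocall / freesound samples), each
    # assumed at least as long as `audio`; mix in a random slice at a fixed level.
    bg = bg_pool[np.random.randint(len(bg_pool))]
    start = np.random.randint(0, len(bg) - len(audio) + 1)
    return audio + level * bg[start:start + len(audio)]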
Modified Mixup
Mixup creates a combination of a batch x1 and its shuffled version x2: x = a * x1 + (1 - a) * x2, where a is sampled from a beta distribution.
Then, instead of using the classical mixup objective, we define the target associated with x as the union of the original targets.
This forces the model to correctly predict both labels.
Mixup is applied with probability 0.5, and I used 5 as the parameter of the beta distribution, which forces a to be close to 0.5.
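A short sketch of this union-target mixup (the function and argument names are mine):

import numpy as np
import torch

def mixup_union(x, y, alpha=5.0, p=0.5):
    # x: batch of waveforms/spectrograms, y: (batch, num_classes) multi-hot targets
    if np.random.rand() > p:
        return x, y
    lam = float(np.random.beta(alpha, alpha))  # alpha=5 concentrates lam near 0.5
    perm = torch.randperm(x.size(0))
    x = lam * x + (1.0 - lam) * x[perm]
    y = torch.clamp(y + y[perm], max=1.0)      # union of labels, not a weighted mix
    return x, y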
Improved cropping
Instead of selecting crops at random, crops were also selected based on out-of-fold confidence. The confidence at time t is the probability of the ground-truth class predicted on the 5-second crop starting at t.
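One way this could look (my interpretation; oof_conf is a hypothetical array of out-of-fold probabilities per candidate start second, and 32 kHz is an assumed sampling rate):

import numpy as np

def pick_crop(audio, oof_conf, sr=32000, crop_sec=5):
    # oof_conf[t]: out-of-fold probability of the ground-truth class for the
    # 5-second window starting at second t; sample starts proportionally to it.
    probs = np.asarray(oof_conf, dtype=np.float64)
    probs = probs / probs.sum()
    t = np.random.choice(len(probs), p=probs)
    return audio[t * sr:(t + crop_sec) * sr]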
Competition link
https://www.kaggle.com/c/birdsong-recognition
Evaluation
Row-wise micro-averaged F1 score
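That is, predictions and ground truth are compared as label sets per row, an F1 is computed for each row, and the row scores are averaged. A small illustrative implementation (set-based, treating "nocall" as an ordinary label so the sets are non-empty):

def row_f1(true_set, pred_set):
    # Per-row F1 between the predicted and true label sets.
    tp = len(true_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)

def competition_score(rows):
    # rows: iterable of (true_set, pred_set) pairs; the score is the mean row F1.
    rows = list(rows)
    return sum(row_f1(t, p) for t, p in rows) / len(rows)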
Top 10 solutions
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th
Other (if any)
Submission format