[Open] tlc10 opened this issue 4 years ago
A and B represent the two sets being compared; in this case, the ground-truth and predicted segmentation maps.
True positives, false negatives, false positives, and true negatives come from computing a confusion matrix. They represent what the model predicts for each class category (per pixel, as you said) versus the actual ground truth.
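To make that concrete, here is a minimal NumPy sketch (not the library's implementation) that counts TP/FP/FN/TN and computes the IoU for a single binary class, where `gt` is a hypothetical ground-truth mask and `pr` a hypothetical predicted mask:

```python
import numpy as np

# Hypothetical binary masks: 1 = pixel belongs to the class, 0 = background
gt = np.array([[1, 1, 0],
               [0, 1, 0],
               [0, 0, 0]])
pr = np.array([[1, 0, 0],
               [0, 1, 1],
               [0, 0, 0]])

tp = np.sum((gt == 1) & (pr == 1))   # predicted 1, actually 1
fp = np.sum((gt == 0) & (pr == 1))   # predicted 1, actually 0
fn = np.sum((gt == 1) & (pr == 0))   # predicted 0, actually 1
tn = np.sum((gt == 0) & (pr == 0))   # predicted 0, actually 0

iou = tp / (tp + fp + fn)            # |A ∩ B| / |A ∪ B|
print(tp, fp, fn, tn, iou)           # 2 1 1 5 0.5
```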
The F1 (or Dice) score is calculated the same way as in your comment, but with a `smooth` term to avoid division by zero and a `beta` factor, which "is chosen such that recall is considered beta times as important as precision" (Wikipedia):
```python
# calculate score
tp = backend.sum(gt * pr, axis=axes)  # true positives: overlap of ground truth and prediction
fp = backend.sum(pr, axis=axes) - tp  # false positives: predicted but not in ground truth
fn = backend.sum(gt, axis=axes) - tp  # false negatives: in ground truth but not predicted

score = ((1 + beta ** 2) * tp + smooth) \
        / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth)
score = average(score, per_image, class_weights, **kwargs)
return score
```
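Note that with `beta = 1` and `smooth = 0` the expression above reduces exactly to the classic formula 2·TP / (2·TP + FN + FP), so the only differences are the smoothing term and the optional beta weighting. A quick sanity check in plain Python, using made-up counts (a sketch, not the library's code):

```python
def f_beta(tp, fp, fn, beta=1.0, smooth=0.0):
    # Same formula as in the snippet above
    return ((1 + beta ** 2) * tp + smooth) / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth)

tp, fp, fn = 2, 1, 1                      # counts from any confusion matrix
classic_f1 = 2 * tp / (2 * tp + fn + fp)  # classic Dice / F1
print(classic_f1, f_beta(tp, fp, fn))     # both print 0.666...
```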
Hi,
I'm having a hard time understanding how the different metrics are calculated (IoU, F1, precision, recall). For the IoU score, what do A and B correspond to in the documentation? Regarding precision and recall, do the values TP, FN, FP, and TN correspond to pixels of the predicted output? And for the F1 score, would it have been the same to calculate it with the classic formula 2TP/(2TP+FN+FP)?
Thank you for your explanations!