Currently, the docs for the detection metrics (for both single table and sequential) mention that the final score is `1 - avg(ROC AUC)`. They also provide an interpretation of the score that assumes this `1 - avg(ROC AUC)` formula.
However, in the codebase, we see a slightly different formula used for computing the score. It's more accurate to say that:
`score = 1 - [max(0.5, ROC AUC) * 2 - 1]`
Which means:
- A final score of 1 means that the original ROC AUC was in the [0, 0.5] range, i.e. the ML model was unable to tell the real data apart from the synthetic data. This is good for quality but potentially bad for privacy.
- A final score of 0 means that the original ROC AUC was 1, i.e. the ML model could fully tell the real data apart from the synthetic data. This is bad for quality but potentially good for privacy.
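For reference, here is a minimal sketch of how the formula above maps a ROC AUC value to the final score. The function name and the examples are illustrative only, not the actual SDMetrics internals:

```python
def detection_score(roc_auc):
    """Map a ROC AUC value to the detection metric score.

    AUC values below 0.5 are clipped to 0.5 (no better than chance),
    then rescaled from [0.5, 1] to [0, 1] and inverted, so that
    1 = real and synthetic are indistinguishable, 0 = fully detectable.
    """
    return 1 - (max(0.5, roc_auc) * 2 - 1)

# Examples:
print(detection_score(0.5))   # 1.0 -> classifier at chance level
print(detection_score(0.3))   # 1.0 -> below chance, clipped to 0.5
print(detection_score(0.75))  # 0.5
print(detection_score(1.0))   # 0.0 -> real vs. synthetic fully separable
```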