Implement MSAS

LiFaytheGoblin commented 2 years ago

Problem Description

The metrics currently implemented in SDV do not specifically measure the quality of sequences generated with CPAR.

Expected behavior

MSAS is a metric for sequential data quality, detailed in http://arxiv.org/abs/2207.14406. It should be implemented in SDV.

npatki commented 2 years ago

Thanks for filing @LiFaytheGoblin. We'll keep this open to track as we make progress on it.

Just a note that MSAS refers to our overall algorithm for computing sequential data quality. It works in the following steps:

1. Compute a metric for every sequence in the real data to get a distribution X.
2. Compute the same metric for every sequence in the synthetic data to get a distribution X'.
3. Use the KSComplement test to compare the distributions X and X'.
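The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the SDMetrics implementation: it assumes each sequence is a list of numbers from a single column, and it computes KSComplement by hand as 1 minus the two-sample Kolmogorov-Smirnov statistic so the sketch runs without SDMetrics installed. The names `ks_complement` and `msas_score` are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def ks_complement(real_values, synthetic_values):
    """Return 1 minus the two-sample Kolmogorov-Smirnov statistic.

    1.0 means the two empirical distributions are identical;
    0.0 means they are completely separated.
    """
    real = sorted(real_values)
    synth = sorted(synthetic_values)
    max_gap = 0.0
    # The largest gap between two empirical CDFs occurs at an observed value,
    # so checking every observed value is enough.
    for x in real + synth:
        cdf_real = bisect_right(real, x) / len(real)
        cdf_synth = bisect_right(synth, x) / len(synth)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


def msas_score(real_sequences, synthetic_sequences, metric):
    """Compare per-sequence metric values between real and synthetic data (hypothetical helper)."""
    # Step 1: compute the metric for every real sequence -> distribution X.
    x_real = [metric(seq) for seq in real_sequences]
    # Step 2: compute the same metric for every synthetic sequence -> distribution X'.
    x_synth = [metric(seq) for seq in synthetic_sequences]
    # Step 3: compare X and X' with the KS complement.
    return ks_complement(x_real, x_synth)


# Toy data, made up for illustration: each sequence is one numeric column.
real = [[1.0, 2.0, 3.0], [2.0, 2.5, 3.5], [10.0, 11.0]]
synthetic = [[1.5, 2.0, 2.5], [9.0, 10.0, 12.0, 13.0]]
print(msas_score(real, synthetic, metric=mean))  # prints 0.5
print(msas_score(real, synthetic, metric=len))
```

Because the metric is passed in as a function, the same scoring loop works for any per-sequence statistic chosen in step 1.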

Various metrics can be used in step 1. In the paper, we used sequence length, mean, median, standard deviation, and the difference between the value at row n and the value at row n+t.
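The per-sequence metrics listed above might look like this in plain Python. This is a sketch under stated assumptions: the registry and `mean_step_difference` are hypothetical names, using the population standard deviation is an assumption, and summarizing the row n to row n+t differences by their mean is also an assumption rather than something confirmed by the paper.

```python
from statistics import mean, median, pstdev


def mean_step_difference(seq, t=1):
    """Average of seq[n + t] - seq[n] over every valid position n.

    Summarizing the differences by their mean is an assumption.
    Raises statistics.StatisticsError if the sequence has t or fewer rows.
    """
    diffs = [seq[n + t] - seq[n] for n in range(len(seq) - t)]
    return mean(diffs)


# Hypothetical registry mapping a metric name to a per-sequence function.
SEQUENCE_METRICS = {
    "length": len,
    "mean": mean,
    "median": median,
    "std": pstdev,  # population standard deviation (an assumption)
    "diff_t1": lambda seq: mean_step_difference(seq, t=1),
    "diff_t2": lambda seq: mean_step_difference(seq, t=2),
}

sequence = [1.0, 3.0, 4.0, 8.0]
for name, metric in SEQUENCE_METRICS.items():
    print(name, metric(sequence))
```

Any of these functions could serve as the per-sequence metric in step 1 of MSAS; each one reduces a whole sequence to a single number, which is what the distribution comparison needs.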

Are there any particular metrics that are more or less important to your use case?

npatki commented 2 years ago

FYI, some metrics that will use MSAS are being actively discussed in #198.

sdv-dev / SDMetrics, issue #199: Implement MSAS
