sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License

Improve readability of the report scores when verbosity is on #538

Closed npatki closed 5 months ago

npatki commented 6 months ago

Problem Description

When the Quality or Diagnostic report runs in verbose mode (the default), it prints the progress and the scores. The output looks something like this:

Generating report ...
(1/3) Evaluating Data Validity: : 100%|██████████████████████| 52/52 [00:00<00:00, 4764.17it/s]
(2/3) Evaluating Data Structure: : 100%|█████████████████████| 3/3 [00:00<00:00, 1566.40it/s]
(3/3) Evaluating Relationship Validity: : 100%|██████████████████████| 2/2 [00:00<00:00, 770.16it/s]

Overall Score: 94.67%

Properties:
- Data Validity: 84.01%
- Data Structure: 100.0%
- Relationship Validity: 100.0%

This is a bit hard to parse:

  1. The 100% at the very top refers to the progress (completion), not to the property score
  2. The property scores printed at the bottom correspond to the progress bars at the top, but they are printed far away from them
  3. The overall score is printed in the middle of all this and is easy to miss

Expected behavior

A more intuitive approach is to (a) stop printing the progress %, (b) print each property score right after that property finishes, and (c) end with the overall score. The above report would then look something like this:

Generating report ...
(1/3) Evaluating Data Validity: |██████████████████████| 52/52 [00:00<00:00, 4764.17it/s]
Score: 84.01%

(2/3) Evaluating Data Structure: |█████████████████████| 3/3 [00:00<00:00, 1566.40it/s]
Score: 100.0%

(3/3) Evaluating Relationship Validity: |██████████████████████| 2/2 [00:00<00:00, 770.16it/s]
Score: 100.0%

Overall Score (Average): 94.67%
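
To make the proposed ordering concrete, here is a minimal sketch of the text layout only. It is not SDMetrics code: `format_report` and the `property_scores` dict are hypothetical names used for illustration, and each progress bar is replaced by a plain placeholder line. It assumes the overall score is the unweighted mean of the property scores, which matches the numbers above ((84.01 + 100 + 100) / 3 ≈ 94.67).

```python
def format_report(property_scores):
    """Hypothetical helper: per-property scores first, overall average last.

    `property_scores` maps a property name to a score in [0, 1].
    """
    if not property_scores:
        raise ValueError('At least one property score is required.')

    lines = ['Generating report ...']
    total = len(property_scores)
    for index, (name, score) in enumerate(property_scores.items(), start=1):
        # Placeholder for the progress bar line (no progress % is printed).
        lines.append(f'({index}/{total}) Evaluating {name}: done')
        # The score is printed right after the property it belongs to.
        lines.append(f'Score: {round(score * 100, 2)}%')
        lines.append('')

    # Assumption: the overall score is the plain average of the properties.
    overall = sum(property_scores.values()) / total
    lines.append(f'Overall Score (Average): {round(overall * 100, 2)}%')
    return '\n'.join(lines)


scores = {
    'Data Validity': 0.8401,
    'Data Structure': 1.0,
    'Relationship Validity': 1.0,
}
report = format_report(scores)
print(report)
```

With this ordering, each score sits directly under the step that produced it, and the overall score is the last line a reader sees.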

Additional context

The same format should be used for both the quality report and the diagnostic report, in both single-table and multi-table modalities.