sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

`NAN` values in columns `Real Correlation` and `Synthetic Correlation` #2300

Closed celsofranssa closed 4 days ago

celsofranssa commented 4 days ago

Environment details

Problem description

After running `evaluate_quality` and requesting the details for the Column Pair Trends property as follows:

# evaluate_quality for single-table data
from sdv.evaluation.single_table import evaluate_quality

quality_report = evaluate_quality(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata)

quality_report.get_details(property_name='Column Pair Trends')

I don't know how to interpret the NaN values in the Real Correlation and Synthetic Correlation columns.


celsofranssa commented 4 days ago

I figured it out. This happened because the metadata had the sdtype set to `categorical` for all columns instead of `numerical`.
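For anyone hitting the same thing, here is a minimal sketch of how one might spot columns whose sdtype was detected as categorical even though they hold numbers. It assumes the metadata is available as a plain dictionary shaped like `{name: {'sdtype': ...}}`; the helper name `find_mislabeled_numeric_columns` and the sample rows are hypothetical and not part of SDV.

```python
from numbers import Number


def find_mislabeled_numeric_columns(columns_meta, rows):
    """Return columns labeled 'categorical' whose non-missing values are all numbers.

    Hypothetical helper for illustration; not an SDV function.
    """
    suspects = []
    for name, info in columns_meta.items():
        if info.get("sdtype") != "categorical":
            continue
        values = [row[name] for row in rows if row.get(name) is not None]
        # bool is a subclass of int, so exclude it explicitly
        if values and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        ):
            suspects.append(name)
    return suspects


# Hypothetical metadata and data, for illustration only
columns_meta = {
    "age": {"sdtype": "categorical"},
    "income": {"sdtype": "categorical"},
    "city": {"sdtype": "categorical"},
}
rows = [
    {"age": 34, "income": 52000.0, "city": "Recife"},
    {"age": 51, "income": None, "city": "Natal"},
]

suspects = find_mislabeled_numeric_columns(columns_meta, rows)

# Relabel the suspect columns in the plain dictionary
for name in suspects:
    columns_meta[name]["sdtype"] = "numerical"
```

With a real SDV metadata object, the equivalent fix would likely go through its `update_column` method (for example, passing `sdtype='numerical'` for each affected column) before rerunning the quality report.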

npatki commented 4 days ago

Hi @celsofranssa, glad you were able to figure it out.

Just for more context:

Hope that makes sense and happy to answer any other Qs that you have when using SDMetrics.
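As a rough illustration of why the correlation columns can come back empty: as I understand the SDMetrics documentation, Column Pair Trends picks a metric per column pair based on the sdtypes, and only the correlation-based metric produces correlation values. The sketch below encodes that selection rule as an assumption; the function names are hypothetical and this is not the library's actual code.

```python
# sdtypes treated as continuous vs. discrete (assumed grouping)
NUMERIC_LIKE = {"numerical", "datetime"}
DISCRETE = {"categorical", "boolean"}


def pair_metric(sdtype_1, sdtype_2):
    """Return the metric name assumed to be used for a pair of sdtypes."""
    supported = NUMERIC_LIKE | DISCRETE
    if sdtype_1 not in supported or sdtype_2 not in supported:
        return None  # e.g. ID or PII columns are not scored
    if sdtype_1 in NUMERIC_LIKE and sdtype_2 in NUMERIC_LIKE:
        return "CorrelationSimilarity"
    # Any pair involving a discrete column falls back to contingency tables
    return "ContingencySimilarity"


def has_correlation_values(sdtype_1, sdtype_2):
    """True only when the pair would be scored with a correlation metric."""
    return pair_metric(sdtype_1, sdtype_2) == "CorrelationSimilarity"


all_categorical = has_correlation_values("categorical", "categorical")
all_numerical = has_correlation_values("numerical", "numerical")
mixed = pair_metric("numerical", "categorical")
```

Under this assumption, a table where every column is marked categorical has no pair scored by correlation, so the Real Correlation and Synthetic Correlation columns would be NaN on every row.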