sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Metadata `anonymize` doesn't produce the right `METADATA_SPEC_VERSION` #2304

Open npatki opened 6 days ago


Environment Details

Background

Since SDV 1.17.0, we have consolidated the old SingleTableMetadata and MultiTableMetadata objects into a single, streamlined Metadata object. With this change, we also updated the expected value of the "METADATA_SPEC_VERSION" key to "V1". For example:

{
    "tables": {
        "table": {
            "columns": {
                "age": { "sdtype": "numerical" },
                "gender": { "sdtype": "categorical" },
                ...
            }
        }
    },
    "METADATA_SPEC_VERSION": "V1"
}
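To make the expected value concrete, here is a minimal sketch that checks whether a metadata dictionary uses the consolidated spec version. The helper name `has_current_spec_version` is hypothetical and is not part of SDV. It only inspects the top-level "METADATA_SPEC_VERSION" key of a plain dict shaped like the example above.

```python
# Hypothetical helper (not part of SDV): checks that a metadata dictionary
# carries the consolidated spec version used since SDV 1.17.0.
EXPECTED_SPEC_VERSION = "V1"


def has_current_spec_version(metadata_dict: dict) -> bool:
    """Return True if the top-level METADATA_SPEC_VERSION key equals "V1"."""
    return metadata_dict.get("METADATA_SPEC_VERSION") == EXPECTED_SPEC_VERSION


metadata_dict = {
    "tables": {
        "table": {
            "columns": {
                "age": {"sdtype": "numerical"},
                "gender": {"sdtype": "categorical"},
            }
        }
    },
    "METADATA_SPEC_VERSION": "V1",
}

print(has_current_spec_version(metadata_dict))  # True
print(has_current_spec_version({"METADATA_SPEC_VERSION": "MULTI_TABLE_V1"}))  # False
```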

Error Description

When I call the anonymize method on a Metadata object, the new Metadata object it returns does not have the correct METADATA_SPEC_VERSION.

For example, if I run metadata.anonymize() on the metadata above, I get the following:

{
    "tables": {
        "table": {
            "columns": {
                "col1": { "sdtype": "numerical" },
                "col2": { "sdtype": "categorical" },
                ...
            }
        }
    },
    "METADATA_SPEC_VERSION": "MULTI_TABLE_V1"
}

I expect the METADATA_SPEC_VERSION to be "V1" (not "MULTI_TABLE_V1").
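The expected behavior can be illustrated with a dict-level sketch. This is not SDV's implementation of anonymize: the function `anonymize_metadata_dict` is hypothetical, it only renames columns to col1, col2, ... (mirroring the output shown above), and it ignores primary keys, relationships, and any other keys that reference column names. The point it illustrates is that the returned metadata should always be written with "V1", never the legacy "MULTI_TABLE_V1".

```python
import copy

SPEC_VERSION = "V1"


def anonymize_metadata_dict(metadata_dict: dict) -> dict:
    """Hypothetical sketch, not SDV's implementation.

    Renames each table's columns to col1, col2, ... and always writes the
    consolidated METADATA_SPEC_VERSION ("V1") on the returned copy.
    Primary keys, relationships, etc. are NOT updated in this sketch.
    """
    # Work on a deep copy so the caller's metadata is left untouched.
    result = copy.deepcopy(metadata_dict)
    for table in result.get("tables", {}).values():
        columns = table.get("columns", {})
        # Keep the original column order while replacing the real names.
        table["columns"] = {
            f"col{i}": spec for i, spec in enumerate(columns.values(), start=1)
        }
    # Expected behavior from the bug report: "V1", not "MULTI_TABLE_V1".
    result["METADATA_SPEC_VERSION"] = SPEC_VERSION
    return result


original = {
    "tables": {
        "table": {
            "columns": {
                "age": {"sdtype": "numerical"},
                "gender": {"sdtype": "categorical"},
            }
        }
    },
    "METADATA_SPEC_VERSION": "V1",
}

anonymized = anonymize_metadata_dict(original)
print(anonymized["METADATA_SPEC_VERSION"])  # V1
print(list(anonymized["tables"]["table"]["columns"]))  # ['col1', 'col2']
```

Against SDV itself, an equivalent regression check would presumably call metadata.anonymize() and assert on the "METADATA_SPEC_VERSION" entry of the result's to_dict() output.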