Closed MooooCat closed 3 months ago
We have observed that different Metadata objects initialized within the same function or the same batch of unit tests seem to interfere with each other, leading to inaccurate table metadata. This might be a bug, and we should create a separate issue and PR to address it.
For example, consider the error in the test tests/data_models/test_metadata.py::test_demo_multi_table_data_metadata_parent. This test is intended for a multi-table dataset, but the metadata includes columns from the single-table dataset adult.csv, i.e. {'workclass', 'fnlwgt', 'age'}. This issue could be caused by the metadata or the inspector.
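The cause has not been identified in this thread. As a purely illustrative sketch, one common Python pattern that produces this symptom is a mutable attribute defined at class level, which every instance then shares. The classes below are hypothetical and are not SDG's Metadata or inspector code; they only show how columns registered on one object can appear on another.

```python
class LeakyMetadata:
    # Hypothetical class, NOT the SDG Metadata implementation.
    # This set is created once, when the class is defined, so every
    # instance mutates the same object.
    column_list = set()

    def add_columns(self, names):
        self.column_list.update(names)


class IsolatedMetadata:
    # Hypothetical fix: build the container per instance in __init__.
    def __init__(self):
        self.column_list = set()

    def add_columns(self, names):
        self.column_list.update(names)


# Metadata for a single-table dataset is built first ...
single = LeakyMetadata()
single.add_columns(["workclass", "fnlwgt", "age"])

# ... and a later, unrelated object already carries those columns.
multi = LeakyMetadata()
leaked = set(multi.column_list)

# With per-instance state the second object starts empty.
single_ok = IsolatedMetadata()
single_ok.add_columns(["workclass", "fnlwgt", "age"])
multi_ok = IsolatedMetadata()
```

If the real bug has this shape, a regression test that builds two metadata objects in sequence and asserts that the second contains only its own columns would catch it.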
Description
This pull request introduces several enhancements and fixes to the Synthetic Data Generator (SDG) framework, focusing on the handling of constant columns in tabular data. The changes include:
- A ConstInspector class to identify columns with constant values in a DataFrame.
- A ConstValueTransformer class to transform and reverse transform data by replacing specified columns with constant values.
Motivation and Context
This change is required to improve the quality and utility of the synthetic data generated by the SDG framework.
By identifying and handling constant columns, we ensure that the synthetic data maintains the integrity of the original data.
This enhancement also addresses the need for more robust data transformation capabilities, allowing for more accurate and controlled generation of synthetic data.
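The constant-column handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, a plain dict of column lists stands in for a pandas DataFrame, and it assumes the forward transform drops constant columns while the reverse transform refills them.

```python
from typing import Any, Dict, List

# Stand-in for a DataFrame: column name -> list of values (assumption).
Table = Dict[str, List[Any]]


def find_constant_columns(table: Table) -> Dict[str, Any]:
    """Return {column: value} for non-empty columns with one distinct value."""
    constants = {}
    for name, values in table.items():
        if values and all(v == values[0] for v in values):
            constants[name] = values[0]
    return constants


def drop_constants(table: Table, constants: Dict[str, Any]) -> Table:
    """Forward transform: remove the constant columns before model fitting."""
    return {name: vals for name, vals in table.items() if name not in constants}


def restore_constants(table: Table, constants: Dict[str, Any]) -> Table:
    """Reverse transform: re-add each constant column, filled to the row count."""
    # Row count is taken from any remaining column; 0 if the table is empty.
    n_rows = len(next(iter(table.values()), []))
    restored = dict(table)
    for name, value in constants.items():
        restored[name] = [value] * n_rows
    return restored


original = {"age": [30, 41, 25], "country": ["US", "US", "US"]}
constants = find_constant_columns(original)
reduced = drop_constants(original, constants)

# Pretend a model produced two synthetic rows for the remaining columns.
synthetic = restore_constants({"age": [33, 52]}, constants)
```

In this sketch restored columns are appended after the modelled ones, so a real implementation would likely also record and reapply the original column order.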
How has this been tested?
The changes have been thoroughly tested using unit tests that cover the new functionality introduced by ConstInspector and ConstValueTransformer.
Types of changes
Checklist: