This pull request introduces updates to the fit methods across several data processors within the SDG framework. Specifically, the changes involve:
EmptyTransformer: Changed the empty_columns attribute from a list to a set, which gives faster membership checks and guarantees that each column is recorded only once. Updated the fit method to populate this set from the empty columns identified in the metadata.
NaNTransformer: Enhanced the fit method to accurately record numeric columns (integer and float) by iterating through the metadata and adding columns to their respective sets only if they match the expected data type.
NumericValueTransformer: Updated the fit method to correctly identify and record integer and float columns by checking each column's data type against the metadata.
OutlierTransformer: Similar to the NaNTransformer, the fit method was updated to accurately record integer and float columns by verifying their data types against the metadata.
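The pattern shared by these fit methods can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: `ToyMetadata`, `ToyEmptyTransformer`, `ToyNumericTransformer`, and the type labels `"int"` and `"float"` are hypothetical stand-ins, and the real metadata object and processor signatures may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetadata:
    """Hypothetical stand-in for the framework's metadata object."""
    # Maps column name -> declared type label (assumed labels: "int", "float", ...).
    column_types: dict = field(default_factory=dict)
    # Names of columns the metadata has identified as empty.
    empty_columns: set = field(default_factory=set)


class ToyEmptyTransformer:
    """Illustrative only: stores empty columns in a set rather than a list."""

    def __init__(self):
        self.empty_columns = set()

    def fit(self, metadata):
        # Copying into a set removes duplicates and keeps membership checks cheap.
        self.empty_columns = set(metadata.empty_columns)


class ToyNumericTransformer:
    """Illustrative only: records int and float columns from the metadata."""

    def __init__(self):
        self.int_columns = set()
        self.float_columns = set()

    def fit(self, metadata):
        # Add a column only when its declared type matches the expected one.
        for name, dtype in metadata.column_types.items():
            if dtype == "int":
                self.int_columns.add(name)
            elif dtype == "float":
                self.float_columns.add(name)


metadata = ToyMetadata(
    column_types={"age": "int", "income": "float", "name": "str"},
    empty_columns={"notes"},
)

empty = ToyEmptyTransformer()
empty.fit(metadata)

numeric = ToyNumericTransformer()
numeric.fit(metadata)
```

In this sketch the non-numeric column `name` ends up in neither set, which is the behaviour the type check is meant to guarantee.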
Motivation and Context
This change is required to improve the accuracy and efficiency of the data processors' fit methods: columns are recorded only when their data type matches the metadata, and empty columns are stored without duplicates.
More reliable and performant data processors in turn lead to better synthetic data generation.
How has this been tested?
The changes have been tested through unit tests that verify the correctness of the updated fit methods.
Types of changes
[ ] Maintenance (no change in code, maintain the project's CI, docs, etc.)
[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Checklist:
[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.