Enhance NumericInspector and Implement PositiveNegativeFilter
Description
This PR extends the Synthetic Data Generator (SDG) framework in two ways: it updates the NumericInspector class and adds a new PositiveNegativeFilter class. NumericInspector now identifies positive and negative numeric columns, which improves the quality of synthetic data generation. PositiveNegativeFilter filters data based on the positivity or negativity of values in specified columns, so that data integrity is maintained during processing.
Key changes include:
Updated NumericInspector to classify columns as positive or negative based on defined thresholds.
Introduced PositiveNegativeFilter to enforce positivity or negativity constraints on specified columns during data processing.
Added comprehensive test cases to validate the functionality of the new filter and the updated inspector.
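The two behaviours listed above can be illustrated with a small, self-contained sketch. It does not use the SDG API: `classify_sign_columns` and `filter_by_sign` are hypothetical helper names, rows are plain dicts rather than the framework's data structures, and the classification rule (every value >= 0 means positive, every value <= 0 means negative) is an assumption, since the PR text only says "defined thresholds".

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def classify_sign_columns(rows: List[Row]) -> Tuple[List[str], List[str]]:
    """Return (positive_columns, negative_columns) for a list of row dicts.

    Assumed rule: a column is positive if every value is >= 0,
    negative if every value is <= 0, and mixed otherwise.
    """
    positive: List[str] = []
    negative: List[str] = []
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        if all(v >= 0 for v in values):
            positive.append(col)
        elif all(v <= 0 for v in values):
            negative.append(col)
        # Anything else is a mixed column and is left unclassified.
    return positive, negative


def filter_by_sign(
    rows: List[Row],
    positive_columns: List[str],
    negative_columns: List[str],
) -> List[Row]:
    """Keep only rows that satisfy the sign constraint of every listed column."""
    return [
        row
        for row in rows
        if all(row[c] >= 0 for c in positive_columns)
        and all(row[c] <= 0 for c in negative_columns)
    ]


# Hypothetical "real" data used to infer the sign of each column.
real = [
    {"age": 31, "loss": -2.5, "delta": 4.0},
    {"age": 45, "loss": -0.1, "delta": -3.0},
]
pos_cols, neg_cols = classify_sign_columns(real)

# Hypothetical generated rows; two of them violate a constraint.
synthetic = [
    {"age": 29, "loss": -1.0, "delta": -7.0},  # satisfies both constraints
    {"age": -3, "loss": -0.4, "delta": 2.0},   # negative value in a positive column
    {"age": 52, "loss": 0.8, "delta": 1.5},    # positive value in a negative column
]
kept = filter_by_sign(synthetic, pos_cols, neg_cols)
```

In this sketch `delta` is a mixed column, so it is never used as a filtering criterion and its values pass through unchanged in the rows that are kept.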
Motivation and Context
These changes strengthen the data quality assurance mechanisms within the SDG framework. Identifying positive and negative columns lets us ensure that the generated synthetic data respects the sign constraints of the original columns, which matters for applications such as model training and data sharing. The change addresses the need for more robust data validation and filtering, leading to more reliable synthetic data.
How has this been tested?
The changes were tested with a dedicated test suite. The following tests were performed:
Unit tests for the updated NumericInspector to ensure correct identification of positive and negative columns.
Integration tests for the PositiveNegativeFilter to verify that it correctly filters data based on the positivity and negativity of values in specified columns.
Checks on the integrity of mixed columns, ensuring they remain unchanged during filtering.
All tests were run with pytest, and all assertions passed.
Types of changes
[ ] Maintenance (no change in code, maintain the project's CI, docs, etc.)
[x] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Checklist:
[x] My code follows the code style of this project.
[x] My change requires a change to the documentation.