hitsz-ids / synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.
Apache License 2.0
3.27k stars, 545 forks

Update NonValueTransformer's Default Setting and Handle Custom Fill Values #199

Closed: MooooCat closed this 3 months ago

MooooCat commented 3 months ago

Description

This PR updates the NonValueTransformer class in sdgx/data_processors/transformers/nan.py. The drop_na attribute now defaults to False, so rows with missing values are no longer dropped by default. In addition, the fit method now accepts a custom fill_na_value passed through kwargs. The value must be of type str; otherwise a ValueError is raised.
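The behavior described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the actual sdgx code: the class name NonValueTransformerSketch, the placeholder default fill value, the metadata parameter, and the use of a list of dicts with None marking missing cells are all assumptions made for the example. Only drop_na, fill_na_value, fit, convert, and the ValueError come from the description.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class NonValueTransformerSketch:
    """Illustrative stand-in for NonValueTransformer (hypothetical, not the sdgx class)."""

    def __init__(self) -> None:
        # Per this PR: rows with missing values are kept by default.
        self.drop_na: bool = False
        # Hypothetical placeholder default; the real default is not stated in the PR.
        self.fill_na_value: str = "NAN_VALUE"

    def fit(self, metadata: Optional[object] = None, **kwargs: Any) -> None:
        # A custom fill value may be passed through kwargs; it must be a str.
        if "fill_na_value" in kwargs:
            value = kwargs["fill_na_value"]
            if not isinstance(value, str):
                raise ValueError("fill_na_value must be of type str")
            self.fill_na_value = value

    def convert(self, rows: List[Row]) -> List[Row]:
        if self.drop_na:
            # Drop every row that contains at least one missing (None) cell.
            return [dict(r) for r in rows if all(v is not None for v in r.values())]
        # Otherwise replace each missing cell with the configured fill value.
        return [
            {k: (self.fill_na_value if v is None else v) for k, v in r.items()}
            for r in rows
        ]


rows = [{"city": "Shenzhen", "zone": None}, {"city": None, "zone": "B"}]
transformer = NonValueTransformerSketch()
transformer.fit(fill_na_value="unknown")
filled = transformer.convert(rows)
```

With drop_na left at False, both rows survive and the missing cells are replaced by the string passed to fit. Passing a non-string such as fill_na_value=0 raises ValueError before any data is touched.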

Motivation and Context

This change makes the NonValueTransformer class more flexible. Users can now specify a custom string to fill in missing data instead of dropping the affected rows, which helps in scenarios where a specific fill value is preferred.

How has this been tested?

The changes were tested with unit tests that cover the fit and convert methods of the NonValueTransformer class.
