hitsz-ids / synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.
Apache License 2.0
3.27k stars 545 forks

Fix Division by Zero Error in Numeric Column Inspection #220

Closed MooooCat closed 1 month ago

MooooCat commented 1 month ago

Description

This pull request adds guards against division by zero in the _is_int_column and _is_positive_or_negative_column methods in sdgx/data_models/inspectors/numeric.py. Both methods now check whether the numeric values extracted from the column series are empty before performing any operation that divides by the number of values. If the column contains no numeric values, the methods return False and the division is never attempted.
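The guard described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the standalone function names, the numeric_values helper, the 0.9 threshold, and the use of pd.to_numeric to extract numeric values are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical threshold: the fraction of numeric values that must match.
RATE_THRESHOLD = 0.9


def numeric_values(col_series: pd.Series) -> pd.Series:
    """Hypothetical helper: keep only the entries that parse as numbers."""
    # Non-numeric entries are coerced to NaN and then dropped.
    return pd.to_numeric(col_series, errors="coerce").dropna()


def is_int_column(col_series: pd.Series, threshold: float = RATE_THRESHOLD) -> bool:
    values = numeric_values(col_series)
    # Guard: with no numeric values, the ratio below would divide by zero.
    if len(values) == 0:
        return False
    int_count = int((values % 1 == 0).sum())
    return int_count / len(values) > threshold


def is_positive_or_negative_column(
    col_series: pd.Series, positive: bool = True, threshold: float = RATE_THRESHOLD
) -> bool:
    values = numeric_values(col_series)
    # Same guard: an empty selection cannot be classified, so report False.
    if len(values) == 0:
        return False
    matches = (values >= 0) if positive else (values <= 0)
    return int(matches.sum()) / len(values) > threshold
```

Returning False for an empty selection treats "no numeric data" as "not an integer column" and "not a one-sided column", which matches the behavior the description states.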

Motivation and Context

Division by zero causes runtime errors and unexpected behavior in data processing code. The numeric column inspection methods could hit this case whenever a column contains no numeric data. With these checks in place, the methods handle such columns without crashing, which improves the reliability of the Synthetic Data Generator framework.

This PR improves error handling in the numeric column inspection methods, which matters for both data integrity and application stability.

How has this been tested?

The changes have been tested by running the modified methods with various input scenarios, including:

The testing environment included the latest version of pandas, and unit tests were updated to cover these new scenarios, confirming that the changes do not introduce any regressions.
