Description
This pull request updates the sdgx_example_ctgan.ipynb notebook to use the latest version of the Synthetic Data Generator (SDG) from the GitHub repository. The changes include:
Updating the installation command to use the GitHub repository instead of the PyPI package.
Increasing the number of training epochs for the CTGAN model from 2 to 128.
Removing the remove_empty_rows and clear_na preprocessing steps from the notebook, since this functionality will be integrated into sdgx.processor in the future.
Updating the logging messages and outputs to reflect the changes in the SDG framework.
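For reviewers who want to reproduce the install change outside the notebook, the following is a minimal sketch of how a pip install from a Git repository can be assembled in Python. The repository URL is a placeholder (OWNER/REPO is not taken from this PR), and the helper names build_install_command and install_from_github are hypothetical; the notebook itself may simply use a pip cell.

```python
import subprocess
import sys
from typing import List

# Placeholder URL (assumption): replace OWNER/REPO with the project's
# actual GitHub location. It is not specified in this PR description.
REPO_URL = "git+https://github.com/OWNER/REPO.git"


def build_install_command(repo_url: str = REPO_URL) -> List[str]:
    """Build a pip command that installs a package from a Git repository.

    Using sys.executable ensures pip runs for the same interpreter
    (and therefore the same environment) as the notebook kernel.
    """
    return [sys.executable, "-m", "pip", "install", repo_url]


def install_from_github(repo_url: str = REPO_URL) -> None:
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(build_install_command(repo_url))
```

Calling install_from_github() performs the installation; build_install_command() only returns the argument list, which makes it easy to inspect what would be run before running it.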
Motivation and Context
This change is required to ensure that the example notebook uses the latest features and improvements from the SDG framework, which are not yet available in the PyPI package.
By using the GitHub repository, we can leverage the most recent updates and bug fixes.
How has this been tested?
The changes were tested by running the updated notebook in a local development environment. The notebook was executed cell by cell to confirm that synthetic data generation works as expected with the new settings and dependencies.
Types of changes
[x] Maintenance (no change in code, maintain the project's CI, docs, etc.)
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Checklist:
[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.