Problem
When dealing with tabular data at the scale of millions of rows and hundreds of columns, the current GaussianCopulaSynthesizer encounters significant memory usage problems (approximately 37 GB on macOS with an M3 Max).
Proposed Solution
Reduce resource consumption (for example, to around 4 GB of memory for the dataset above) and make it possible to train on larger datasets while maintaining good performance.
Additional context
Reproduction Code & Files:
test file: test.csv
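
For anyone comparing memory use before and after a change, here is a minimal sketch of one way to record the peak memory of a training call. It is not the reproduction script referenced above. `peak_memory_mb` is a hypothetical helper written for illustration, and it relies on the standard-library `tracemalloc` module, which only sees allocations made through Python's allocators, so the figure may be lower than the process memory reported by the operating system.

```python
import tracemalloc


def peak_memory_mb(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, peak traced memory in MB).

    Hypothetical helper for illustration; not part of any library.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1e6


if __name__ == "__main__":
    # Stand-in workload that allocates about 50 MB. Replace the lambda with
    # the real training call, e.g. lambda: synthesizer.fit(data), assuming
    # `synthesizer` and `data` have already been created.
    _, peak = peak_memory_mb(lambda: bytearray(50_000_000))
    print(f"peak traced memory: {peak:.1f} MB")
```

Wrapping only the fit call keeps data loading out of the measurement, which makes numbers from different runs easier to compare.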