To improve code reuse and maintainability in the SDV library, the _fit method of the GaussianCopulaSynthesizer class should be modularized by splitting it into several well-defined methods. This will make the fitting logic easier to extend, since each step can then be reused or overridden on its own.
Concretely, _fit should be broken down into smaller steps, each implemented as a separate method that handles one specific task within the fitting process.
Expected Steps
Log Numerical Distributions: Keep the existing call to log_numerical_distributions_error as a standalone function. This step will remain unchanged.
Learn Number of Rows: Move the logic for determining the number of rows (self._num_rows = len(processed_data)) into a new method, e.g., self._learn_num_rows.
Extract Numerical Distributions for Modeling: Refactor the logic inside the for loop that assigns a distribution to each column into a new method, e.g., self._get_numerical_distributions.
Initialize the Model: Move the logic for initializing the model (self._model = GaussianMultivariate(...)) to its own method, e.g., self._initialize_model.
Fit the Model: Finally, create a new method that fits the model while handling SciPy warnings, e.g., self._fit_model.
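A minimal sketch of how the refactored _fit could be laid out, following the five steps above. It is not SDV's implementation. log_numerical_distributions_error and StubGaussianMultivariate are hypothetical stand-ins for the real SDV helper and the copulas GaussianMultivariate model; get_columns is a hypothetical helper; processed_data is a list of row dicts instead of a pandas DataFrame; distributions are plain strings; and the 'beta' default and the module='scipy' warning filter are assumptions.

```python
import logging
import warnings

LOGGER = logging.getLogger(__name__)


def get_columns(processed_data):
    """Hypothetical helper: column names of a list-of-row-dicts table
    (stands in for processed_data.columns on a pandas DataFrame)."""
    return list(processed_data[0]) if processed_data else []


def log_numerical_distributions_error(numerical_distributions, columns, logger):
    """Hypothetical stand-in for SDV's helper of the same name."""
    unknown = sorted(set(numerical_distributions) - set(columns))
    if unknown:
        logger.info('Distributions requested for unknown columns: %s', unknown)


class StubGaussianMultivariate:
    """Hypothetical stand-in for the copulas GaussianMultivariate model."""

    def __init__(self, distribution):
        self.distribution = distribution
        self.fitted = False

    def fit(self, data):
        # A real model would estimate marginals and correlations here.
        self.fitted = True


class GaussianCopulaSketch:
    """Sketch of GaussianCopulaSynthesizer._fit split into the proposed steps."""

    def __init__(self, numerical_distributions=None, default_distribution='beta'):
        self.numerical_distributions = numerical_distributions or {}
        self.default_distribution = default_distribution  # assumed default
        self._num_rows = None
        self._model = None

    def _learn_num_rows(self, processed_data):
        # Step 2: remember how many rows were seen during fitting.
        self._num_rows = len(processed_data)

    def _get_numerical_distributions(self, processed_data):
        # Step 3: assign a distribution to every column, falling back to the default.
        return {
            column: self.numerical_distributions.get(column, self.default_distribution)
            for column in get_columns(processed_data)
        }

    def _initialize_model(self, numerical_distributions):
        # Step 4: build the (stubbed) multivariate model.
        self._model = StubGaussianMultivariate(distribution=numerical_distributions)

    def _fit_model(self, processed_data):
        # Step 5: fit while ignoring warnings issued from scipy modules;
        # catch_warnings() restores the previous filters on exit.
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', module='scipy')
            self._model.fit(processed_data)

    def _fit(self, processed_data):
        # Step 1: the standalone logging call stays unchanged.
        log_numerical_distributions_error(
            self.numerical_distributions, get_columns(processed_data), LOGGER
        )
        self._learn_num_rows(processed_data)
        numerical_distributions = self._get_numerical_distributions(processed_data)
        self._initialize_model(numerical_distributions)
        self._fit_model(processed_data)


class AllNormalSketch(GaussianCopulaSketch):
    """Overrides a single step; the rest of _fit is inherited as-is."""

    def _get_numerical_distributions(self, processed_data):
        return {column: 'norm' for column in get_columns(processed_data)}


data = [{'age': 30, 'income': 50.0}, {'age': 41, 'income': 72.5}]
synth = GaussianCopulaSketch(numerical_distributions={'age': 'truncnorm'})
synth._fit(data)
print(synth._num_rows)            # 2
print(synth._model.distribution)  # {'age': 'truncnorm', 'income': 'beta'}
```

The AllNormalSketch subclass illustrates the extensibility argument: only the distribution-selection step is overridden, while the logging, row counting, model initialization, and fitting steps are reused unchanged.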