gretelai / gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.
https://gretel.ai/platform/synthetics

[BUG]: Outdated category_encoders #149

Closed mahmadza closed 1 year ago

mahmadza commented 1 year ago

Are you reporting a bug or FR?

What version of synthetics are you using? 0.20.0

What would you like to see / What problem are you having? The category_encoders version used by gretel-synthetics is outdated. The current version in gretel-synthetics is 2.2.2; the latest version is 2.6.0 (https://github.com/scikit-learn-contrib/category_encoders).

Configuration Params


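from gretel_synthetics.timeseries_dgan.config import DGANConfig
from gretel_synthetics.timeseries_dgan.dgan import DGAN

# max_days, epochs, real_df_truc, and the *_col / *_cols variables
# are not shown in this report.
config = DGANConfig(
    max_sequence_len=max_days,
    sample_len=1,
    generator_learning_rate=1e-4,
    discriminator_learning_rate=1e-4,
    epochs=epochs,
)

model = DGAN(config)

# Train on a long-format DataFrame (one row per example ID and time step).
model.train_dataframe(
    df=real_df_truc,
    example_id_column=id_col,
    feature_columns=feature_cols,
    attribute_columns=attribute_cols,
    time_column=time_col,
    df_style="long",
)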

Are you using GPU or a CPU? GPU

What environment are you working in? Jupyter

What version of python are you using? 3.8.8

Describe the shape / types of the data you are training on 1 example ID column; 3 feature columns, all numerical; 10 attribute columns, a mix of categorical and numerical; 1 time column, in YYYY-MM-DD format

Please provide any tracebacks or error messages you are receiving

2023-04-13 23:50:09,577 : MainThread : INFO : Marking column XXX as discrete because its type is string/object.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed eval> in <module>

~/.local/lib/python3.8/site-packages/gretel_synthetics/timeseries_dgan/dgan.py in train_dataframe(self, df, attribute_columns, feature_columns, example_id_column, time_column, discrete_columns, df_style, progress_callback)
    393         attributes, features = self.data_frame_converter.convert(df)
    394 
--> 395         self.train_numpy(
    396             attributes=attributes,
    397             features=features,

~/.local/lib/python3.8/site-packages/gretel_synthetics/timeseries_dgan/dgan.py in train_numpy(self, features, feature_types, attributes, attribute_types, progress_callback)
    238 
    239         if not self.is_built:
--> 240             attribute_outputs, feature_outputs = create_outputs_from_data(
    241                 attributes,
    242                 features,

~/.local/lib/python3.8/site-packages/gretel_synthetics/timeseries_dgan/transformations.py in create_outputs_from_data(attributes, features, attribute_types, feature_types, normalization, apply_feature_scaling, apply_example_scaling, binary_encoder_cutoff)
    399             )
    400         attribute_types = cast(List[OutputType], attribute_types)
--> 401         attribute_outputs = [
    402             create_output(
    403                 index,

~/.local/lib/python3.8/site-packages/gretel_synthetics/timeseries_dgan/transformations.py in <listcomp>(.0)
    400         attribute_types = cast(List[OutputType], attribute_types)
    401         attribute_outputs = [
--> 402             create_output(
    403                 index,
    404                 t,

~/.local/lib/python3.8/site-packages/gretel_synthetics/timeseries_dgan/transformations.py in create_output(index, t, data, normalization, apply_feature_scaling, apply_example_scaling, binary_encoder_cutoff)
    486         raise RuntimeError(f"Unknown output type={t}")
    487 
--> 488     output.fit(data.flatten())
    489 
    490     return output

~/.local/lib/python3.8/site-packages/gretel_synthetics/timeseries_dgan/transformations.py in fit(self, column)
     41             raise ValueError("Expected 1-d numpy array for fit()")
     42 
---> 43         self._fit(column)
     44         self.is_fit = True
     45 

~/.local/lib/python3.8/site-packages/gretel_synthetics/timeseries_dgan/transformations.py in _fit(self, column)
    123         self._encoder = OneHotEncoder(cols=0, return_df=False)
    124 
--> 125         self._encoder.fit(column)
    126 
    127     def _transform(self, column: np.ndarray) -> np.ndarray:

~/.local/lib/python3.8/site-packages/category_encoders/one_hot.py in fit(self, X, y, **kwargs)
    149             handle_missing='value'
    150         )
--> 151         self.ordinal_encoder = self.ordinal_encoder.fit(X)
    152         self.mapping = self.generate_mapping()
    153 

~/.local/lib/python3.8/site-packages/category_encoders/ordinal.py in fit(self, X, y, **kwargs)
    131             self.cols = util.get_obj_cols(X)
    132         else:
--> 133             self.cols = util.convert_cols_to_list(self.cols)
    134 
    135         if self.handle_missing == 'error':

~/.local/lib/python3.8/site-packages/category_encoders/utils.py in convert_cols_to_list(cols)
     19     elif isinstance(cols, tuple):
     20         return list(cols)
---> 21     elif pd.api.types.is_categorical(cols):
     22         return cols.astype(object).tolist()
     23 

AttributeError: module 'pandas.api.types' has no attribute 'is_categorical'
johntmyers commented 1 year ago

@mahmadza What version of Pandas was installed when this error occurred? This looks like a Pandas error, so I am wondering whether updating category_encoders also updated Pandas for you.
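
One way to answer this in a given environment is to print the installed versions and check whether the attribute named in the traceback exists. The snippet below is a minimal sketch, not part of gretel-synthetics: the helper names `installed_versions` and `has_is_categorical` are made up for illustration, and the distribution names passed in the usage lines are assumptions about how the packages are registered. For background, `pandas.api.types.is_categorical` was deprecated in pandas 1.1 and removed in pandas 2.0, so a `False` result would be consistent with a pandas 2.x install.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {name: version string, or None if the distribution is not found}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_is_categorical():
    """Return True/False for pandas.api.types.is_categorical, or None if pandas is missing."""
    try:
        import pandas as pd
    except ImportError:
        return None
    return hasattr(pd.api.types, "is_categorical")


# Usage: distribution names below are assumptions; adjust them to your environment.
for name, version in installed_versions(
    ["gretel-synthetics", "category_encoders", "pandas", "numpy"]
).items():
    print(f"{name}: {version or 'not installed'}")
print("pandas.api.types.is_categorical available:", has_is_categorical())
```

Running this in the same Jupyter kernel that raised the error matters, since a notebook kernel can use a different site-packages directory than the shell.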

mahmadza commented 1 year ago

@johntmyers I no longer have the environment in which this error occurred.

I also had a different error, involving NumPy, which was resolved when I upgraded category_encoders.

johntmyers commented 1 year ago

Interesting. We have not been able to reproduce these issues. category_encoders is pinned to a version that has worked in all of our tests, including our automated tests, and we also set minimum versions of Pandas and NumPy. Glad it's working now, but I am going to close this ticket since we cannot validate that the currently pinned version of category_encoders leads to these errors.