gretelai / gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.
https://gretel.ai/platform/synthetics

[BUG]: Loading a trained model and generating synthetic data throws an error #144

Closed AravAct closed 1 year ago

AravAct commented 1 year ago

Are you reporting a bug or FR? Bug

What version of synthetics are you using? 0.20.0

What problem are you having? I am training a time-series DGAN model using the timeseries code provided by the gretel-synthetics library. I trained the model and saved it with model.save(path), provided in the dgan.py file. I am trying to load the file with the load function from the same dgan.py. After loading, I try to generate synthetic data with generate_numpy(), and it fails with AttributeError: 'DGAN' object has no attribute 'attribute_noise_func'.

Configuration Params

max_sequence_len=2000
sample_len=1
batch_size=128
epochs=1000  # For real data sets, 100–1000 epochs is typical
feature_num_layers=5
attribute_discriminator_learning_rate=0.001
feature_num_units=100
generator_learning_rate=0.001
discriminator_learning_rate=0.001

Are you using GPU or a CPU? GPU

What environment are you working in?

Tried both Jupyter and Google Colab; got the same error in each.

What version of python are you using?

Tried both 3.8 (Colab) and 3.9 (Jupyter).

Describe the shape / types of the data you are training on: (42000, 31)

Please provide any tracebacks or error messages you are receiving:

  471                 internal_data_list.append(
    472                     self._generate(self.attribute_noise_func(self.config.batch_size),
--> 473                                    self.feature_noise_func(self.config.batch_size),))
    474 
    475             # Convert from list of tuples to tuple of lists with zip(*) and

AttributeError: 'DGAN' object has no attribute 'attribute_noise_func'
AravAct commented 1 year ago

Reopened because the error persists: loading a model and generating new data still does not work.

santhosh97 commented 1 year ago

Hey @AravAct! Hope you're doing well! Would you mind sharing a snippet of what your input data looks like?

AravAct commented 1 year ago

Hi @santhosh97, thanks for the reply. Although I cannot share the data directly, I can share the shape of the dataset. We have transformed the data into the format the model requires (wide format). The shape of the data is (21, 2000, 6): 21 unique time-series sequences, each of length 2000, with 6 features. We do not have any categorical variables; all are numerical.
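For readers with a similar setup, converting a long table (one row per time step) into the (sequences, time steps, features) array that train_numpy takes can be sketched with a NumPy reshape. This is a minimal sketch, assuming the rows are already grouped by sequence and sorted by time, with exactly 2000 rows per sequence; the random array stands in for real data and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical long-format table: 21 sequences x 2000 time steps stacked
# row-wise, with 6 numeric feature columns (random stand-in data).
n_sequences, seq_len, n_features = 21, 2000, 6
long_data = np.random.rand(n_sequences * seq_len, n_features)

# Assumes rows are grouped by sequence and sorted by time within each
# sequence, so a plain reshape gives (sequences, time steps, features).
features = long_data.reshape(n_sequences, seq_len, n_features)
print(features.shape)  # (21, 2000, 6)
```

If the rows are not already ordered that way, sort by sequence ID and timestamp first; a reshape only regroups rows, it never reorders them.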

kboyd commented 1 year ago

We haven't been able to reproduce that error after loading a model yet, so we can't investigate further. @AravAct, can you share a minimal chunk of code that produces the error? That is, the model create, train, save, load, and generate code that leads to the error. Based on the error, I don't expect the exact data is the issue here, so I'm hoping random data (e.g., np.random.random((21, 2000, 6))) will still reproduce the problem, letting you share code that doesn't rely on your data directly.
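Random stand-in arrays with the same shapes as the real data can be drawn from a seeded NumPy generator, so anyone rerunning the repro gets identical inputs. A minimal sketch, assuming the shapes mentioned in this thread (21 sequences of length 2000 with 6 features, plus one attribute per sequence); the seed value is arbitrary.

```python
import numpy as np

# Fixed seed so others can rerun the exact same stand-in data.
rng = np.random.default_rng(seed=0)
n_sequences, seq_len, n_features = 21, 2000, 6

# Continuous values in [0, 1), shaped like the real training arrays.
features = rng.random((n_sequences, seq_len, n_features))
attributes = rng.random((n_sequences, 1))

print(features.shape, attributes.shape)  # (21, 2000, 6) (21, 1)
```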

kboyd commented 1 year ago

The only way I've found to trigger the error is to run generate_dataframe (or generate_numpy) immediately after creating the DGAN instance, before training:

from gretel_synthetics.timeseries_dgan.config import DGANConfig
from gretel_synthetics.timeseries_dgan.dgan import DGAN
dg = DGAN(DGANConfig(max_sequence_len=2000, sample_len=1))
dg.generate_numpy(n=3)

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kendrick/.pyenv/versions/3.8.13/lib/python3.8/site-packages/gretel_synthetics/timeseries_dgan/dgan.py", line 436, in generate_numpy
    self.attribute_noise_func(self.config.batch_size),
AttributeError: 'DGAN' object has no attribute 'attribute_noise_func'

We can definitely provide a clearer error message in this situation, but this seems unrelated to saving and loading a model, so it's probably not helpful for this bug.
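One way a clearer message could work is an explicit readiness check at the top of the generate method. The sketch below uses a hypothetical ToyGenerator class, not the real DGAN, and assumes (as the traceback suggests) that the noise functions only exist once the model has been trained; it raises a descriptive RuntimeError instead of an AttributeError.

```python
class ToyGenerator:
    """Hypothetical stand-in for a model whose noise functions are created during training."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.is_built = False  # set to True once training has set up the networks

    def train(self) -> None:
        # Placeholder for real training: create the noise function the generator needs.
        self.attribute_noise_func = lambda n: [0.0] * n
        self.is_built = True

    def generate(self, n: int) -> list:
        # Fail early with an actionable message instead of an AttributeError.
        if not self.is_built:
            raise RuntimeError(
                "Model has not been trained or loaded. Call train() first, "
                "or use the class method load() and keep its return value."
            )
        return self.attribute_noise_func(n)


untrained = ToyGenerator(batch_size=500)
try:
    untrained.generate(3)
except RuntimeError as err:
    print(err)

untrained.train()
print(untrained.generate(3))  # [0.0, 0.0, 0.0]
```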

AravAct commented 1 year ago

I tried the suggestion and created a DGAN instance before loading the model. Below is reproducible code. Kindly let me know whether I am making a mistake or this is something else.

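import numpy as np
from gretel_synthetics.timeseries_dgan.config import DGANConfig, OutputType
from gretel_synthetics.timeseries_dgan.dgan import DGAN

run_ = 'Doppelganger_exp_savedmodel'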
file_name_model = str(run_)+'.pt'

timelength = 2000
sample_len_=1
batch_size_=500
epochs_=30
num_layers_=2
num_units_ = 30
adlr_= 0.8
glr_=0.8
dlr_=0.8

features = np.random.rand(21, 2000, 6)
attributes = np.random.rand(21,1)
# Train the model
model = DGAN(DGANConfig(
    max_sequence_len=timelength,
    sample_len=sample_len_,
    batch_size=batch_size_,
    epochs=epochs_,  # For real data sets, 100–1000 epochs is typical
    feature_num_layers=num_layers_,
    attribute_discriminator_learning_rate=adlr_,
    feature_num_units=num_units_,
    generator_learning_rate=glr_,
    discriminator_learning_rate=dlr_,
))

model.train_numpy(
    # run = run_,
    attributes=attributes,
    attribute_types=[OutputType.DISCRETE] * 1,
    features=features,
    feature_types=[OutputType.CONTINUOUS] * 6,
)
print("model training complete")
model.save(file_name_model)

loaded_model = DGAN(DGANConfig(
    max_sequence_len=timelength,
    sample_len=sample_len_,
    batch_size=batch_size_,
    epochs=epochs_,  # For real data sets, 100–1000 epochs is typical
    feature_num_layers=num_layers_,
    attribute_discriminator_learning_rate=adlr_,
    feature_num_units=num_units_,
    generator_learning_rate=glr_,
    discriminator_learning_rate=dlr_,
))

loaded_model.load(file_name=file_name_model)

synthetic_df = loaded_model.generate_numpy(1)

attributes_gen =synthetic_df[0]
features_gen = synthetic_df[1]

I got the error below:

AttributeError                            Traceback (most recent call last)
<ipython-input-20-02dfe40d3ef9> in <module>
     13 loaded_model.load(file_name=file_name_model)
     14 
---> 15 synthetic_df = loaded_model.generate_numpy(1)
     16 
     17 attributes_gen =synthetic_df[0]

/usr/local/lib/python3.8/dist-packages/gretel_synthetics/timeseries_dgan/dgan.py in generate_numpy(self, n, attribute_noise, feature_noise)
    434                 internal_data_list.append(
    435                     self._generate(
--> 436                         self.attribute_noise_func(self.config.batch_size),
    437                         self.feature_noise_func(self.config.batch_size),
    438                     )

AttributeError: 'DGAN' object has no attribute 'attribute_noise_func'
kboyd commented 1 year ago

Thanks for sharing the code; now I see what's going on. I get the same error on my setup with your code, and the following version (with some smaller params so it runs quickly on my laptop) doesn't crash:

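import numpy as np
from gretel_synthetics.timeseries_dgan.config import DGANConfig, OutputType
from gretel_synthetics.timeseries_dgan.dgan import DGAN

run_ = 'Doppelganger_exp_savedmodel'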
file_name_model = str(run_)+'.pt'

timelength = 20
sample_len_=1
batch_size_=500
epochs_=1
num_layers_=2
num_units_ = 30
adlr_= 0.8
glr_=0.8
dlr_=0.8

features = np.random.rand(21, 20, 6)
attributes = np.random.rand(21,1)
# Train the model
model = DGAN(DGANConfig(
    max_sequence_len=timelength,
    sample_len=sample_len_,
    batch_size=batch_size_,
    epochs=epochs_, # For real data sets, 100–1000 epochs is typical
    feature_num_layers=num_layers_,
    attribute_discriminator_learning_rate = adlr_,
    feature_num_units = num_units_ ,
    generator_learning_rate = glr_ ,
    discriminator_learning_rate= dlr_,
))

model.train_numpy(
    # run = run_,
    attributes=attributes,
    attribute_types=[OutputType.DISCRETE] * 1,
    features=features,
    feature_types=[OutputType.CONTINUOUS] * 6,
)
print("model training complete")
model.save(file_name_model)

loaded_model = DGAN.load(file_name=file_name_model)

synthetic_df = loaded_model.generate_numpy(1)

attributes_gen =synthetic_df[0]
features_gen = synthetic_df[1]

What's happening? The key part is that load is a class method that directly returns a new DGAN instance. The DGANConfig params are also stored in the saved file, so you don't need to set up the DGAN instance for loading yourself. Just use:

loaded_model = DGAN.load(file_name=file_name_model)

Unfortunately, Python lets us call class methods on either the class or an instance of the class, which makes this easy to confuse. In your code, the DGAN instance with the saved model is returned from the load call but isn't stored anywhere, so it is lost. The class method doesn't modify the loaded_model instance from its original state, so when you called generate_numpy, it ran on a model that hadn't been trained yet, just like https://github.com/gretelai/gretel-synthetics/issues/144#issuecomment-1459010357. I'll look into getting a clearer error message in these situations.
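The pitfall can be reproduced without gretel-synthetics. A minimal sketch, using a hypothetical ToyModel class (not the real DGAN) that, like DGAN as described above, stores its config alongside the weights and exposes load as a classmethod; a JSON file stands in for the real .pt file.

```python
import json
import os
import tempfile


class ToyModel:
    """Hypothetical stand-in for DGAN: config and weights are saved together."""

    def __init__(self, config: dict):
        self.config = config
        self.weights = None  # stays None until trained

    def train(self) -> None:
        self.weights = [0.1, 0.2, 0.3]  # placeholder for real training

    def save(self, file_name: str) -> None:
        with open(file_name, "w") as f:
            json.dump({"config": self.config, "weights": self.weights}, f)

    @classmethod
    def load(cls, file_name: str) -> "ToyModel":
        with open(file_name) as f:
            state = json.load(f)
        model = cls(state["config"])  # builds a NEW instance from the stored config
        model.weights = state["weights"]
        return model


path = os.path.join(tempfile.mkdtemp(), "toy_model.json")
trained = ToyModel({"max_sequence_len": 20})
trained.train()
trained.save(path)

# Pitfall: calling the classmethod through an instance still builds a new
# object; the return value is discarded and `untouched` stays untrained.
untouched = ToyModel({"max_sequence_len": 20})
untouched.load(path)
print(untouched.weights)  # None

# Fix: keep the instance that load() returns.
loaded = ToyModel.load(path)
print(loaded.weights)  # [0.1, 0.2, 0.3]
```

Because load receives the class (cls) rather than the instance, it has no way to update the object it was called through; the only way to get the restored model is to keep its return value.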

AravAct commented 1 year ago

Loading with DGAN.load directly worked perfectly. I can load the saved model and generate data again. Thanks for the clarification. Closing this.