gretelai / gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.
https://gretel.ai/platform/synthetics

[HELP NEEDED] Credit card example #171

Open danielstankw opened 1 month ago

danielstankw commented 1 month ago

Hi, I am running this example: https://github.com/gretelai/gretel-synthetics/blob/master/examples/research/synthetics_knn_generate.ipynb

Because using DataFrameBatch doesn't work for me:

batcher = DataFrameBatch(df=training_set, batch_size=32, config=config_template)
batcher.create_training_data()
batcher.train_all_batches()

I use

config = TensorFlowConfig(
    batch_size=512,
    input_data_path=saved_set_path,
    gen_lines=1000,
    dropout_rate=0.4,
    rnn_units=1024,
    learning_rate=0.001,
    max_lines=1e5,
    checkpoint_dir=str(Path.cwd() / "checkpoints_synthetic"),
    field_delimiter=",",
    overwrite=True
)
tokenizer = CharTokenizerTrainer(config=config)
train(config, tokenizer)

Whenever I try to generate text, I get the following:

GenText(valid=False, text='627417,1.0,1', explain='record is 3 columns, not 31 as expected', delimiter=',')

The model generates only 3 columns instead of the expected 31. Why is that, and how can I fix it?
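For context, the `explain` field in the GenText output above suggests a simple delimiter-based check: a generated line is valid only if splitting it on the configured field_delimiter yields the expected column count. A minimal sketch of that kind of check (the helper name and structure are my own illustration, not gretel-synthetics internals):

```python
from typing import Optional


def explain_invalid(text: str, delimiter: str, expected_cols: int) -> Optional[str]:
    """Return an explanation string if the record has the wrong column count,
    or None if the record is valid."""
    cols = text.split(delimiter)
    if len(cols) != expected_cols:
        return f"record is {len(cols)} columns, not {expected_cols} as expected"
    return None


# The failing record from the output above: 3 fields instead of 31.
print(explain_invalid("627417,1.0,1", ",", 31))
# -> record is 3 columns, not 31 as expected
```

So the generator is stopping lines early (after 3 fields), and the validator rejects them; the model itself is not producing enough delimited fields per line.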

The model's training results are quite good, so it should be able to perform accordingly:

loss: 0.2665 - accuracy: 0.9200 - val_loss: 0.2439 - val_accuracy: 0.9270
danielstankw commented 1 month ago

@zredlined

Marjan-emd commented 1 month ago

This notebook works with one of our old models (LSTM) and requires tensorflow==2.9.0. We also recommend working with one of Gretel's most recent models, Navigator Fine Tuning, from the console/SDK.
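If you stay on the LSTM notebook, pinning the environment first might look like the following (a setup sketch; pinning anything beyond tensorflow==2.9.0 is my assumption):

```shell
# Pin TensorFlow to the version the old LSTM notebook expects
pip install "tensorflow==2.9.0" gretel-synthetics
```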