sdv-dev / CTGAN

Conditional GAN for generating synthetic tabular data.
1.23k stars, 279 forks

Loss values are good, but the quality of the synthetic data is bad... How?? HELP WANTED #391

Closed ilkayyuksel closed 2 months ago

ilkayyuksel commented 2 months ago

Hi, I would like to ask a question.

I am using the CTGAN model for my master's thesis. I want to generate synthetic data from the CIC Collection dataset (an intrusion detection system dataset, so it contains attacks; there are only numerical features!). I want to generate synthetic data for one particular attack, it doesn't matter which one, so I chose to generate fake samples for the attack 'Infiltration', which has 94857 real samples to train with. I trained my CTGAN model with the following code:

from ctgan import CTGAN

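ctgan = CTGAN(epochs=600, verbose=True, generator_lr=1e-5, discriminator_lr=1e-6, batch_size=128, pac=2,
              generator_decay=1e-6, discriminator_decay=1e-6, discriminator_steps=1)
# The pasted snippet was cut off here; presumably the fit call followed. train_data is a placeholder name for the training DataFrame.
ctgan.fit(train_data, discrete_columns)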

Loss values: [image]

metrics from SDV:

KS complement Average: 0.3587
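For readers unfamiliar with the metric: the KS complement of a column is 1 minus the two-sample Kolmogorov-Smirnov statistic, so an average of 0.3587 means that, averaged over columns, the largest gap between the real and synthetic empirical CDFs is about 0.64. The sketch below is a minimal, standard-library illustration of that definition, not the SDV/SDMetrics implementation; the function names and the toy columns are hypothetical.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - D, where D is the two-sample Kolmogorov-Smirnov statistic."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")

    # Empirical CDFs only change at observed values, so the largest
    # gap between them is attained at one of those values.
    max_gap = 0.0
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


def average_ks_complement(real_columns, synthetic_columns):
    """Average the per-column KS complement over all columns in real_columns."""
    scores = {
        name: ks_complement(real_columns[name], synthetic_columns[name])
        for name in real_columns
    }
    return sum(scores.values()) / len(scores), scores


if __name__ == "__main__":
    # Toy columns (made up for illustration, not taken from the CIC dataset).
    real = {"Flow Duration": [1, 2, 3, 4], "Total Fwd Packets": [1, 2, 3, 4]}
    fake = {"Flow Duration": [3, 4, 5, 6], "Total Fwd Packets": [1, 2, 3, 4]}
    average, per_column = average_ks_complement(real, fake)
    print(per_column)
    print(average)
```

Looking at the per-column scores rather than only the average can show which features drag the score down.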

Result: Although the generator and discriminator losses are stabilizing, the quality of my fake samples is not good; it is actually bad.

Then I trained the CTGAN synthesizer, which applies some more preprocessing internally, but the results are no different.

Why is this happening? My loss values are perfectly shaped according to

If you need any other information, please ask! Can you help me? I have been struggling with this for a while...

You can see some of the distributions (see images)

Distribution plots [images]: Total Fwd Packets, Total Backward Packets, Fwd Packets Length Total, Fwd Packet Length Std, Fwd Packet Length Mean, Fwd Packet Length Max, Flow Duration, Bwd Packets Length Total, Bwd Packet Length Std, Bwd Packet Length Mean, Bwd Packet Length Max

srinify commented 2 months ago

Hi @ilkayyuksel, I'll close this issue out since we already have this thread: