Loss values are good, but the quality of the synthetic data is bad... How?? HELP WANTED

Hi, I would like to ask a question.

I am using the CTGAN model for my master's thesis. I want to generate synthetic data from the CIC Collection dataset (https://www.kaggle.com/datasets/dhoogla/cicidscollection), an intrusion detection system dataset that contains attacks and has only numerical features. I want to generate synthetic data for one particular attack. It doesn't matter which one, so I chose 'Infiltration', which has 94857 real samples to train on. I trained my CTGAN model with the following code:

from ctgan import CTGAN

ctgan = CTGAN(epochs=600, verbose=True, generator_lr=1e-5, discriminator_lr=1e-6, batch_size=128, pac=2, generator_decay=1e-6,
                 discriminator_decay=1e-6, discriminator_steps=1)
ctgan.fit(real_data, discrete_columns)

Loss values: (see image)

Metrics from SDV:

KS complement average: 0.3587
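For context, the KS complement of a numerical column is one minus the two-sample Kolmogorov-Smirnov statistic, so 1.0 means the real and synthetic empirical distributions match and 0.0 means they are completely separated. The sketch below is a plain-Python illustration of that definition, not the SDMetrics implementation. The function names `ks_complement` and `average_ks_complement` and the dict-of-columns input are assumptions made up for this example.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - KS statistic for two samples of one numerical column."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")

    n_real = len(real_sorted)
    n_synth = len(synth_sorted)

    # The KS statistic is the largest gap between the two empirical CDFs.
    # That gap can only change at observed values, so checking those suffices.
    ks_stat = 0.0
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / n_real
        cdf_synth = bisect_right(synth_sorted, x) / n_synth
        ks_stat = max(ks_stat, abs(cdf_real - cdf_synth))
    return 1.0 - ks_stat


def average_ks_complement(real_columns, synthetic_columns):
    """Average the per-column score over columns present in both dicts.

    Both arguments are hypothetical {column_name: list_of_values} mappings.
    """
    shared = [name for name in real_columns if name in synthetic_columns]
    if not shared:
        raise ValueError("no shared columns to compare")
    scores = [ks_complement(real_columns[c], synthetic_columns[c]) for c in shared]
    return sum(scores) / len(scores)
```

Read this way, an average of 0.3587 means that for a typical column the real and synthetic cumulative distributions differ by roughly 0.64 at their widest point.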

Result: Although the generator and discriminator losses stabilize, the quality of my synthetic samples is poor.

I then trained the CTGAN synthesizer, which applies some additional preprocessing, but the results were no different.

Why is this happening? My loss curves have exactly the shape described in https://github.com/sdv-dev/SDV/discussions/980

If you need any other information, please ask! Can you help me? I have been struggling with this for a while.

You can see some of the distributions in the images:

[Images: distributions of Total Fwd Packets, Total Backward Packets, Fwd Packets Length Total, Fwd Packet Length Std, Flow Duration, Bwd Packets Length Total, Bwd Packet Length Std]
