Open kargarisaac opened 1 month ago
Hi @kargarisaac - thanks for raising an interesting question.
The assumption that the data is free from anomalies is a strong one, since in most cases users do not have labeled data a priori. TadGAN is completely unsupervised and does not require this distinction during training. Specifically, the model should not be able to reconstruct anomalies as well as normal sequences, because (1) we assume that anomalies lose their information during encoding, and (2) anomalies are scarce compared to normal instances.
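To make point (1) concrete, here is a minimal sketch of the "anomalies lose information during encoding" intuition. This is not TadGAN itself (TadGAN uses adversarially trained encoders/decoders on time series); it is only an analogy using a linear 1-D bottleneck fit on mostly-normal 2-D data, where an off-manifold point reconstructs poorly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" 2-D points lying near the line y = x.
t = rng.normal(size=200)
normal = np.stack([t, t + 0.05 * rng.normal(size=200)], axis=1)

# A single anomalous point far off the normal manifold.
anomaly = np.array([[2.0, -2.0]])

# "Encoder": project onto the top principal component of the training data,
# i.e. a 1-D latent space that cannot represent off-manifold directions.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
pc = vt[0]

def reconstruct(x):
    z = (x - mean) @ pc                # encode: off-manifold info is lost
    return np.outer(z, pc) + mean      # decode back to input space

def score(x):
    # Reconstruction error, used as the anomaly score.
    return np.linalg.norm(x - reconstruct(x), axis=1)

print(score(normal).mean())  # small: normal points survive the bottleneck
print(score(anomaly)[0])     # large: the anomaly cannot be reconstructed
```

Because the bottleneck is fit to the dominant (normal) structure, the anomaly's reconstruction error is an order of magnitude above the normal baseline, even though the anomaly was never labeled as such.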
Hey,
I'm trying to understand the TadGAN training procedure better. If I understood correctly, you don't train the model using only normal/good data; based on the examples, the model also sees some anomalous segments during training, but without any labels. In autoencoder-based anomaly detection systems, this kind of leakage of abnormal data into the training set is a big problem: the autoencoder learns to reconstruct even the abnormal data, which hurts detection performance.
Can we say that TadGAN is somewhat robust against abnormal data leaking into the training dataset? And what do you think is the reason?
Thank you