sintel-dev / Orion

A machine learning library for detecting anomalies in signals.
https://sintel.dev/Orion/
MIT License

Data Leakage in training #540

Open kargarisaac opened 1 month ago

kargarisaac commented 1 month ago

Hey,

I'm trying to understand the TadGAN training procedure better. If I understood correctly, you don't train the model on normal/good data only: based on the examples, the model also sees some anomalous segments during training, without any labels. Leakage of abnormal data into the training dataset is a big problem for autoencoder-based anomaly detection systems, because the autoencoder learns to reconstruct even the abnormal data, which hurts detection performance.

Can we say that TadGAN is somewhat robust against leaking abnormal data into the training dataset? And what do you think is the reason?

Thank you

sarahmish commented 1 month ago

Hi @kargarisaac - thanks for raising an interesting question.

The assumption that the data is free from anomalies is a strong one to make, since in most cases users do not have labeled data a priori. TadGAN is completely unsupervised, as it does not require this distinction during training. Specifically, the model should not be able to reconstruct anomalies as well as normal sequences, because (1) we assume that anomalies lose their information during encoding; and (2) anomalies are scarce compared to the normal instances.
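
The intuition behind (1) and (2) can be sketched with a toy example (not Orion's actual API or TadGAN itself): a linear autoencoder built from a truncated SVD, fitted on windows of a signal that contains a few unlabeled anomalous spikes. Because the spikes are scarce, they barely shape the learned bottleneck, so their reconstruction error stays high even though they were present during "training". All names below (`window`, `anomaly_starts`, etc.) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A mostly-normal periodic signal with light noise.
t = np.arange(2000)
signal = np.sin(2 * np.pi * t / 37) + 0.05 * rng.standard_normal(t.size)

# Inject a few unlabeled anomalous spikes -- the "leakage" in question.
anomaly_starts = [400, 1100, 1700]
for i in anomaly_starts:
    signal[i:i + 5] += 4.0

# Non-overlapping windows; the whole (contaminated) set is used for fitting.
window = 50
X = signal.reshape(-1, window)

# Linear autoencoder via truncated SVD: a stand-in for an encoder/decoder
# with a narrow bottleneck (k << window forces lossy compression).
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
basis = Vt[:k]                              # "encoder" directions
recon = (Xc @ basis.T) @ basis + mu         # "decoder" output

errors = np.mean((X - recon) ** 2, axis=1)  # reconstruction error per window

# Windows containing spikes should score far above the rest despite the
# leakage: two components suffice for the dominant sine pattern, but the
# rare spikes are too scarce to claim a direction in the bottleneck.
anomalous = sorted(i // window for i in anomaly_starts)
flagged = sorted(np.argsort(errors)[-3:].tolist())
print(anomalous, flagged)
```

The key design point mirrors argument (2): the bottleneck is fitted to directions of highest variance across the whole dataset, and three spiky windows out of forty contribute too little variance to be encoded, so they come back with large residuals.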