mohan696matlab / TEP_Anomaly_detection_deployment

Autoencoder? #1

david-waterworth commented 10 months ago

Isn't an autoencoder supposed to compress the input into a lower dimension and then reconstruct it (i.e. https://blog.keras.io/building-autoencoders-in-keras.html)? This model seems to project the input from 52 dimensions up to 100 (i.e. a higher dimension) and then reconstruct, which seems highly likely to overfit the training data.

In this case it appears to work, because I guess provided you have enough examples of faults and non-faults, over-fitting/memorizing the non-fault dataset is probably OK.

I do get similar results (but nowhere near as good a reconstruction) using a 16-dim hidden layer with L1 regularization. It doesn't reconstruct all the noise in the inputs, but it still produces reconstructions of the "faulty" datasets with higher reconstruction errors (generally offsets, like in your charts).
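For reference, what I tried was roughly along these lines (a Keras-style sketch for illustration only; the layer sizes, L1 factor and variable names are my own choices, not taken from this repo):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

n_features = 52  # TEP measured + manipulated variables

inputs = keras.Input(shape=(n_features,))
# 16-dim code with an L1 activity penalty to keep the representation sparse
code = layers.Dense(16, activation="relu",
                    activity_regularizer=regularizers.l1(1e-5))(inputs)
outputs = layers.Dense(n_features, activation="linear")(code)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# X_train_normal: standardised, fault-free training data only (placeholder name)
# autoencoder.fit(X_train_normal, X_train_normal,
#                 epochs=100, batch_size=256, validation_split=0.1)
```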

mohan696matlab commented 10 months ago

Hi David,

Thanks for the question. On your first query, why I am increasing the latent dimension rather than reducing it: my goal here is to reconstruct the fault-free data as accurately as possible, and reducing the number of hidden neurons results in poorer reconstruction. I tried 16, 32, 64, and 100 neurons in the hidden layer and got the best MSE with 100 neurons.

You are right that usually the latent dimension is smaller than the input dimension, but that is particularly useful if you are doing dimensionality reduction. For anomaly detection alone, however, this constraint is not very important, since we don't really care how many neurons are in the hidden layer.

An even better approach would be to use a large hidden layer together with regularization to avoid over-fitting. You can use L1 regularization, dropout, batch normalization, and a large amount of data for training.
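For example, something along these lines (illustrative only; the exact architecture and hyperparameters here are not taken from the repo):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

n_features = 52

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    # wide hidden layer, kept in check by L1 weight penalty + batch norm + dropout
    layers.Dense(100, activation="relu",
                 kernel_regularizer=regularizers.l1(1e-5)),
    layers.BatchNormalization(),
    layers.Dropout(0.2),
    layers.Dense(n_features, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")

# Train on fault-free data only (X_train_normal is a placeholder name)
# model.fit(X_train_normal, X_train_normal, epochs=100, batch_size=512)
```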

david-waterworth commented 10 months ago

Thanks for the reply. My concern with using a higher-dimensional hidden layer is how you ensure that the model learns a representation that isn't the trivial identity mapping between inputs and outputs. Counter-intuitively[*], when I trained a model with a hidden size of 100 plus regularization, that appears to be what it did; when I use a hidden dim of ~16 (plus regularization) it clearly has a higher MSE than with h=100, but it appears to work better at discriminating between fault and non-fault, possibly because it's better at separating signal from noise.

So I suspect validating using MSE is problematic, and you really need to validate on the final/downstream objective (i.e. the accuracy or F1 score of the fault classification) rather than on perfect reconstruction of the inputs?

[*] The more I think about it, the less counter-intuitive this is. If regularization is shrinking the weights, and the identity is a minimum of the unregularized objective, then it's probably also a minimum of the objective + l1_norm(weights)?
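Roughly what I mean by validating on the downstream objective (a sketch; X_normal_val, X_faulty_val and the 99th-percentile threshold are placeholders, not anything from this repo):

```python
import numpy as np
from sklearn.metrics import f1_score

def reconstruction_error(model, X):
    # per-sample mean squared reconstruction error
    X_hat = model.predict(X, verbose=0)
    return np.mean((X - X_hat) ** 2, axis=1)

err_normal = reconstruction_error(autoencoder, X_normal_val)   # fault-free hold-out
err_faulty = reconstruction_error(autoencoder, X_faulty_val)   # faulty hold-out

# Threshold set from fault-free validation data only
threshold = np.percentile(err_normal, 99)

y_true = np.concatenate([np.zeros(len(err_normal)), np.ones(len(err_faulty))])
y_pred = (np.concatenate([err_normal, err_faulty]) > threshold).astype(int)
print("F1:", f1_score(y_true, y_pred))
```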

mohan696matlab commented 10 months ago

The points you presented are very interesting, especially the identity being a minimum of both the regularized and unregularized autoencoder. However, it would be interesting to inspect the weights individually and see whether the weights associated with the other variables are close to zero or not, and also whether increasing the number of hidden neurons reduces the weights associated with the other variables.
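Something like this could be used to check it (a rough diagnostic only: it ignores the biases and the ReLU, and the layer indices assume a simple single-hidden-layer Dense autoencoder like the one sketched earlier in this thread):

```python
import numpy as np

# Kernel of the encoder Dense layer: shape (n_features, hidden)
W_enc = autoencoder.layers[1].get_weights()[0]
# Kernel of the decoder Dense layer: shape (hidden, n_features)
W_dec = autoencoder.layers[-1].get_weights()[0]

# Linear part of the end-to-end map; a learned identity would show a dominant diagonal
linear_map = W_enc @ W_dec
off_diag = linear_map - np.diag(np.diag(linear_map))
print("mean |diagonal|    :", np.abs(np.diag(linear_map)).mean())
print("mean |off-diagonal|:", np.abs(off_diag).mean())
```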

And I agree with you on evaluating the model on the downstream objective using the F1 score, but for an entirely new system it can be difficult to get fault-mode data, as we have to work with only normal/healthy data.