hq-deng / RD4AD

Anomaly Detection via Reverse Distillation from One-Class Embedding
MIT License

about OCE #5

Closed tommying closed 2 years ago

tommying commented 2 years ago

Hi, thanks for your great work. I have some confusion about OCBE:

MFF aligns multi-scale features from the teacher E, and OCE condenses the resulting rich feature into a compact bottleneck code. But it seems the MFF part alone could do all of the above, so why is the OCE module still needed? Table 5 shows an ablation study on Pre, Pre+OCE, and Pre+OCE+MFF.
Did you run an ablation on Pre+MFF?

The OCBE module condenses the multi-scale patterns into an extremely low-dimensional space for downstream normal-representation reconstruction, and the abnormal representations generated by the teacher model are then likely to be discarded by OCBE. Why is that?

I am looking forward to your reply! Thanks.

hq-deng commented 2 years ago

Hello,

Pre+OCE means that we re-train the bottleneck block, while Pre alone means we only use the frozen pretrained bottleneck block. We train the bottleneck block on one-class samples, which is why we call it a one-class embedding (OCE). Pre+MFF would be meaningless, because the pre-trained bottleneck block (e.g. the 4th layer of a ResNet) received features from the last layer, not multi-scale features, when it was trained on ImageNet. If we want to fuse multi-scale features, we have to re-train the bottleneck block so it can adapt to them.
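
To make the "Pre" vs "Pre+OCE" distinction concrete, here is a tiny sketch of how the ablation settings differ in practice, assuming the bottleneck is an ImageNet-pretrained ResNet 4th block; the function and setting names are illustrative, not the ones used in this repo:

```python
# Hypothetical helper illustrating the ablation settings discussed above.
# "Pre"               -> keep the pretrained 4th block frozen.
# "Pre+OCE" (+ MFF)   -> re-train the same block on one-class (normal-only) data.
def configure_bottleneck(block, setting):
    retrain = setting in ("Pre+OCE", "Pre+OCE+MFF")
    for p in block.parameters():
        p.requires_grad = retrain
    return block
```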

This is the same reason an auto-encoder can perform anomaly detection. During training, we compress normal images into a low-dimensional code and then restore the normal images. Although the code contains less information, normal images are still reconstructed well because that is exactly what the model was trained on. An anomalous input produces a larger error because some of the anomalous information is discarded. A toy example: suppose the normal case is [0,0,0,0,0,0,0] and the anomalous case is [0,0,1,1,1,0,0]. If we compress to a code of length 5, the anomalous code might be [0,1,1,1,0]; if we compress to length 3, it might be [0,1,0]. The normal case is still restored as [0,0,0,0,0,0,0], but the anomalous case is restored as [0,0,1,1,1,0,0] and [0,0,0,1,0,0,0] respectively. This example is extreme and only meant for intuition; in fact, previous studies have shown that a more compact latent code leads to a larger anomaly error.

Coming back to OCBE: the default compression is roughly [(16,16,1024)] -> [(8,8,2048)], while with MFF it is [(64,64,256),(32,32,512),(16,16,1024)] -> [(8,8,2048)]. Although the target space is the same, there is much more information at the input, so the information is compressed relatively more.
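
For intuition about "it relatively compresses information", a quick back-of-the-envelope calculation of the input-to-code size ratios, using the feature shapes quoted above (this is just arithmetic on those shapes, not the repo's code):

```python
# Size of the compact code produced by the bottleneck: (8, 8, 2048).
bottleneck_out = 8 * 8 * 2048

# OCE only: a single 3rd-layer feature map of shape (16, 16, 1024).
single_scale_in = 16 * 16 * 1024

# OCE + MFF: fused 1st/2nd/3rd-layer features.
multi_scale_in = 64 * 64 * 256 + 32 * 32 * 512 + 16 * 16 * 1024

print(single_scale_in / bottleneck_out)  # 2.0  -> mild compression
print(multi_scale_in / bottleneck_out)   # 14.0 -> much stronger relative compression
```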

tommying commented 2 years ago

Thank you very much for the reply and I'm sorry to bother you again.

  1. I still can't understand why Pre+MFF is meaningless.
    You said that the pre-trained bottleneck block (e.g. the 4th layer of a ResNet) received features from the last layer rather than multi-scale features when it was trained on ImageNet, but I don't really understand that.

    Pre+MFF would take the multi-scale features from the pre-trained encoder, align them with MFF, and then pass the result to the decoder. It would only use the outputs of the different residual blocks that were pre-trained on ImageNet, with no OCE module at all.

  2. OCE adopts the 4th residual block of a ResNet, and you said Pre+OCE means re-training that bottleneck block. So is OCE part of the teacher network (the teacher's 4th residual block), or is OCE a new residual block with the same structure as the teacher's 4th block?

hq-deng commented 2 years ago

We use the 1st, 2nd, and 3rd layers of a ResNet as the teacher encoder, so the dimension of the 3rd-layer features fits the ResNet's 4th layer, and it is therefore natural to use the encoder's 4th layer as the bottleneck. Pre refers to the pretrained but frozen 4th layer, just like the 1st, 2nd, and 3rd layers in the teacher encoder; the whole encoder is a model pre-trained on ImageNet. Since during pre-training the 4th layer received features from the 3rd layer (trained on ImageNet), not from MFF, we have to re-train it on MVTec if we want to feed it MFF features. OCE is part of the ResNet used in the teacher network, but we train the OCE (4th layer) while keeping the teacher encoder (1st, 2nd, 3rd layers) frozen, or modify it into MFF+OCE.
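
To illustrate that split, here is a minimal sketch assuming a standard torchvision ResNet-50 (torchvision >= 0.13 for the `weights` argument). The `fuse` layer is a simplified stand-in for the paper's MFF module, only to show why the 4th block must be re-trained once it receives fused multi-scale input instead of the plain 3rd-layer output; none of these names come from this repo:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

backbone = resnet50(weights="IMAGENET1K_V1")

# Teacher encoder E: stem + 1st/2nd/3rd residual blocks, ImageNet-pretrained and frozen.
stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
layer1, layer2, layer3 = backbone.layer1, backbone.layer2, backbone.layer3
for m in (stem, layer1, layer2, layer3):
    m.requires_grad_(False)

# OCE bottleneck: a trainable copy of the 4th block, initialized from the
# pretrained weights and re-trained on the one-class (normal) data.
bottleneck = copy.deepcopy(backbone.layer4)

# Simplified MFF stand-in: pool shallow features to the 3rd-layer resolution,
# concatenate, and project back to the 1024 channels that the 4th block expects.
fuse = nn.Conv2d(256 + 512 + 1024, 1024, kernel_size=1)

x = torch.randn(1, 3, 256, 256)
f1 = layer1(stem(x))      # (1,  256, 64, 64)
f2 = layer2(f1)           # (1,  512, 32, 32)
f3 = layer3(f2)           # (1, 1024, 16, 16)
fused = fuse(torch.cat([
    F.adaptive_avg_pool2d(f1, f3.shape[-2:]),
    F.adaptive_avg_pool2d(f2, f3.shape[-2:]),
    f3,
], dim=1))
code = bottleneck(fused)  # (1, 2048, 8, 8): the compact one-class embedding
```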

tommying commented 2 years ago

Thanks a lot.