sjtuplayer / anomalydiffusion

[AAAI 2024] AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model
MIT License

The definition of your "few-shot" #10

TerryMelody closed 6 months ago

TerryMelody commented 6 months ago

Hello, dear authors! Thanks for your wonderful work. I am confused about the definition of "few-shot" in your paper's title. I noticed that you use 1/3 of the test data to train the textual-inversion step, and that the UNet training also accesses the normal data in the dataset. It seems that you use a lot of the data in the dataset. My understanding of "few-shot" comes from "WinClip" or "VAND", which only access a few shots of normal data as reference data. So I wonder whether your definition of "few-shot" is different from theirs, or whether there is no clear definition in the anomaly detection field.

sjtuplayer commented 6 months ago

The few-shot definition in our paper is consistent with the one used for generative models, i.e., training the model with few-shot target-domain data. For few-shot generative models, there can be a large amount of source-domain data (normal samples in the AD task) while there is very little target-domain data (anomalous samples in the AD task). Therefore, our few-shot definition is not the same as that of "WinClip".

Note that we do not use the normal data in the UNet training. Only in the inference stage do we generate anomalies on the normal samples.

TerryMelody commented 6 months ago

Thanks for your instant reply! So you trained the UNet on all of the generated data? I think UNet training should use both normal and abnormal data, and I can understand the generated anomaly data part. But what about the normal data part?

sjtuplayer commented 6 months ago

Sorry for misunderstanding which UNet you mentioned. Previously, I meant the UNet in the diffusion model. The UNet anomaly detection model is trained on both the normal samples in the training set and the generated anomaly data.

TerryMelody commented 6 months ago

Okay, I got it. Thanks for your patience! So you follow the few-shot definition of the generative process (and then use the generated data to train the discriminative task), right? I'm curious whether the few-shot definitions in your paper and in "WinClip" are both reasonable, or whether I am missing some prerequisites. They seem very different.

engrmusawarali commented 6 months ago

@TerryMelody Few-shot applies to the generative model, not to the UNet used for anomaly detection and localization, which is trained on generated data. They use a few samples to train the diffusion model so that it can generate extensive data samples.

AnomalyDiffusion does not follow the few-shot strategy for downstream anomaly inspection, whereas WinCLIP follows a few-shot approach in downstream inspection tasks.

There are two UNets here: one for the diffusion model and one for anomaly detection and localization. You should not confuse the two. To be clear, the diffusion model UNet is few-shot and the anomaly detection model is not.

Furthermore, maybe you are trying to use generated data with WinCLIP. WinCLIP does not need extensive data. The available data in MVTec is more than enough for WinCLIP and similar methods. I hope this is clear to you.
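
To make the distinction concrete, here is a rough Python sketch of which data each stage sees, based only on what is said in this thread. It is not code from the AnomalyDiffusion repository: the function names, the dictionary keys, the file paths, and the assumption that the remaining 2/3 of the real anomalies are kept for evaluation are all hypothetical.

```python
import random


def split_anomalies(anomaly_paths, seed=0):
    """Hold out one third of the real anomalous images for the generator.

    The 1/3 ratio follows the thread; the shuffling and seed are assumptions.
    """
    paths = sorted(anomaly_paths)
    random.Random(seed).shuffle(paths)
    n_train = max(1, len(paths) // 3)
    return paths[:n_train], paths[n_train:]


def stage_inputs(normal_train, anomaly_paths):
    """Return a plan describing the data each stage has access to."""
    few_shot_anomalies, held_out = split_anomalies(anomaly_paths)
    return {
        # Few-shot stage: only a handful of real anomalies, no normal images.
        "diffusion_training": {"normal": [], "real_anomalies": few_shot_anomalies},
        # Inference: anomalies are generated on top of normal samples.
        "anomaly_generation": {"normal": list(normal_train), "real_anomalies": []},
        # Not few-shot: normal training images plus the generated anomalies.
        "detector_training": {
            "normal": list(normal_train),
            "generated_anomalies": "output of anomaly_generation",
        },
        # Assumption: the remaining real anomalies are kept for evaluation.
        "evaluation": {"real_anomalies": held_out},
    }


if __name__ == "__main__":
    normal = [f"good/{i:03d}.png" for i in range(20)]
    anomalies = [f"broken/{i:03d}.png" for i in range(9)]
    for stage, data in stage_inputs(normal, anomalies).items():
        print(stage, {k: (len(v) if isinstance(v, list) else v) for k, v in data.items()})
```

In this reading, "few-shot" only describes the `diffusion_training` entry, while WinCLIP-style few-shot would instead restrict the number of normal reference images available to the detector.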

TerryMelody commented 6 months ago

Thank you so much! I'm clear about the setting now. =)