sjtuplayer / anomalydiffusion

[AAAI 2024] AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model
MIT License

Question about training UNet with the provided generated images #38

Open DaiZhewei opened 2 months ago

DaiZhewei commented 2 months ago

Hi authors, we've trained 15 UNets using the provided generated data and conducted segmentation tests. We would like to ask you some questions:

  1. The provided data only contains 500 image-mask pairs per defect class, which contradicts the 1000 pairs mentioned in the paper.
  2. The resolution of the provided data is 256, while the stable-diffusion model directly generates images with a resolution of 512. How were the 256 resolution images obtained?
  3. The segmentation results on the test set of the UNet we trained with the provided data differ significantly from those reported in the paper. Could you please provide insight into what might be causing this?

     |       | Image AUROC | Image AP | Image F1 | Pixel AUROC | Pixel AP | Pixel F1 |
     |-------|-------------|----------|----------|-------------|----------|----------|
     | Paper | 99.2        | 99.7     | 98.7     | 99.1        | 81.4     | 76.3     |
     | Ours  | 94.92       | 97.54    | 93.83    | 95.46       | 64.31    | 64.21    |

Looking forward to your reply.
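
For context, a minimal sketch of how such image- and pixel-level metrics are commonly computed with scikit-learn; the `evaluate` helper, the max-pooling for image-level scores, and the F1 threshold sweep are assumptions here, not necessarily the paper's exact protocol:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

def evaluate(scores: np.ndarray, masks: np.ndarray) -> dict:
    """scores: predicted anomaly maps (N, H, W) in [0, 1];
    masks: binary ground-truth masks (N, H, W)."""
    # Image-level: reduce each map/mask to one value (max response)
    img_scores = scores.reshape(len(scores), -1).max(axis=1)
    img_labels = masks.reshape(len(masks), -1).max(axis=1)
    img_auroc = roc_auc_score(img_labels, img_scores)
    img_ap = average_precision_score(img_labels, img_scores)

    # Pixel-level: flatten and score every pixel jointly
    px_scores, px_labels = scores.ravel(), masks.ravel()
    px_auroc = roc_auc_score(px_labels, px_scores)
    px_ap = average_precision_score(px_labels, px_scores)

    # F1: sweep thresholds and keep the best (one common convention)
    px_f1 = max(f1_score(px_labels, px_scores > t)
                for t in np.linspace(0.1, 0.9, 9))
    return dict(img_auroc=img_auroc, img_ap=img_ap,
                px_auroc=px_auroc, px_ap=px_ap, px_f1=px_f1)
```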
engrmusawarali commented 2 months ago

Dear Dai Zhewei,

The Stable Diffusion model can work at any resolution that is a multiple of 64, so it can be higher or lower. I am not sure about the UNet, but they trained the classifier on the provided dataset.
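
To illustrate the resolution constraint, a sketch only; this repo builds on the CompVis stable-diffusion codebase rather than diffusers, and the model ID and prompt below are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# height/width should be multiples of 64: the VAE downsamples by 8x,
# and the denoising UNet halves the latent three more times (2^3).
image = pipe(
    "a photo of a hazelnut with a crack defect",
    height=512,
    width=512,
).images[0]
image.save("sample_512.png")
```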

DaiZhewei commented 2 months ago

OK, I understand. Thanks for your reply!

engrmusawarali commented 2 months ago

Dear @DaiZhewei, did you get the segmentation results for all classes of MVTec and compute the average?

DaiZhewei commented 2 months ago

Yes, this is the average of the results across all classes.

sjtuplayer commented 2 months ago

  1. We discarded the 500 lower-quality image-mask pairs, as explained in the README. If you need the other 500 images, you can generate them with the provided pretrained model.
  2. The data is downsampled to 256×256 for the downstream anomaly detection tasks (see the resizing sketch after this list).
  3. The AP score should not be that low. If you use both our anomaly detection code and the generated data, it should be much higher; please try again, since we have already verified the performance when running the open-sourced code.
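
For item 2, a minimal sketch of such a downsampling step, assuming hypothetical directory names; the authors' exact resampling filter is not stated, so LANCZOS for images and NEAREST for masks are assumptions:

```python
from pathlib import Path
from PIL import Image

# Hypothetical layout: generated images and masks in sibling folders
for path in Path("generated/images").glob("*.png"):
    img = Image.open(path)
    img.resize((256, 256), Image.LANCZOS).save(path)

for path in Path("generated/masks").glob("*.png"):
    mask = Image.open(path)
    # NEAREST keeps the masks binary instead of blurring their edges
    mask.resize((256, 256), Image.NEAREST).save(path)
```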