Reproducibility of results

marco-rudolph / differnet

This is the official repository for the WACV 2021 paper "Same Same But DifferNet: Semi-Supervised Defect Detection with Normalizing Flows" by Marco Rudolph, Bastian Wandt and Bodo Rosenhahn.


Reproducibility of results #12

Closed d3adc0c0 closed 3 years ago

d3adc0c0 commented 3 years ago

Hello!

First of all, thank you very much for publishing your code as well as writing the paper.

I wonder how to robustly reproduce the results stated in the paper. Namely, what setup is used for the grid category from the MVTec Anomaly Detection dataset? I have tried to use the default setup and could not reach the value of 0.84, only 0.8.

Also, my training process was quite unstable (maybe due to the aggressive data augmentations by default).

I use a virtual environment compatible with requirements.txt. I have not described it here for brevity, but let me know if it matters.

Thank you very much for your answer in advance.

P.S. While a difference of 0.04 may seem insignificant, it is, for example, the size of the gap between your method and an ideal AUC. Hence, I am quite interested in whether the results stated in the paper are really robust :)

marco-rudolph commented 3 years ago

Thanks for your interest! Indeed, for our method, the grid class is a bit of a problem child in this dataset. Training on it is very unstable, which can result in a different score on each run. I don't think the environment makes a big difference.

d3adc0c0 commented 3 years ago

@marco-rudolph But you are positive that the default setup in this repo reaches 0.84 sometimes, right?

d3adc0c0 commented 3 years ago

Also, if you would be so kind as to explain how you processed the Magnetic Tile Defects dataset, I would really appreciate it. All images in this dataset seem to have random sizes. I have not found these details in your paper.

Thank you very much in advance anyway.

marco-rudolph commented 3 years ago

> @marco-rudolph But you are positive that the default setup in this repo reaches 0.84 sometimes, right?

Yes.

marco-rudolph commented 3 years ago

> Also, if you would be so kind as to explain how you processed the Magnetic Tile Defects dataset, I would really appreciate it. All images in this dataset seem to have random sizes. I have not found these details in your paper.
>
> Thank you very much in advance anyway.

The images were just resized to 448x448, 224x224 and 112x112 pixels without any padding or cropping.
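
For readers who want to try this, the resizing described above could be sketched roughly as follows. This is only an illustration, not code from the repository: it assumes Pillow is installed, the helper name `resize_to_scales` is made up for this example, and bilinear interpolation is a guess, since the interpolation method is not stated in this thread.

```python
from PIL import Image

# Target edge lengths quoted in the reply above.
SCALES = (448, 224, 112)


def resize_to_scales(img, scales=SCALES, resample=Image.BILINEAR):
    """Return one square copy of `img` per scale.

    Each axis is stretched or squeezed independently, so there is no
    padding or cropping and the aspect ratio is not preserved.
    The bilinear filter is an assumption, not a detail from the thread.
    """
    return [img.resize((s, s), resample) for s in scales]


# Stand-in for a dataset image of arbitrary size (width=300, height=190).
example = Image.new("L", (300, 190))
resized = resize_to_scales(example)
print([im.size for im in resized])
```

Because nothing is padded or cropped, non-square images are distorted; whether that matters for a given defect type is something to check on your own data.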

d3adc0c0 commented 3 years ago

Thank you very much for your feedback!