What's your folder structure? In MVTec, it's
my_dataset                       # root folder
    bottle                       # object type
        train                    # path: samples for training
            good                 # folder: contains only good samples for training
        test                     # path: samples for testing
            good                 # folder: contains only good samples for testing
            defected_type_1      # folder: defected images
            defected_type_2      # folder: defected images
            .....
        ground_truth             # folder: contains binary masks
            defected_type_1_mask # folder: binary masks
            defected_type_2_mask # folder: binary masks
            .....
Thanks for your reply!
To rule out the folder dataset as the error source, I reproduced the same issue using the MVTec dataset and the default model configs. Once I train on the provided ground-truth masks, and once on zero masks. Again, when training on the zero masks, the heatmap is scaled differently.
Should the output heatmap not be independent of the ground truth mask? From my understanding, it should only be used to compute the evaluation scores, correct?
My zero masks are simply created with np.zeros((900, 900, 3)).astype(np.uint8) and then stored with cv2.imwrite as PNGs.
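For reference, a minimal sketch of how such a zero mask can be written to disk (the output path here is just a placeholder, not the actual dataset layout):

import cv2
import numpy as np

# All-black dummy mask, same spatial size as the 900x900 images.
mask = np.zeros((900, 900, 3)).astype(np.uint8)

# Output path is only an example; it should mirror the ground_truth layout.
cv2.imwrite("ground_truth/defected_type_1_mask/000_mask.png", mask)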
Should the output heatmap not be independent of the ground truth mask? From my understanding, it should only be used to compute the evaluation scores, correct?
Relevant issue.
My zero masks are simply created with np.zeros((900, 900, 3)).astype(np.uint8) and then stored with cv2.imwrite as PNGs.
I'm wondering how zero masks can play a role here.
@samet-akcay
I'm trying to understand the workflow of the anomaly detection task. If I understand correctly, we only need good samples for training, and at test time there are good and bad samples with ground-truth masks, like in the MVTec dataset. In some cases, the program will adaptively pick a threshold. My custom dataset structure is as follows:
bottle                       # folder name with object type
    train                    # path: samples for training
        good                 # folder: contains only good samples for training
    test                     # path: samples for testing
        good                 # folder: contains only good samples for testing
        defected_type_1      # folder: defected images
    ground_truth             # folder: contains the binary masks
        defected_type_1_mask # folder: binary masks
I'm not sure how to change the config for this. For now, I changed the config as follows, pointing it at the test set only, whereas it should presumably be the train set.
dataset:
  name: mvtec #options: [mvtec, btech, folder]
  format: folder # mvtec
  path: C:/Users/..../bottle/test # ./datasets/MVTec
  normal: good
  abnormal: defected_type_1
  split_ratio: 0.2
  seed: 0
  task: segmentation
  mask: C:/Users/..../bottle/ground_truth/defected_type_1
  extensions: null
  # category: bottle
  image_size: 224
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 1 # 36
  transform_config: null
  create_validation_set: false
  tiling:
    apply: false # true
    tile_size: null # 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
If I don't have the ground truth, how should I change the config for the custom dataset? Also, any tips about the best practices to get optimal results?
At test time, the model normalizes both the anomaly map and the score: https://github.com/openvinotoolkit/anomalib/blob/8c1a04fc7fbd95fae1dac6cb0641a7381ec8e5d4/anomalib/deploy/inferencers/torch.py#L154
The values used for this normalization are obtained during validation, but they can also be provided by you in the metadata: https://github.com/openvinotoolkit/anomalib/blob/8c1a04fc7fbd95fae1dac6cb0641a7381ec8e5d4/anomalib/deploy/optimize.py#L30-L55
So the best way would be to check your model's metadata and adjust it if needed.
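For intuition, a rough sketch of this min-max normalization (not the exact anomalib code; min_val, max_val and the threshold are the values stored in the model's metadata):

import numpy as np

def normalize_min_max(anomaly_map, threshold, min_val, max_val):
    # Shift the map so that the threshold maps to 0.5, then clip to [0, 1].
    normalized = (anomaly_map - threshold) / (max_val - min_val) + 0.5
    return np.clip(normalized, 0, 1)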
Hi @LukasBommes and @innat, it's because you guys set the adaptive threshold to True. It computes the threshold that gives the best F1 score between the predictions and the annotations. When you provide zero masks, the model assumes that this is a segmentation task, computes the threshold from the zero masks, and gets confused. That's why you see the wrong heatmap as the output.
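For intuition, a hedged sketch of an F1-maximizing adaptive threshold (using scikit-learn on flattened scores and labels; not the exact anomalib implementation). With all-zero masks there are no positive labels, so the F1 curve is degenerate and the chosen threshold becomes meaningless:

import numpy as np
from sklearn.metrics import precision_recall_curve

def adaptive_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    # Sweep thresholds over the precision-recall curve and keep the one
    # that maximizes F1 between predictions and annotations.
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / (precision + recall + 1e-10)
    return thresholds[np.argmax(f1[:-1])]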
If you want to use a custom dataset, you should use the folder dataset. Here is a sample config showing how to use it:
If I don't have the ground truth, how should I change the config for the custom dataset? Also, any tips about the best practices to get optimal results?
@innat, if you don't have the mask annotations, you could set the mask path to null. Something like the following:
dataset:
  name: mvtec #options: [mvtec, btech, folder]
  format: folder # mvtec
  path: C:/Users/..../bottle/test # ./datasets/MVTec
  normal: good
  abnormal: defected_type_1
  split_ratio: 0.2
  seed: 0
  task: classification  # <- changed
  mask: null            # <- changed
  extensions: null
  # category: bottle
  image_size: 224
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 1 # 36
  transform_config: null
  create_validation_set: false
  tiling:
    apply: false # true
    tile_size: null # 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
Since there are no mask annotations, the task becomes classification. It's currently set manually, but we'll be adjusting this automatically.
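A hypothetical illustration of that automatic adjustment (not current anomalib behaviour): infer the task from whether a mask path is provided.

from typing import Optional

def infer_task(mask_path: Optional[str]) -> str:
    # Without mask annotations, fall back to classification.
    return "segmentation" if mask_path else "classification"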
@samet-akcay Thank you. My current dataset is kind of a mess (some samples have ground truth and some don't). I need to sort them out properly.
Could you please tell me how I should update the config for the following folder structure? It's like MVTec: good samples for training, and good + defected samples for testing/evaluation, with ground-truth binary masks.
dummy_folder_name            # folder name with object type
    train                    # path: samples for training
        good                 # folder: contains only good samples for training
    test                     # path: samples for testing
        good                 # folder: contains only good samples for testing
        defected_type_1      # folder: defected images
        defected_type_2      # folder: defected images
        .....
    ground_truth             # folder: contains the binary masks
        defected_type_1_mask # folder: binary masks
        defected_type_2_mask # folder: binary masks
        .....
I'm mostly confused by the following params:
name: any_name_ #options: [mvtec, btech, folder]
format: folder # mvtec
path: ?
normal: ?
abnormal: ?
task: segmentation
mask: ?
Lastly, the num_workers param is set to 36 by default. I think it should be adaptive, e.g. computed from the number of CPU cores of the running system.
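As an illustration of that suggestion (not current anomalib behaviour), num_workers could be derived from the host's CPU count:

import os

# Use as many workers as there are CPU cores, capped at the current default.
num_workers = min(os.cpu_count() or 1, 36)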
I would place the defected_type_1 and defected_type_2 directories into a defect folder. Otherwise, the folder dataset may not find both of the directories. Something like:
dummy_folder_name                # folder name with object type
    train                        # path: samples for training
        good                     # folder: contains only good samples for training
    test                         # path: samples for testing
        good                     # folder: contains only good samples for testing
        defect
            defected_type_1      # folder: defected images
            defected_type_2      # folder: defected images
            .....
    ground_truth                 # folder: contains the binary masks
        defect
            defected_type_1_mask # folder: binary masks
            defected_type_2_mask # folder: binary masks
            .....
name: dummy_folder_name
format: folder
path: path/to/dummy_folder_name
normal: good
abnormal: defect
task: segmentation
mask: path/to/dummy_folder_name/ground_truth/defect
I'm not 100% sure though; I need to double-check this.
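If you do reorganize the folders as above, a throwaway script like the following could move the defected_type_* directories under defect (the paths are assumptions; adjust them to your dataset root):

import shutil
from pathlib import Path

root = Path("path/to/dummy_folder_name")
for split in ("test", "ground_truth"):
    defect_dir = root / split / "defect"
    defect_dir.mkdir(exist_ok=True)
    # Move every defected_type_* (and *_mask) directory under defect/.
    for src in sorted((root / split).glob("defected_type_*")):
        if src.is_dir():
            shutil.move(str(src), str(defect_dir / src.name))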
Another alternative to the folder dataset parameters would be to remove path and only have normal_path, abnormal_path and mask_path. This needs some discussion though.
Okay, so in the config setup above
name: dummy_folder_name
format: folder
path: path/to/dummy_folder_name
normal: good # I think it should be 'train/good'
abnormal: defect # and it should be 'test/defect'
task: segmentation
mask: path/to/dummy_folder_name/ground_truth/defect
If so, where should I set the path for path/to/dummy_folder_name/test/good? Are good samples needed for the test set?
Argh, just noticed it now. The folder format assumes that the dataset doesn't have train/test splits. Your dataset, however, is already split into train/test. In this case, folder may not be the best option. You might want to use the mvtec format.
Examples in the docstring might be helpful to understand how FolderDataset works:
https://github.com/openvinotoolkit/anomalib/blob/development/anomalib/data/folder.py#L316
Thanks for all your inputs on this and good to know that the issue with the heatmap is expected behaviour. I am also looking forward to your new method for setting the threshold in an unsupervised manner.
@samet-akcay: I tried a custom folder dataset and set task: classification and mask: null. But this also gives me the KeyError: label reported here.
My dataset structure is as follows:
anomalib/datasets/mvtec_test
|-- normal
| |-- 000.png
| |-- 001.png
| |-- ...
|-- abnormal
| |-- 000.png
| |-- 001.png
| |-- ...
and this is the complete config
dataset:
  name: mvtec_test
  format: folder
  path: ./datasets/mvtec_test
  normal: normal
  abnormal: abnormal
  task: classification
  mask: null
  extensions: null
  split_ratio: 0.2
  seed: 0
  image_size: 224
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 16
  transform_config: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  layers:
    - layer1
    - layer2
    - layer3
  metric: auc
  normalization_method: min_max # options: [none, min_max, cdf]
  threshold:
    image_default: 3
    pixel_default: 3
    adaptive: true

project:
  seed: 42
  path: ./results
  log_images_to: ["local"]
  logger: false
  save_to_csv: false

optimization:
  openvino:
    apply: false

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: null
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  checkpoint_callback: true
  default_root_dir: null
  deterministic: false
  fast_dev_run: false
  gpus: 1
  gradient_clip_val: 0
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1
  max_steps: -1
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: 1
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  prepare_data_per_node: true
  process_position: 0
  profiler: null
  progress_bar_refresh_rate: null
  replace_sampler_ddp: true
  stochastic_weight_avg: false
  sync_batchnorm: false
  terminate_on_nan: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.
  weights_save_path: null
  weights_summary: top
Thanks for fixing this. I'll give it a try in the afternoon.
I ran into an issue when training on a dataset (a subset of MVTec) where I set all ground-truth masks to zero (to simulate training on a dataset for which I have no ground-truth masks). When training with the actual ground-truth masks, the model produces heatmaps as expected, as in the first image below (produced with tools/inference.py). However, when training with the zero masks, the heatmaps seem to be scaled differently, as in the second image below. The confidence score seems unaffected. This behaviour is the same for both PADIM and PatchCore; I haven't tested the other models.
This is my model config for PADIM