openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

How to maintain the aspect ratio of the image? #1500

Closed ArivCR7 closed 8 months ago

ArivCR7 commented 9 months ago

Hi. The images in my dataset come from line-scan cameras and have a very high aspect ratio, e.g. 200x5000 (w x h). If I set the image_size to 256 in the config.yaml, the image gets deformed. How do I maintain the aspect ratio for these kinds of images?

j99ca commented 9 months ago

You should be able to set image_size to a custom resolution (I haven't tried this on all models/configs), something like: image_size: [256, 320]. Or perhaps you can preprocess your data beforehand into smaller chunks.
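To pick a non-square image_size pair that keeps the original ratio, a small helper like the one below can compute it. This is a hypothetical utility (not part of anomalib); `keep_ratio_size` and the rounding to a multiple of 32 are my own assumptions, the latter so typical downsampling backbones divide the dimensions cleanly.

```python
def keep_ratio_size(width: int, height: int, target_height: int, multiple: int = 32) -> tuple[int, int]:
    """Return (height, width) for a config's image_size entry, preserving aspect ratio."""
    scale = target_height / height
    # Round the scaled width to the nearest multiple, never below one multiple.
    target_width = max(multiple, round(width * scale / multiple) * multiple)
    return (target_height, target_width)

# A 200x5000 (w x h) line-scan frame scaled to height 1024:
print(keep_ratio_size(200, 5000, 1024))  # → (1024, 32)
```

The resulting pair would then go into the config as `image_size: [1024, 32]`, assuming the model accepts such a narrow input.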

blaz-r commented 9 months ago

As @j99ca said, you can set both width and height in the config. You could also try the tiling functionality (which only works for some models) to tile the image.

ArivCR7 commented 9 months ago

Thanks for your suggestions @j99ca and @blaz-r. I'll work it out and keep this thread posted on the results.

ArivCR7 commented 9 months ago

@blaz-r, I enabled tiling in the config. Even after enabling it, during inference the image still gets resized to the image_size set in the config. My question is: if the tiler splits the image into patches before feeding them to the model, what is the significance of image_size in the config? Kindly clarify.

Here's my config:

```yaml
dataset:
  name: boxdefect
  format: folder
  path: xxx/xxx # dataset resides here
  normal_dir: good
  abnormal_dir: bad
  normal_test_dir: null
  task: classification
  mask: null
  extensions: null
  train_batch_size: 1
  eval_batch_size: 1
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: none # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: none # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: true
    tile_size: 100
    stride: 100
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
```

blaz-r commented 9 months ago

The tiler is implemented as part of the model. This means the image is fed to the model at image_size; inside the model's forward pass, the tiler splits it into tile_size tiles.
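The size relationship described above can be sketched with plain NumPy. This is illustrative only; anomalib's actual Tiler lives inside the model and also handles padding, overlap, and untiling, which this minimal non-overlapping version omits.

```python
import numpy as np

def tile(image: np.ndarray, tile_size: int, stride: int) -> np.ndarray:
    """Split an (H, W) image into (N, tile_size, tile_size) patches."""
    h, w = image.shape
    patches = [
        image[y : y + tile_size, x : x + tile_size]
        for y in range(0, h - tile_size + 1, stride)
        for x in range(0, w - tile_size + 1, stride)
    ]
    return np.stack(patches)

image = np.zeros((256, 256))              # already resized to image_size
tiles = tile(image, tile_size=100, stride=100)
print(tiles.shape)                        # (4, 100, 100): the backbone sees tiles, not the full image
```

So image_size still matters: it fixes the canvas the tiler operates on, and tile_size/stride then determine how many patches the backbone actually processes.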

ArivCR7 commented 9 months ago

Thanks for the reply @blaz-r. In that case, how do we deal with images with different aspect ratios? Setting a fixed resolution as the image_size might deform the images. Any suggestions on this, please?

blaz-r commented 9 months ago

In that case, I'm not sure. Given the way most of the models work, I'd say you want a fixed aspect ratio for them to work best. If another task has a different aspect ratio, it would probably be best to crop the images to the same ratio, or, if that's not possible, train a separate model. If the difference in ratio is small, then using the same resolution shouldn't be much of a problem.

ArivCR7 commented 9 months ago

Thanks for your suggestion @blaz-r. I'm trying to do the tiling beforehand to maintain the aspect ratio and feed the patches to the model instead of the whole image. I will share the results in this thread. Once again, thanks for your inputs @blaz-r and @j99ca.
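A minimal sketch of this "tile beforehand" idea, under my own assumptions about the data: a tall line-scan frame is cut into square patches along its long axis, so each patch keeps the native pixel aspect ratio and can be saved as an ordinary training image. The function name and the remainder-dropping policy are illustrative choices, not anything from anomalib.

```python
import numpy as np

def pre_tile(frame: np.ndarray) -> list[np.ndarray]:
    """Split an (H, W) frame with H >> W into square W x W patches (remainder dropped)."""
    h, w = frame.shape
    return [frame[y : y + w, :] for y in range(0, h - w + 1, w)]

frame = np.zeros((5000, 200))            # H x W, e.g. a grayscale line-scan frame
patches = pre_tile(frame)
print(len(patches), patches[0].shape)    # 25 (200, 200)
```

Each saved patch could then be fed to the model with a square image_size, avoiding the deformation from resizing the whole 200x5000 frame.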

ArivCR7 commented 9 months ago

As mentioned, I did the tiling beforehand and fed the patches to the network. It works fine and is able to detect defects when both the good and bad samples are uniform. However, in my case I will encounter patches with different patterns (say, text with different characters and orientations, and barcodes of different sizes and orientations). The approach I thought through is to train a separate model for each kind of patch. Can the anomalib library be useful for such cases? I've attached samples for reference. @blaz-r, kindly share your thoughts on this. 231115163616_A_BM_6388_19 231115163616_A_BM_6388_7 231115162839_A_TP_6282_130 231115162839_A_TP_6282_74 231115163512_A_TP_6369_30 231115162736_A_TP_6270_74 231115163003_A_BM_6298_76 231115163437_A_BM_6357_51

ArivCR7 commented 9 months ago

What I observed from the MVTec dataset is that the good samples all belong to the ideal condition of a single product, without much variation in patterns. In my case, the good samples might contain different textures/patterns, such as the texts and labels shared above. Can anomalib be used for such scenarios? Kindly share your expertise on this @blaz-r @j99ca

blaz-r commented 9 months ago

Hello. The tiled ensemble, where each patch gets a separate model, is currently still in development. Regarding the different orientations of labels: while the models are usually not sensitive to orientation, I think it's a bit trickier with text, since text is more detailed. So for text you could either align it, or use a different solution such as OCR.

ArivCR7 commented 9 months ago

Thanks for the reply @blaz-r. I don't want to do OCR, as the text on the patch is not important. All I need is to detect any defects on patches containing different kinds of text. Thoughts?

blaz-r commented 9 months ago

My idea was that if you can read the text, you can check it in text form to see if it's okay. If the content is not important, then I assume you can use anomalib models, but some models work a bit better if you can align the text.

ArivCR7 commented 9 months ago

@blaz-r, once again thanks for the reply. Not all the patches will have text in them; some patches, as shared above, will have printed diagrams. I've shared a sample here. 231115163414_A_TP_6351_62

blaz-r commented 9 months ago

It should still work, but I think it will work best if the same model always sees the same kind of patch, and that patch is also aligned.

ArivCR7 commented 8 months ago

Hi @blaz-r, I tried to train a Padim model for this use case. The model trains fine when the dataset is small (~1000 good images and ~100 bad images). However, when I increase the dataset size, the training gets killed. I found a similar issue: https://github.com/openvinotoolkit/anomalib/issues/630. So I'm planning to train a model other than Padim. Since there are many models to try out, which one do you think would be a good fit for detecting anomalies in the samples I shared above? Thanks!

blaz-r commented 8 months ago

Hi. I think for such large datasets you should use models that don't rely on memory banks. So I'd say you should avoid Padim and Patchcore (I think CFA could also be problematic, as could dfm and dfkde, but I can't say for sure). You could try cflow, reverse distillation, efficient_ad, and stfpm. It should be okay as long as the model doesn't use a memory bank, but still try different architectures, as some may work better than others.
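A back-of-envelope sketch of why memory-bank style approaches blow up as the dataset grows: implementations along the lines of Padim accumulate patch embeddings for every training image before fitting their statistics, so peak memory grows linearly with image count. The numbers below are illustrative assumptions for this arithmetic, not measured values from anomalib.

```python
n_images = 10_000
patches_per_image = 56 * 56   # assumed feature-map resolution at a 224-pixel input
embedding_dim = 100           # assumed reduced embedding dimension
bytes_per_float = 4           # float32

# Linear in n_images: doubling the dataset doubles the bank.
bank_gib = n_images * patches_per_image * embedding_dim * bytes_per_float / 2**30
print(f"{bank_gib:.1f} GiB")
```

Under these assumptions the embedding store alone is on the order of 10 GiB, which is consistent with training getting killed on large datasets while small ones fit; models without a memory bank avoid this term entirely.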

ArivCR7 commented 8 months ago

Thanks for the suggestion @blaz-r. I'll post the results in this thread.