2D-based Industrial Anomaly Detection has been widely discussed, however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTec-3D AD dataset.
The pipeline of Multi-3D-Memory (M3DM).
Our M3DM contains three important parts: (1) Point Feature Alignment (PFA) converts Point Group features to plane features with interpolation and project operation, $\text{FPS}$ is the farthest point sampling and $\mathcal{F{pt}}$ is a pretrained Point Transformer; (2) Unsupervised Feature Fusion (UFF) fuses point feature and image feature together with a patch-wise contrastive loss $\mathcal{L{con}}$, where $\mathcal{F{rgb}}$ is a Vision Transformer, $\chi{rgb},\chi_{pt}$ are MLP layers and $\sigma_r, \sigma_p$ are single fully connected layers; (3) Decision Layer Fusion (DLF) combines multimodal information with multiple memory banks and makes the final decision with 2 learnable modules $\mathcal D_a, \mathcal{Ds}$ for anomaly detection and segmentation, where $\mathcal{M{rgb}}$, $\mathcal{M{fs}}$, $\mathcal{M{pt}}$ are memory banks, $\phi, \psi$ are score function for single memory bank detection and segmentation, and $\mathcal{P}$ is the memory bank building algorithm.We implement this repo with the following environment:
Install the other package via:
pip install -r requirement.txt
# install knn_cuda
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
# install pointnet2_ops_lib
pip install "git+git://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
The MVTec-3D AD
dataset can be download from the Official Website of MVTec-3D AD.
The Eyecandies
dataset can be download from the Official Website of Eyecandies.
After download, put the dataset in dataset
folder.
To run the preprocessing
python utils/preprocessing.py datasets/mvtec3d/
It may take a few hours to run the preprocessing.
The following table lists the pretrain model used in M3DM:
Backbone | Pretrain Method |
---|---|
Point Transformer | Point-MAE |
Point Transformer | Point-Bert |
ViT-b/8 | DINO |
ViT-b/8 | Supervised ImageNet 1K |
ViT-b/8 | Supervised ImageNet 21K |
ViT-s/8 | DINO |
UFF | UFF Module |
Put the checkpoint files in checkpoints
folder.
Train and test the double lib version and save the feature for UFF training:
mkdir -p datasets/patch_lib
python3 main.py \
--method_name DINO+Point_MAE \
--memory_bank multiple \
--rgb_backbone_name vit_base_patch8_224_dino \
--xyz_backbone_name Point_MAE \
--save_feature \
Train the UFF:
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch --nproc_per_node=1 fusion_pretrain.py \
--accum_iter 16 \
--lr 0.003 \
--batch_size 16 \
--data_path datasets/patch_lib \
--output_dir checkpoints \
Train and test the full setting with the following command:
python3 main.py \
--method_name DINO+Point_MAE+Fusion \
--use_uff \
--memory_bank multiple \
--rgb_backbone_name vit_base_patch8_224_dino \
--xyz_backbone_name Point_MAE \
--fusion_module_path checkpoints/{FUSION_CHECKPOINT}.pth \
Note: if you set --method_name DINO
or --method_name Point_MAE
, set --memory_bank single
at the same time.
If you find this repository useful for your research, please use the following.
@inproceedings{wang2023multimodal,
title={Multimodal Industrial Anomaly Detection via Hybrid Fusion},
author={Wang, Yue and Peng, Jinlong and Zhang, Jiangning and Yi, Ran and Wang, Yabiao and Wang, Chengjie},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={8032--8041},
year={2023}
}
Our repo is built on 3D-ADS and MoCo-v3, thanks their extraordinary works!