xiaozhen228 / VCP-CLIP

(ECCV 2024) VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation

About auxiliary datasets #5

Open Looksing opened 2 months ago

Looksing commented 2 months ago

Hi, firstly thank you for your amazing work. I would like to use it to detect defects on complex surfaces, but no matter which weights I use (train_visa.pth or train_mvtec.pth), the results are not good. For example, surface components are sometimes mistaken for defects. In this case, how should I choose the auxiliary training set for training? Thank you so much, I am looking forward to your reply.

[attached image: vis_zong_mvtec_good_000095]

xiaozhen228 commented 2 months ago

Hi, for certain complex product categories, zero-shot anomaly segmentation is challenging because some anomalies can only be defined relative to normal samples. So for the samples above, if you want to use the model in practice, you can add similar samples to the auxiliary dataset and run supervised prompt learning on them, which keeps the training process highly efficient.
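For concreteness, here is a rough sketch of what such an auxiliary fine-tuning loop could look like. Everything repo-specific in it is a hypothetical placeholder (`build_vcp_clip`, `AuxSegDataset`, the `"prompt"` parameter-name convention, and the assumed output format of the model); adapt these to the actual classes in this repository.

```python
# A rough sketch of supervised prompt learning on a small auxiliary set.
# `build_vcp_clip` and `AuxSegDataset` are hypothetical placeholders:
# substitute the real model/dataset classes from this repository.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

model = build_vcp_clip(checkpoint="train_visa.pth")  # hypothetical loader
model = model.cuda().train()

# Freeze the CLIP backbone; only prompt-related parameters are updated,
# which is what keeps the training cheap. (Assumes prompt parameters can
# be identified by name -- check the actual parameter names in the repo.)
for name, p in model.named_parameters():
    p.requires_grad = "prompt" in name

# Auxiliary data: images of the troublesome product with pixel-level
# masks (all-zero masks for normal samples), as float tensors in [0, 1].
loader = DataLoader(AuxSegDataset("data/aux"), batch_size=8, shuffle=True)

opt = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)

for epoch in range(5):
    for img, mask in loader:
        img, mask = img.cuda(), mask.cuda()
        score_map = model(img)  # assumed: per-pixel anomaly scores in [0, 1]
        loss = F.binary_cross_entropy(score_map, mask)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because only the prompt parameters receive gradients, even a handful of annotated samples of the problematic product can meaningfully shift the segmentation behavior.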

1TTT9 commented 2 months ago

First of all, thank you for your outstanding work. As you pointed out, anomaly detection in complex scenarios, like the image Looksing uploaded, poses significant challenges for pretrained models, leading to the need for supervised prompt learning to mitigate suboptimal results. I have a few follow-up questions:

  1. Is it possible for the ZSAS model to provide a textual explanation to justify its output?
  2. When generating our auxiliary dataset, how can we assess whether the prompts are effective or arbitrary? I noticed that class names from the dataset were used, but this approach seems hard to generalize, as class names in datasets like MVTec or VISA may overlap yet have different meanings across datasets. (A simple way to sanity-check a prompt pair is sketched after this list.)
  3. We observed that the attribute "specie_name" is not referenced during either training or testing. Could you explain why? Does this suggest that the ZSAS model performs defect segmentation without associating a textual description of the defect?
  4. Lastly, when the ZSAS model’s performance declines and we lack sufficient diverse anomaly samples, the results appear to approximate those of unsupervised learning methods. Could you provide insight into this observation? Once again, thank you for your exceptional work, and we look forward to your response.
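Regarding question 2, one lightweight way to check whether a prompt pair is informative rather than arbitrary is to measure how well it separates known-normal from known-defective images in CLIP's embedding space. The sketch below uses the stock OpenAI CLIP package; the prompt texts and image paths are illustrative placeholders.

```python
# Sanity check for candidate text prompts: a prompt pair is only useful if
# it separates normal from anomalous images in CLIP's embedding space.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompts = clip.tokenize([
    "a photo of a flawless metal surface",  # candidate "normal" prompt
    "a photo of a damaged metal surface",   # candidate "anomalous" prompt
]).to(device)

def score(image_path):
    """Softmax similarity of one image against both candidate prompts."""
    img = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(img, prompts)
    return logits_per_image.softmax(dim=-1).squeeze(0)

# An informative prompt pair should push normal samples toward index 0
# and defective samples toward index 1 by a clear margin.
print("normal:", score("data/aux/good/000.png"))
print("defect:", score("data/aux/bad/000.png"))
```

If both normal and defective samples score near 0.5, the pair is effectively arbitrary for that product and should be rewritten, or replaced by learned prompts as in the paper.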