After testing different attack models, I found that some cannot be detected. What is the possible reason?

wp275006311 commented 2 months ago

Description:
1. Using the open-source attack library BackdoorBox (https://github.com/THUYimingLi/BackdoorBox), I generated attacked models such as BadNets and WaNet, with an attack success rate of 95%. However, UMD detects almost all attack categories incorrectly, and the detection threshold is very low. What might cause this? Other detection algorithms detect these models normally, although with them only all2one attacks can be detected.
2. Using the open-source attack library BackdoorBench (https://github.com/SCLBD/BackdoorBench), I generated attacked models such as BadNets and WaNet. UMD detects the attack category normally, but the accuracy is not high.

When the attacked models are generated with the backdoor attack implementation included in UMD, detection performance is high.

zhenxianglance commented 2 months ago

Thanks for raising this issue. We haven’t tested with BackdoorBox or BackdoorBench before, so we do not know the exact reason. Here is our hypothesis.

The detection power of UMD depends on both the effectiveness of trigger reverse engineering and the group-wise transferability of the reverse engineered trigger. One possibility is that the reverse engineered trigger does not transfer well. To test this, we suggest plotting the heat map for transferability like the one in our paper. If the heat map is dark everywhere, please use more samples for trigger reverse engineering. But if the heat map is bright everywhere on the diagonal, please use fewer samples.
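To make the heat-map suggestion concrete, here is a minimal Python sketch of how a pairwise transferability matrix could be computed and summarized. It is not UMD's implementation: the helper names (`transferability_matrix`, `apply_trigger`), the additive-perturbation trigger, the toy "classifier", and the definition of entry [a, b] as the fraction of pair b's source-class images that pair a's reverse-engineered trigger sends to pair b's target class are all illustrative assumptions.

```python
import itertools

import numpy as np


def transferability_matrix(classify, triggers, images_by_class, apply_trigger):
    """Hypothetical helper: entry [a, b] is the fraction of source-class images of
    pair b that the trigger estimated for pair a sends to the target class of pair b."""
    pairs = list(triggers)
    tm = np.zeros((len(pairs), len(pairs)))
    for a, pair_a in enumerate(pairs):
        for b, (src_b, tgt_b) in enumerate(pairs):
            x = apply_trigger(images_by_class[src_b], triggers[pair_a])
            tm[a, b] = np.mean(classify(x) == tgt_b)
    return pairs, tm


# Toy setup (assumption): 3 classes, each "image" is a 3-d vector and the
# "classifier" simply picks the largest coordinate.
num_classes = 3
images_by_class = {c: np.tile(np.eye(num_classes)[c], (5, 1)) for c in range(num_classes)}


def classify(x):
    return np.argmax(x, axis=1)


def apply_trigger(x, delta):
    # Additive perturbation, clipped to an assumed valid range of [0, 3].
    return np.clip(x + delta, 0.0, 3.0)


# One estimated trigger per (source, target) pair; in this toy case each
# trigger simply pushes the input toward its target class.
triggers = {
    (s, t): 2.0 * np.eye(num_classes)[t]
    for s, t in itertools.permutations(range(num_classes), 2)
}

pairs, tm = transferability_matrix(classify, triggers, images_by_class, apply_trigger)

# Crude summary (assumption): a mean near 0 corresponds to a "dark" map,
# a mean near 1 to a "bright" one.
off_diag = tm[~np.eye(len(pairs), dtype=bool)]
print("pairs:", pairs)
print("mean off-diagonal transferability:", off_diag.mean())
```

The matrix could then be displayed with matplotlib's `plt.imshow(tm)`. In this toy setup every trigger sends all inputs to its own target, so the map is bright only where two pairs share a target class.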

If the above does not solve the issue, we suggest using alternative (more recent) trigger reverse engineering approaches, such as UNICORN.

Please let us know if you have further questions.

wp275006311 commented 2 months ago

Hello, thank you very much for taking the time to reply. Following your suggestion, I varied the number of CIFAR-10 images used for trigger reverse engineering over 5, 10, 15, 20, and 25, but none of these settings met expectations, and a large discrepancy remained. I also noticed something strange: when UMD evaluates target class 1, the TR value is abnormally large.

Below is an all2one backdoored model on CIFAR-10 using the simplest BadNets attack. I checked the model repeatedly and found nothing abnormal. Its accuracy and attack success rate are: 'test_acc': 0.8777, 'test_asr': 0.9998888888888889, 'test_ra': 0.00011111111111111112, 'train_acc': 0.9337740384615385, 'train_acc_clean_only': 0.92956545830403, 'train_asr_bd_only': 0.9996663329996663.

I ran UMD with 10 images, the mode set to pert, and all other parameters the same as in UMD. The attack target class is 2, so the expected detected poisoned pairs are [02 12 32 42 52 62 72 82 92], but the pairs actually detected are [51 31 81 01 21 41 61 71]. Could you check whether any anomalies show up in the attached log and color map?

In addition, when detecting a backdoored model, should the detection mode be set to patch or pert? Any suggestions?

Attachments: color_map_all, log.txt
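The expected pairs for an all2one attack follow directly from the target class: every other class is a source. A small sketch (the function names `expected_all2one_pairs` and `parse_pairs` are hypothetical, and the two-digit "source, target" encoding is read from the lists in the post) builds the expected set and compares it with the reported detections.

```python
def expected_all2one_pairs(num_classes, target):
    # Under all2one, every class other than the target is a source class.
    return {(s, target) for s in range(num_classes) if s != target}


def parse_pairs(text):
    # "51" -> (5, 1): first digit is the source class, second the target class.
    # This two-digit encoding only works for classes 0-9, as in CIFAR-10.
    return {(int(p[0]), int(p[1])) for p in text.split()}


expected = expected_all2one_pairs(num_classes=10, target=2)
detected = parse_pairs("51 31 81 01 21 41 61 71")

print("expected:", sorted(expected))
print("correctly detected:", sorted(expected & detected))
print("targets among detections:", {t for _, t in detected})
```

The two sets do not overlap at all: the detector flagged eight of the nine possible pairs with target 1 (only (9, 1) is missing) and none with target 2, which matches the abnormally large TR for target class 1 described in the comment.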

polaris-73 / UMD-backdoor-detection

Issue #1