How to determine whether a model is backdoored or not?

xiaoyunxxy / ban


Issue #1

Closed: wsynuiag closed this issue 6 days ago

wsynuiag commented 1 week ago

I understand the mechanism in ban_detection.py. However, given results such as acc and reg, how can I judge whether a model is backdoored? Specifically, what thresholds does the paper use for different model architectures and datasets?

xiaoyunxxy commented 6 days ago

Hi,

As in BTI-DBF, we classify a model as backdoored according to whether BAN correctly identifies the target class.

For example, the prediction results for a model backdoored with WaNet are:

Prediction distribution: [5000. 0. 0. 0. 0. 0. 0. 0. 0. 0.] Prediction targets to: 0

For a benign model, the results are:

Prediction distribution: [631. 279. 542. 438. 691. 735. 494. 544. 459. 187.] Prediction targets to: 5
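The two printed results above can be read back with a short script. This is a minimal sketch, assuming the exact "Prediction distribution: [...] Prediction targets to: N" format shown in this thread; `parse_ban_log` is a hypothetical helper, not part of ban_detection.py. In both examples the reported target is the class with the largest count, and the sketch prints that index for comparison; whether BAN always reports the argmax is an assumption here, not something stated in the thread.

```python
import re

# Assumed log format, copied from the two examples in this thread:
#   "Prediction distribution: [631. 279. ...] Prediction targets to: 5"
LOG_PATTERN = re.compile(
    r"Prediction distribution:\s*\[([^\]]*)\]\s*Prediction targets to:\s*(\d+)"
)


def parse_ban_log(line: str) -> tuple[list[int], int]:
    """Hypothetical helper: extract per-class counts and the reported target class."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    # Counts are printed as floats such as "5000.", so parse them via float first.
    counts = [int(float(tok)) for tok in match.group(1).split()]
    target = int(match.group(2))
    return counts, target


wanet = "Prediction distribution: [5000. 0. 0. 0. 0. 0. 0. 0. 0. 0.] Prediction targets to: 0"
benign = ("Prediction distribution: [631. 279. 542. 438. 691. 735. 494. 544. 459. 187.] "
          "Prediction targets to: 5")

for name, line in [("WaNet", wanet), ("benign", benign)]:
    counts, target = parse_ban_log(line)
    # Index of the most frequent predicted class (first one on ties).
    argmax = max(range(len(counts)), key=counts.__getitem__)
    print(name, "total:", sum(counts), "target:", target, "argmax:", argmax)
```

Both examples contain 5000 samples in total, which is useful context for the 2500 cut-off discussed later in the thread.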

wsynuiag commented 6 days ago

Thanks for the quick reply. In the example above, BAN still reports a target class (5) for the benign model. Could a benign model therefore be misclassified as backdoored?

xiaoyunxxy commented 6 days ago

It is possible. However, for benign models the predictions are distributed relatively evenly across the classes, whereas for backdoored models they are concentrated in the target class.
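One way to put a number on "relatively even" versus "concentrated" is to compute the fraction of samples in the most frequent class, or the Shannon entropy of the distribution divided by log(K) for K classes. These are illustrative measures chosen for this sketch; the thread does not say that BAN uses either of them.

```python
import math

# Illustrative concentration measures; these are NOT taken from the BAN code base.


def top_class_fraction(counts: list[float]) -> float:
    """Share of samples assigned to the most frequent class (1.0 = fully concentrated)."""
    return max(counts) / sum(counts)


def normalized_entropy(counts: list[float]) -> float:
    """Shannon entropy divided by log(K), assuming K >= 2 classes.

    1.0 means perfectly uniform predictions; 0.0 means every sample is in one class.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # empty classes contribute 0
    entropy = sum(p * math.log(1.0 / p) for p in probs)
    return entropy / math.log(len(counts))


wanet = [5000, 0, 0, 0, 0, 0, 0, 0, 0, 0]
benign = [631, 279, 542, 438, 691, 735, 494, 544, 459, 187]

for name, counts in [("WaNet", wanet), ("benign", benign)]:
    print(f"{name}: top-class fraction = {top_class_fraction(counts):.3f}, "
          f"normalized entropy = {normalized_entropy(counts):.3f}")
```

For the WaNet example the top-class fraction is 1.0 and the normalized entropy is 0; for the benign example the top-class fraction is 0.147 (735 of 5000) and the normalized entropy is close to 1.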

wsynuiag commented 6 days ago

Thank you. So do I need to judge this manually, or is there an automatic way (e.g., a threshold or metric) to identify a backdoored model?

xiaoyunxxy commented 6 days ago

A simple script can check the detection results. In our case, a model is flagged as backdoored when more than 2500 samples are predicted as the target class. In practice, however, we also manually checked all detection results, including those of the other baselines.
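The threshold rule from the reply above fits in a few lines of Python. This is a minimal sketch, assuming the per-class counts and the reported target class are already available (for example, from the printed BAN output); `is_backdoored` and its parameters are hypothetical names, and only the 2500 cut-off comes from the thread. Note that 2500 is half of the 5000 samples in the examples above; if a different number of samples is evaluated, the cut-off would presumably need adjusting.

```python
# Minimal sketch of the ">2500 samples in the target class" rule from the thread.
# `is_backdoored` and its parameters are hypothetical; only the 2500 cut-off is from the reply.
def is_backdoored(counts: list[float], target: int, threshold: float = 2500) -> bool:
    """Return True if strictly more than `threshold` samples are predicted as `target`."""
    return counts[target] > threshold


# (per-class counts, reported target class) for the two examples in this thread
results = {
    "wanet": ([5000, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0),
    "benign": ([631, 279, 542, 438, 691, 735, 494, 544, 459, 187], 5),
}

for name, (counts, target) in results.items():
    verdict = "backdoored" if is_backdoored(counts, target) else "benign"
    print(f"{name}: {counts[target]} samples -> class {target}: {verdict}")
# wanet: 5000 samples -> class 0: backdoored
# benign: 735 samples -> class 5: benign
```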

wsynuiag commented 6 days ago

I understand. Thanks for your time.