zzwjames / DPGBA

Questions Regarding DPGBA's Performance with Different Target Classes and Defenses #2

Open Joney-Yf opened 2 months ago

Joney-Yf commented 2 months ago

I've been working with DPGBA and have encountered some issues that I'd like to clarify:

ASR Drops to Zero with Different Target Class:

When I change the target class (e.g., on the Flickr dataset), the Attack Success Rate (ASR) drops to 0%. Is this expected behavior?

Effectiveness Against Prune Defense:

After switching the defense strategy to pruning, I noticed the ASR decreases significantly, from over 90% down to around 10%. Does this imply that DPGBA is ineffective against pruning defenses?

Trigger Effectiveness Without Dataset Poisoning:

Even without poisoning the dataset, the trigger remains effective after training. Does this suggest that DPGBA functions more as an adversarial attack than as a backdoor attack?

I would greatly appreciate any insights or explanations regarding these observations.

Thank you!

zzwjames commented 2 months ago

Hi,

Thanks for your questions.

Attack performance for different target classes: I believe nodes from certain classes are inherently harder to attack; achieving high attack performance across different classes is a topic worth studying.

Effectiveness against the pruning defense: Our trigger generator is flexible enough to incorporate the homophily loss proposed in UGBA, enabling the triggers to remain in-distribution while also bypassing the pruning defense.
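A minimal sketch of such a homophily term, in the spirit of UGBA (illustrative only; `trigger_feat` and `attach_feat` are hypothetical tensors holding the generated trigger features and the features of the nodes they attach to, and the threshold is a placeholder):

```python
import torch
import torch.nn.functional as F

def homophily_loss(trigger_feat: torch.Tensor, attach_feat: torch.Tensor, thrd: float = 0.5) -> torch.Tensor:
    """Push generated trigger features toward the features of the nodes they attach to,
    so that similarity-based pruning does not flag the injected edges.
    trigger_feat: [num_triggers, d]; attach_feat: [num_triggers, d]."""
    sim = F.cosine_similarity(trigger_feat, attach_feat, dim=-1)  # one score per injected edge
    # Penalize only the injected edges whose similarity falls below the pruning threshold.
    return F.relu(thrd - sim).mean()
```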

Trigger effectiveness without dataset poisoning: Since GNNs utilize neighbor information, when DPGBA generates in-distribution triggers similar to the original neighbors of the target nodes, it can still result in a successful attack, even if the GNN is trained on a clean graph.
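A toy illustration of this aggregation effect (my own sketch, not code from this repo; the "prototype" features are stand-ins for typical target-class features):

```python
import torch
import torch.nn.functional as F

d = 16
clean_neigh = torch.randn(3, d)                          # original neighborhood of a target node
target_proto = torch.randn(d)                            # stand-in for typical target-class features
trigger_nodes = target_proto + 0.1 * torch.randn(3, d)   # in-distribution triggers resembling the target class

# Mean aggregation (GCN/GraphSAGE-style) over the clean vs. augmented neighborhood:
agg_clean = clean_neigh.mean(dim=0)
agg_attacked = torch.cat([clean_neigh, trigger_nodes], dim=0).mean(dim=0)

# The aggregated representation is pulled toward the target-class features, so even
# weights learned purely on clean data can be steered at test time.
print(F.cosine_similarity(agg_clean, target_proto, dim=0),
      F.cosine_similarity(agg_attacked, target_proto, dim=0))
```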

Joney-Yf commented 2 months ago

Hello, thanks for your reply.

I observed that by adding homo_loss, the attack is able to bypass Prune defenses effectively, similar to the approach described in the UGBA paper.

However, during testing, instead of pruning before training, I applied an Out-Of-Distribution (OOD) defense using the reconstruct_prune function and removed the top 15% of anomalous edges from the generated poison samples. The results were as follows:

Total Overall ASR: 0.0174
Total Clean Accuracy: 0.8370

Does this indicate that DPGBA becomes ineffective when the OOD defense strength is increased?
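For reference, the pruning step I applied was roughly the following (a sketch of my setup; the anomaly scores stand in for whatever the reconstruction-based detector behind reconstruct_prune outputs, so this is not an exact copy of that function):

```python
import torch

def prune_top_k_edges(edge_index: torch.Tensor, anomaly_score: torch.Tensor, k: float = 0.15) -> torch.Tensor:
    """Drop the k fraction of edges with the highest anomaly scores.
    edge_index: [2, E]; anomaly_score: [E], one score per edge."""
    num_prune = int(k * edge_index.size(1))
    _, worst = torch.topk(anomaly_score, num_prune)   # indices of the most anomalous edges
    keep = torch.ones(edge_index.size(1), dtype=torch.bool)
    keep[worst] = False
    return edge_index[:, keep]
```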

Additionally, I noticed that the generated triggers carry target class information, which influences feature aggregation. This seems to blur the distinction between backdoor and adversarial attacks, as backdoors typically require injecting specific patterns through poisoning. However, in practice, attacks can succeed without injecting such patterns during training, which appears to contradict the fundamental premise of backdoor attacks.

Could you provide any insights on whether the reduced ASR under stronger OOD defenses suggests a limitation of DPGBA, and how the characteristics of trigger generation and feature aggregation in GNNs influence the differentiation between backdoor and adversarial attacks?

Thanks again

zzwjames commented 2 months ago

Thanks for your questions!

  1. We assume that an OOD defense in real applications will not involve removing 15% of the edges, as outliers typically constitute only a small portion of the total data. That's why, in our paper, we set the percentage to 3%.
  2. To be honest, I have not considered the differentiation between our backdoor attack and adversarial attacks. Our intuition is to generate in-distribution triggers. In the image domain, there are some similar backdoor attack methods. For example, if the attacker wants the model to predict an image as 'hat,' they may add a 'hat' to the image.

Joney-Yf commented 2 months ago

Thank you so much for your time and insights. I truly appreciate your work, and I’m just hoping to have a friendly discussion to clarify some points.

For the first question, what I'm curious about is this: when I isolate about 15% of the nodes (the outlier-handling operation in the source code essentially removes all edges connected to nodes that exceed the threshold, which amounts to isolating those nodes, correct?), I end up isolating almost all the trigger nodes, which leads to the attack result I mentioned earlier. I understand that this is not common in real-world scenarios, but it does arise in the setup from the source code you provided: in your Cora testing setup the attack nodes make up around 5%, and inserting 3 trigger nodes per attack node means the trigger nodes comprise about 15% of the total. When I apply the OOD defense again before testing ASR and CA, most of these trigger nodes are filtered out. This suggests that while the trigger nodes may not fall within the top 3% of outliers, most of them fall within the 3%-15% range. Do you think this conclusion is reasonable?
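As a quick sanity check on the ~15% figure (the node counts below are my rough assumptions, not values read from the code):

```python
# Back-of-the-envelope check on the trigger-node fraction (assumed numbers).
num_nodes = 2708            # approximate size of Cora
attack_frac = 0.05          # ~5% of nodes selected as attack nodes
triggers_per_node = 3       # trigger nodes injected per attack node

num_attack = int(attack_frac * num_nodes)        # ~135
num_trigger = triggers_per_node * num_attack     # ~405
print(num_trigger / num_nodes)                   # ~0.15, i.e. about 15% of the original node count
```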

As for the second point, I understand your perspective, and it's not a big deal; I just wanted to add my thoughts for further clarification. My understanding is that the fundamental difference between backdoor attacks and adversarial attacks lies in whether they modify the data or the model to inject a backdoor pattern. Adversarial attacks, on the other hand, explore the model's weights to find patterns that lead the model to predict the desired label. In the former, the trigger is under the attacker's control and any trigger can be used, while in the latter it is model-specific and not controlled by the attacker.

When I ran your code, I noticed that the attack was still successful even when the target model was trained entirely on clean data, which led me to think that the attack might align more closely with an adversarial attack, because there is no step that inserts the backdoor into the victim model, yet the attack still succeeds. Regarding the example you mentioned: in the image domain, triggers are typically features that do not overlap with the target class, such as a patch or Gaussian noise, which turn an unrelated image into the target label. Your example of adding a hat and predicting the image as a hat seems less related to backdoor attacks; even a normal model could likely do that.

If you're interested, I'd be happy to discuss this further, but again, I understand this may not be a major issue.