Nice work! Thank you very much for your contribution to the AI safety community!
I noticed a weird phenomenon when training the topology generator. The code for training the topology generator is:
toponet.train()
for _ in tqdm(range(args.gtn_epochs), desc="training topology generator"):
    optimizer_topo.zero_grad()
    # generate new adj_list by dr.data['adj_list']
    for gid in pset:
        SendtoCUDA(gid, [init_As, Ainputs, topomasks])  # only send the used graph items to cuda
        rst_bkdA = toponet(
            Ainputs[gid], topomasks[gid], topo_thrd, cuda, args.topo_activation, 'topo')
        # rst_bkdA = recover_mask(nodenums[gid], topomasks[gid], 'topo')
        # bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA, init_As[gid])
        bkd_dr.data['adj_list'][gid] = torch.add(
            rst_bkdA[:nodenums[gid], :nodenums[gid]], init_As[gid])  # only current position in cuda
        SendtoCPU(gid, [init_As, Ainputs, topomasks])
    loss = forwarding(args, bkd_dr, model, allset, criterion)
    loss.backward()
    optimizer_topo.step()
    torch.cuda.empty_cache()
toponet.eval()
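For context, one way to see whether any gradient reaches the generator's parameters at all would be to print their gradient norms right after loss.backward(); this is only a minimal diagnostic sketch, assuming toponet is a standard torch.nn.Module:

# Diagnostic sketch (assumption: toponet is a plain torch.nn.Module).
# Placed right after loss.backward(), it shows whether gradients reach the generator.
for name, p in toponet.named_parameters():
    if p.grad is None:
        print(f"{name}: grad is None (parameter not in the computation graph)")
    else:
        print(f"{name}: grad norm = {p.grad.norm().item():.6e}")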
When I check the parameters of the topology generator before and after training with the following snippet,
import copy

old_toponet = copy.deepcopy(toponet)

toponet.train()
for _ in tqdm(range(args.gtn_epochs), desc="training topology generator"):
    optimizer_topo.zero_grad()
    # generate new adj_list by dr.data['adj_list']
    for gid in pset:
        SendtoCUDA(gid, [init_As, Ainputs, topomasks])  # only send the used graph items to cuda
        rst_bkdA = toponet(
            Ainputs[gid], topomasks[gid], topo_thrd, cuda, args.topo_activation, 'topo')
        # rst_bkdA = recover_mask(nodenums[gid], topomasks[gid], 'topo')
        # bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA, init_As[gid])
        bkd_dr.data['adj_list'][gid] = torch.add(
            rst_bkdA[:nodenums[gid], :nodenums[gid]], init_As[gid])  # only current position in cuda
        SendtoCPU(gid, [init_As, Ainputs, topomasks])
    loss = forwarding(args, bkd_dr, model, allset, criterion)
    loss.backward()
    optimizer_topo.step()
    torch.cuda.empty_cache()
toponet.eval()

new_toponet = copy.deepcopy(toponet)

old_state_dict = old_toponet.state_dict()
new_state_dict = new_toponet.state_dict()
for name in old_state_dict:
    param_diff = new_state_dict[name] - old_state_dict[name]
    print(torch.mean(param_diff))
I found that there is no difference in the parameters after training: every printed mean is 0.
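A mean close to zero could in principle also come from updates that cancel out, so a stricter element-wise comparison may be more informative; a minimal sketch, assuming both copies are kept on the same device:

# Stricter comparison sketch (assumption: old_toponet and new_toponet are on the same device).
for name in old_state_dict:
    diff = (new_state_dict[name] - old_state_dict[name]).abs()
    identical = torch.equal(new_state_dict[name], old_state_dict[name])
    print(f"{name}: max abs diff = {diff.max().item():.6e}, identical = {identical}")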
Could you give me some suggestions about this problem? Thank you very much for any replies! :)