WallofWonder opened this issue 1 year ago
I'm really sorry, but I currently don't know the reason behind this. If you want to train the network on multiple GPUs, you can add `model = torch.nn.DataParallel(model)` followed by `model.cuda()` in main.py.
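Spelled out, that suggestion would look roughly like the sketch below (the `nn.Linear` stand-in and the device-count guard are assumptions for illustration, not the repo's actual code in main.py):

```python
import torch
import torch.nn as nn

# Stand-in model; in the repo this would be the network main.py builds.
model = nn.Linear(10, 2)

if torch.cuda.device_count() > 1:
    # DataParallel replicates the module onto every visible GPU and
    # scatters each input batch along dim 0 across the replicas.
    model = nn.DataParallel(model)

model.cuda()
```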
Anyway, thank you for your prompt reply and great work.😀 I'm trying to figure it out.
I added some code in `train_step()` to monitor the gradients of the parameters, and found that the parameters of `gcn1.conv`, `gcn2.conv`, `mlp1`, and `mlp2` all have a gradient of `None`.
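What I added was essentially a grad check like this sketch (not my exact code; the toy `nn.Linear` and dummy loss are stand-ins so it runs on its own, while in practice `model` and `loss` come from the training loop):

```python
import torch
import torch.nn as nn

# Toy stand-ins so the check is runnable on its own.
model = nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).sum()

loss.backward()
for name, param in model.named_parameters():
    if param.grad is None:
        # A grad of None after backward() means the parameter never
        # entered the graph that produced the loss.
        print(f"{name}: grad is None")
    else:
        print(f"{name}: grad norm = {param.grad.norm().item():.6f}")
```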
Here is part of the console output: [screenshot omitted]

I added the same check in CLNet, and it doesn't have this issue. I don't know whether this affects performance. Moreover, when I tried to train the model with multiple GPUs, this issue became an obstacle for me.