It's not officially supported. Our custom CUDA kernels currently do not run with half-precision, but we have plans to support that, see here. Your error is not related to half-precision though. Does AMP index with torch.int32 tensors?
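For context: tensor indexing only accepts long, byte or bool index tensors, so if AMP casts an index tensor to a floating-point type you get exactly that `IndexError`. A tiny illustration (just a sketch, independent of PyG):

```python
import torch

x = torch.randn(4, 8)
idx = torch.tensor([0, 2])   # int64 (long) indices are fine
print(x[idx].shape)          # torch.Size([2, 8])

# If the index tensor has been cast to a floating-point dtype
# (which AMP may do to tensors it does not recognise as indices):
x[idx.half()]  # IndexError: tensors used as indices must be long, byte or bool tensors
```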
Thanks for your reply! So it seems AMP was converting the indices into half-precision floats as well, as you suggested. I was able to get around it by explicitly casting the indices to longs. The following code at least runs with all optimisation levels:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from apex import amp
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root='/tmp/Cora', name='CiteSeer')

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 32)
        self.conv2 = GCNConv(32, dataset.num_classes)

    def forward(self, data):
        data_x, edge_index = data.x, data.edge_index.long()
        x = F.relu(self.conv1(data_x, edge_index))
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)

# Initialization
opt_level = 'O2'
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)

model.train()
for epoch in range(500):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask.long()],
                      data.y[data.train_mask.long()].long())
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```
However, should I trust that it is being trained correctly if half precision is not explicitly supported? Thanks for the link to the pull request - if you need any help testing it, please do let me know!
@sbonner0
I'm dealing with this exact situation currently. I'm not sure, but you may also run into a situation where the index values themselves change when the `edge_index` tensor is passed into the model, because AMP converts the whole `data` list into float16. Have you noticed this? You can solve it by passing each tensor into the model separately. Let me know if you hit this issue. Also, I'm very keen to know whether you are seeing any memory/speed improvements. I'm seeing zero boost from AMP and am struggling to find the source of this. Possibly it's the custom PyTorch Geometric CUDA kernels.
Hi @murnanedaniel, thanks for your response! Are you saying that the index values themselves are being changed by the conversion to fp16? I shall investigate whether this is happening for me as well and let you know.
I am also not seeing any real noticeable improvement in speed or indeed memory usage when using AMP with Geometric. As you said, it could well be due to the custom Geometric kernels not being half-precision compatible at the moment - it seems that will change soon with the upcoming PR.
I can't be sure until you test your situation, but for me everything in the `data` list was being converted to `half` type, even `int` type tensors. This led to indices being rounded up and down (according to these rules), i.e. the graph getting totally messed up. The solution was (given a data object with a `long` type `edge_index` and a `float` type `x`) to pass in each tensor individually:

```python
out = model(data.x, data.edge_index.int())
```

Then convert `edge_index` back to `long` within the `forward()`. Then AMP understood which tensors to convert and which to leave alone (since it doesn't touch `int` tensors by design). If you find a better way to handle this, let me know!
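Putting that together, the pattern I ended up with looks roughly like the sketch below (a toy two-layer GCN on Planetoid standing in for my actual setup, so treat the sizes as placeholders):

```python
import torch
import torch.nn.functional as F
from apex import amp
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root='/tmp/Cora', name='Cora')

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 32)
        self.conv2 = GCNConv(32, dataset.num_classes)

    def forward(self, x, edge_index):
        # edge_index arrives as int32, which AMP leaves alone by design;
        # cast it back to long here because indexing needs int64.
        # (float16 only represents integers exactly up to 2048, so letting
        # AMP cast node indices to half is what scrambles the graph.)
        edge_index = edge_index.long()
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
model, optimizer = amp.initialize(model, optimizer, opt_level='O2')

# Pass the tensors individually rather than the whole data object,
# with edge_index cast to int so AMP does not convert it to float16.
out = model(data.x, data.edge_index.int())
```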
And just a follow-up to memory usage - I'm now seeing a 50% drop in peak GPU memory usage with the `O2` level of AMP, which is significant. But no speed improvements.
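In case it helps with comparison, this is roughly how I'm measuring the peak usage (just a sketch; the steps in the middle are whatever training loop you already run):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run a handful of training steps here ...

peak_mb = torch.cuda.max_memory_allocated() / (1024 ** 2)
print(f'peak GPU memory allocated: {peak_mb:.1f} MB')
```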
Hi @murnanedaniel, great news on the memory usage - did you have to do anything extra in the code to get that to work? How is the accuracy at the `O2` level?
Once the data types were handled correctly, I didn't have to do anything further. I hadn't seen the boost earlier because the model was quite small compared to the data, but increasing the number of layers and hidden features made the memory benefits clearer. The accuracy was basically unaffected with `O2`, with occasional gradient overflows. But turning off the master weights, or going to `O3`, led to severe overflow. This may be because the GNN was quite deep? I'm still looking into that issue. It may be unsolvable.
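For reference, these are the knobs I was comparing (only a sketch with a placeholder model; the exact defaults behind each opt level are in the Apex docs):

```python
import torch
from apex import amp

# Placeholder model/optimizer just to keep the snippet self-contained.
model = torch.nn.Linear(16, 2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# O2: FP16 model weights with FP32 master weights held by the optimizer.
# Accuracy was basically unaffected for me, apart from the occasional
# gradient-overflow message from the loss scaler.
model, optimizer = amp.initialize(model, optimizer, opt_level='O2')

# The variants that overflowed badly for me:
# model, optimizer = amp.initialize(model, optimizer, opt_level='O2',
#                                   master_weights=False)
# model, optimizer = amp.initialize(model, optimizer, opt_level='O3')
```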
Hey,
I've been trying to use PyTorch Geometric with NVIDIA AMP and having some trouble - is it officially supported?
For example, the following code does not run successfully with any opt level other than O1; O0, O2 and O3 all produce the following error:

```
IndexError: tensors used as indices must be long, byte or bool tensors
```
Is there any way to fix this?