Open HanChen-HUST opened 4 years ago
You need to use torch_geometric.nn.DataParallel. You can find an example here.
Thank you. I changed it for QM9, but now another error occurs: AttributeError: 'tuple' object has no attribute 'num_nodes'
Can you show me a minimal example to reproduce this?
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
import os.path as osp
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.nn import Sequential, Linear, ReLU, GRU
import torch_geometric.transforms as T
from torch_geometric.data import DataLoader
from torch_geometric.nn import DataParallel, NNConv, Set2Set
from torch_geometric.utils import remove_self_loops
from modelmof import mof

device = torch.device('cuda:0' if torch.cuda.is_available() else "cpu")
dim = 64
class MyTransform(object):
    def __call__(self, data):
        data.y = data.y[:, target]
        return data

class Complete(object):
    def __call__(self, data):
        device = data.edge_index.device
        row = torch.arange(data.num_nodes, dtype=torch.long, device=device)
        col = torch.arange(data.num_nodes, dtype=torch.long, device=device)
        row = row.view(-1, 1).repeat(1, data.num_nodes).view(-1)
        col = col.repeat(data.num_nodes)
        edge_index = torch.stack([row, col], dim=0)
        edge_attr = None
        if data.edge_attr is not None:
            idx = data.edge_index[0] * data.num_nodes + data.edge_index[1]
            size = list(data.edge_attr.size())
            size[0] = data.num_nodes * data.num_nodes
            # dense [N*N, ...] buffer, as in the PyG QM9 example (line absent from the post)
            edge_attr = data.edge_attr.new_zeros(size)
            edge_attr[idx] = data.edge_attr
        edge_index, edge_attr = remove_self_loops(edge_index, edge_attr)
        data.edge_attr = edge_attr
        data.edge_index = edge_index
        return data

path = osp.join(osp.dirname(osp.realpath(__file__)), '..', 'mydataset.pt')
transform = T.Compose([Complete(), T.Distance(norm=False)])
dataset = mof(path, transform=transform).shuffle()
train_dataset = dataset[2000:]
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lin0 = torch.nn.Linear(100, dim)
        nn = Sequential(Linear(1, 128), ReLU(), Linear(128, dim * dim))
        self.conv = NNConv(dim, dim, nn, aggr='mean')
        self.gru = GRU(dim, dim)
        self.set2set = Set2Set(dim, processing_steps=3)
        self.lin1 = torch.nn.Linear(2 * dim, dim)
        self.lin2 = torch.nn.Linear(dim, 1)

    def forward(self, data):
        out = F.relu(self.lin0(data.x))
        h = out.unsqueeze(0)
        for i in range(3):
            m = F.relu(self.conv(out, data.edge_index, data.edge_attr))
            out, h = self.gru(m.unsqueeze(0), h)
            out = out.squeeze(0)
        out = self.set2set(out, data.batch)
        out = F.relu(self.lin1(out))
        out = self.lin2(out)
        return out.view(-1)
model = Net()
print('Let\'s use', torch.cuda.device_count(), 'GPUs!')
model = DataParallel(model)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for data_list in train_loader:
    optimizer.zero_grad()
    output = model(data_list)
    y = torch.cat([data.y for data in data_list]).to(output.device)
    loss = F.nll_loss(output, y)
    loss.backward()
    optimizer.step()
Note that you need to use DataListLoader for loading your data when using DataParallel.
Could you give me an example? I don't understand what you mean. Sorry to bother you, and thanks!
Thanks, it works. Another question: can I add the bond distance to edge_attr in PyG? I see that it is in one-hot format, so how could I add it?
Can you clarify? torch_geometric.transforms.Distance should automatically take care of that.
I thought pytorch_geometric.nn.edge_attr was the place to add the distance between atoms, so I added it there, but QM9 uses a one-hot vector to represent it. If I want to use the true distance in PyG, how can I add it? QM9_NN_CONV.py doesn't use the pos information, and I also want to use that. How can I fix it? It also doesn't perform well on graphs with a large number of atoms. What's the possible reason? Thanks.
The task is graph regression. Which model should I use?
The QM9 example does use pos information. It does so by calculating the distances between source and target nodes and adding them to edge_attr, e.g., like this (implemented in T.Distance):
row, col = edge_index
dist = (pos[row] - pos[col]).norm(dim=-1)
edge_attr = torch.cat([edge_attr, dist.unsqueeze(-1)], dim=-1)
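To make the three lines above concrete, here is a self-contained sketch using plain PyTorch. The positions, the edge list, and the four bond classes are made-up toy values, not taken from QM9.

```python
import torch

# Toy molecule (made-up values): two atoms 5 units apart, bonded in both directions.
pos = torch.tensor([[0.0, 0.0, 0.0],
                    [3.0, 4.0, 0.0]])
edge_index = torch.tensor([[0, 1],
                           [1, 0]])
# One-hot bond type per edge (4 hypothetical bond classes).
edge_attr = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                          [1.0, 0.0, 0.0, 0.0]])

row, col = edge_index
dist = (pos[row] - pos[col]).norm(dim=-1)  # one distance per edge
edge_attr = torch.cat([edge_attr, dist.unsqueeze(-1)], dim=-1)

print(edge_attr.shape)    # torch.Size([2, 5])
print(edge_attr[:, -1])   # tensor([5., 5.])
```

The continuous distance simply becomes one extra column next to the one-hot columns. edge_attr has one row per edge, so atoms with different numbers of bonds just contribute different numbers of rows. The input size of any edge network that consumes edge_attr has to match the new width (5 in this toy case).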
Regarding regression and classification, their major difference is the loss formulation they use. The model isn't affected much by that, so you can use basically any classification model for regression as well.
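As a rough illustration of that point, the sketch below feeds the same stand-in graph embeddings into a regression head and a classification head. Only the output size and the loss differ. The embedding size, the number of classes, and the random targets are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
features = torch.randn(8, 16)  # stand-in for pooled per-graph embeddings

# Regression: one output per graph, float targets, mean squared error.
reg_head = torch.nn.Linear(16, 1)
reg_pred = reg_head(features).view(-1)
reg_target = torch.randn(8)
reg_loss = F.mse_loss(reg_pred, reg_target)

# Classification: one logit per class, integer labels, cross-entropy.
num_classes = 3
cls_head = torch.nn.Linear(16, num_classes)
logits = cls_head(features)
labels = torch.randint(0, num_classes, (8,))
cls_loss = F.cross_entropy(logits, labels)

print(reg_loss.item(), cls_loss.item())
```

Note that F.nll_loss expects log-probabilities and integer class labels, so for a regression output such as out.view(-1) in the script posted earlier, a loss like F.mse_loss or F.l1_loss is the usual fit.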
Thanks, but one atom may have a different number of bonds than another, so how can I add it to edge_attr?
Must edge_attr be in one-hot format? If so, how can I define the distance as one-hot?
It doesn't have to be one-hot, but it's more convenient to use if you want to add continuous edge features to the graph.
When I used the DataParallel layer in an MPNN with 2 GPUs, the Volatile GPU-Util became very low, at about 44%. What is the problem? In this situation, should I just use 1 GPU without DataParallel to train the model more efficiently?
I think that heavily depends on the batch_size of your training process. For molecules, using small batch sizes is generally a good idea for better training, but it comes at the cost of low GPU utilization. The GPU should easily be able to fit batch sizes of 512 or 1024.
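A back-of-the-envelope sketch of why the batch size matters here: DataParallel divides every mini-batch among the GPUs, so each device only sees a fraction of it. The helper below is hypothetical and assumes an even split by graph count. The actual split inside the library may differ, for example if it balances by number of nodes.

```python
def graphs_per_gpu(batch_size, num_gpus):
    """Split batch_size graphs as evenly as possible across num_gpus."""
    base, extra = divmod(batch_size, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]


for batch_size in (128, 512, 1024):
    print(batch_size, graphs_per_gpu(batch_size, 2))
# 128 [64, 64]
# 512 [256, 256]
# 1024 [512, 512]
```

With small molecular graphs, a few dozen graphs per device may not be enough work to keep a GPU busy, which is consistent with the low utilization reported above.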
See https://github.com/rusty1s/pytorch_geometric/blob/master/examples/data_parallel.py#L6
Sir, could you please give me the new link? The old one is broken. I also need to see how to use it this way.
All multi GPU examples have been moved to https://github.com/rusty1s/pytorch_geometric/tree/master/examples/multi_gpu (including newly introduced distributed training examples).
ok thank you very much!
Hello, I'm using multiple GPUs with the Davis dataset for DTI prediction. My task is a regression task. Although my code uses DataParallel() and DataListLoader(), I'm getting the following error: AttributeError: 'tuple' object has no attribute 'num_nodes'
PS: the code works fine on a single GPU.
Can you show me a minimal example? My guess is that your dataset returns a tuple rather than a single data object.
I've attached part of my script, since the whole script is too big. Thanks for your comment.
Hello, I have two GPUs and want to train on both of them with torch.nn.DataParallel. I changed the file here:
os.environ['CUDA_VISIBLE_DEVICES'] = '0, 1'
device = torch.device('cuda')
model = Net()
model = torch.nn.DataParallel(model)
model.to(device)
But this error occurs: RuntimeError: arguments are located on different GPUs at /opt/conda/conda-bld/pytorch_1591914880026/work/aten/src/THC/generic/THCTensorMathBlas.cu:270
How can I fix it? Thanks for your reply!