Open Chen-Cai-OSU opened 2 years ago
`device` refers to the device your model is on, while `emb_device` refers to the device where the historical embeddings are stored. In general, `device=cuda` and `emb_device=cpu`. Note that `device` will be set automatically in case you call `model.to(device)`.
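To make the split concrete, here is a minimal sketch of the idea. `TinyHistory` is a hypothetical stand-in, not the real `History` class from `torch_geometric_autoscale` (which has more machinery): embeddings live on `emb_device` (usually CPU, optionally pinned for faster host-to-device copies), while `pull`/`push` move slices to and from the compute `device`.

```python
import torch

class TinyHistory(torch.nn.Module):
    """Hypothetical sketch of a History-style buffer: storage stays on
    emb_device, while pull/push move slices to/from the compute device."""
    def __init__(self, num_embeddings, embedding_dim,
                 emb_device='cpu', device='cpu'):
        super().__init__()
        self.emb_device = torch.device(emb_device)
        self.device = torch.device(device)
        # Pinned memory speeds up async host-to-device copies, but only
        # makes sense for CPU storage when CUDA is actually available.
        pin = self.emb_device.type == 'cpu' and torch.cuda.is_available()
        self.emb = torch.zeros(num_embeddings, embedding_dim,
                               device=self.emb_device, pin_memory=pin)

    def push(self, x, idx):
        # Write fresh embeddings back into the (CPU) storage.
        self.emb[idx] = x.detach().to(self.emb_device)

    def pull(self, idx):
        # Fetch stored embeddings onto the compute device.
        return self.emb[idx].to(self.device)

hist = TinyHistory(10, 4)  # both devices default to CPU in this sketch
hist.push(torch.ones(3, 4), torch.tensor([0, 1, 2]))
out = hist.pull(torch.tensor([0, 5]))  # row 0 is ones, row 5 still zeros
print(out.sum())
```

With `device='cuda:3'` and `emb_device='cpu'`, only the small pulled slices ever occupy GPU memory, which is the point of the design.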
Thank you for the explanation. What I don't understand is that when I run the following code,
```python
model = GCN(10, 10, 10, 10, 2, device='cpu').to('cuda:3')
print(model)
```
I got
```
GCN(
  (histories): ModuleList(
    (0): History(10, 10, emb_device=cpu, device=cuda:3)
  )
  (lins): ModuleList()
  (convs): ModuleList(
    (0): GCNConv(10, 10)
    (1): GCNConv(10, 10)
  )
  (bns): ModuleList(
    (0): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
```
I observe a process on both cuda:0 and cuda:3 (I expect only cuda:3 to be used). Does that mean the emb_device is somehow not CPU? I also printed out `self.emb = torch.empty(num_embeddings, embedding_dim, device=device, pin_memory=pin_memory)` when the History class is initialized, and the device is indeed the CPU. I just don't know why cuda:0 is used.
I am using torch 1.10.0 + cuda 11.3 + pyg 2.0.4 + python 3.7.13. Let me know if you need more info. Thank you!
Yes, this looks correct to me. Histories will be on CPU while model parameters are on cuda:3. If there is a process running on cuda:0, that is definitely a bug I can try to look into. Any pointers are highly appreciated.
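One plausible lead (an assumption about the cause, not verified against this repo): allocating pinned memory (`pin_memory=True` in History) forces CUDA context initialization on the *current* CUDA device, which defaults to cuda:0 — that alone is enough to show a phantom process there. Restricting GPU visibility before any CUDA work sidesteps it:

```python
import os

# Must be set before torch initializes CUDA in this process.
# With visibility restricted, the intended GPU is the only one,
# so pinned-memory allocation cannot touch the physical cuda:0.
os.environ['CUDA_VISIBLE_DEVICES'] = '3'

import torch  # noqa: E402  (imported after the env var is set)

# Inside this process the single visible GPU is re-indexed as cuda:0.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
```

Alternatively, calling `torch.cuda.set_device(3)` before constructing the model should steer the context to cuda:3, but the environment-variable route is the more robust of the two.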
I don't know what the possible reasons are. I also tried pyg 2.0.4 + torch 1.7.1 + cuda 11.0 and got the same behavior. To reproduce it, just add the following lines to models/gcn.py:

```python
if __name__ == '__main__':
    model = GCN(10, 10, 10, 10, 5, device='cpu')
    print(model)
```

and run `python -m torch_geometric_autoscale.models.gcn`.
Hello Matthias, thank you very much for the code. Nice work as always. I was wondering: what is the difference between `emb_device` and `device` for the History class?
When I initialize a GCN with `device='cpu'` and then move it to `cuda:3` (as in the snippet and printed output above), I notice there is a process on cuda:0 (I have multiple GPUs), which I don't understand. Is this desirable behavior? Also, in general, should I always set the device in the GCN class to None? I noticed this is what you did in large_benchmark/main.py.