Open crea397 opened 4 years ago
Do you know where the segmentation fault occurs?
I think I'm getting Segmentation fault (core dumped) when I run pred = model(data).
Yes, but do you know which operation inside model(data) produces this error?
@rusty1s I think the following line is causing the Segmentation fault, in pointnet2_classification.py, line 24, in SAModule:
x = self.conv(x, (pos, pos[idx]), edge_index)
When I comment this line out, there is no Segmentation fault; when I uncomment it, the Segmentation fault comes back.
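Narrowing a crash down by commenting lines out works, but Python's built-in faulthandler module can do it more directly: it prints the Python-level traceback at the moment a fatal signal arrives, which pinpoints the call that entered the crashing native code. A minimal sketch:

```python
import faulthandler

# Dump the Python-level traceback to stderr if the process receives
# SIGSEGV, SIGFPE, SIGABRT, or SIGBUS -- e.g. a crash inside a native
# extension such as a CUDA kernel launched by self.conv(...).
faulthandler.enable()

print(faulthandler.is_enabled())  # → True
```

Alternatively, running the script with `python -X faulthandler script.py` enables the same handler without changing the code.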
Does that also happen when running on the CPU? What's the shape of pos and pos[idx], and the output of edge_index[0].max() and edge_index[1].max()?
I used device = torch.device('cpu') instead of device = torch.device('cuda' if torch.cuda.is_available() else 'cpu'). With that change, the Segmentation fault did not occur.
I checked the output for the same point cloud data on the Nano and the Xavier NX.
| Device | Nano | Xavier NX |
|---|---|---|
| pos | torch.Size([100, 3]) | torch.Size([100, 3]) |
| pos[idx] | torch.Size([50, 3]) | torch.Size([50, 3]) |
| edge_index[0].max() | tensor(99, device='cuda:0') | tensor(99, device='cuda:0') |
| edge_index[1].max() | tensor(49, device='cuda:0') | tensor(49, device='cuda:0') |
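For what it's worth, a common cause of GPU segfaults in scatter/gather kernels is an out-of-bounds index, since CUDA kernels typically don't bounds-check. The numbers above can be sanity-checked in plain Python (assuming edge_index[0] addresses pos and edge_index[1] addresses pos[idx], as in the bipartite call self.conv(x, (pos, pos[idx]), edge_index)):

```python
# Sizes and index maxima taken from the table above.
num_src = 100        # pos has shape [100, 3]
num_dst = 50         # pos[idx] has shape [50, 3]
max_src_index = 99   # edge_index[0].max()
max_dst_index = 49   # edge_index[1].max()

# Every index must be strictly smaller than the number of nodes it
# addresses; an index >= size reads out of bounds inside the kernel.
src_ok = max_src_index < num_src
dst_ok = max_dst_index < num_dst
print(src_ok, dst_ok)  # → True True
```

So the indices here are in range, which points away from a bad edge_index and toward the kernel or build itself.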
Can you do me a favor and test whether scatter_max works on the GPU on the Xavier NX?
I ran the following code, based on the torch-scatter example.
Code
import torch
from torch_scatter import scatter_max
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
src = torch.tensor([[2, 0, 1, 4, 3], [0, 2, 1, 3, 4]], device=device)
index = torch.tensor([[4, 5, 4, 2, 3], [0, 0, 2, 2, 1]], device=device)
out, argmax = scatter_max(src, index, dim=-1)
print(out)
print(argmax)
Result
tensor([[0, 0, 4, 3, 2, 0],
[2, 4, 3, 0, 0, 0]], device='cuda:0')
tensor([[5, 5, 3, 4, 0, 1],
[1, 4, 3, 5, 5, 5]], device='cuda:0')
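As a cross-check, the same scatter_max semantics can be reproduced in plain Python: per row, group the src values by their index and keep the maximum and its position, filling slots that receive no value with 0 and an out-of-range argmax, matching the defaults visible in the printed result. This is only an illustrative re-implementation, not the library's actual code:

```python
def scatter_max_ref(src, index):
    """Plain-Python reference for scatter_max over the last dimension.

    Empty output slots get value 0 and an argmax equal to the row
    length (an out-of-range marker), mirroring the result above.
    """
    dim_size = max(max(row) for row in index) + 1
    out, argmax = [], []
    for s_row, i_row in zip(src, index):
        o = [0] * dim_size
        a = [len(s_row)] * dim_size       # out-of-range argmax marker
        filled = [False] * dim_size
        for j, (val, idx) in enumerate(zip(s_row, i_row)):
            if not filled[idx] or val > o[idx]:
                o[idx], a[idx], filled[idx] = val, j, True
        out.append(o)
        argmax.append(a)
    return out, argmax

src = [[2, 0, 1, 4, 3], [0, 2, 1, 3, 4]]
index = [[4, 5, 4, 2, 3], [0, 0, 2, 2, 1]]
out, argmax = scatter_max_ref(src, index)
print(out)     # → [[0, 0, 4, 3, 2, 0], [2, 4, 3, 0, 0, 0]]
print(argmax)  # → [[5, 5, 3, 4, 0, 1], [1, 4, 3, 5, 5, 5]]
```

The GPU output above matches this reference, so scatter_max itself appears to run correctly on the Xavier NX for this input.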
I got the same error when I tried to run inference on a Jetson AGX Xavier. The line that triggers the Segmentation fault (core dumped) is:
self.cfx = cuda.Device(0).make_context()
I'm running inference with NVIDIA TensorRT 7.1. The strange part is that when I use the optimization engine in a single standalone script it works, but when I use gRPC to build a microservice, the Segmentation fault (core dumped) occurs.
Can you do me a favor and test by re-installing torch-scatter and torch-sparse with the latest released wheels (uploaded yesterday)? There were some changes that allow supporting a larger variety of compute capabilities.
@Dave0995 Was any progress ever made on this?
This exact issue still shows up for me.
self.cfx = cuda.Device(0).make_context()
works in a standalone script, but it throws a segfault when another process is introduced.
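One known failure mode that matches this description: a CUDA context does not survive fork(), so a context created (or implicitly initialized) in the parent breaks when a forked worker touches it, and multi-process servers on Linux fork by default. A hedged sketch of the usual workaround, selecting the 'spawn' start method so every worker starts as a fresh interpreter and initializes CUDA on its own (the helper name is illustrative):

```python
import multiprocessing as mp

def make_worker_context():
    """Return a multiprocessing context whose workers start via 'spawn'
    instead of the POSIX default 'fork'.

    A CUDA context created in the parent (as cuda.Device(0).make_context()
    does) is not usable in a forked child; 'spawn' gives each worker a
    clean interpreter that sets up CUDA from scratch.
    """
    return mp.get_context("spawn")

ctx = make_worker_context()
print(ctx.get_start_method())  # → spawn
```

Whether this applies here depends on how the gRPC service creates its workers; if the framework forks internally, the context creation may need to move into the child process instead.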
❓ Questions & Help
Hi,
I implemented my program with reference to examples/pointnet2_classification.py and used Google Colaboratory's GPU to train the model. I saved the model trained in Colab, and when I try to load that model on a Jetson Xavier NX to run inference, I get Segmentation fault (core dumped).
I ran the same code on the Jetson Nano, and in that case the Segmentation fault (core dumped) did not occur.
I think the Segmentation fault (core dumped) happens when I run test().
How do I solve this problem?
Thanks!
Environment