Closed hengdashi closed 3 years ago
Can you give a little bit more information? The pre-processing should be independent of whether one uses `--in-memory` or not. And, as far as I can see, `del inputs` does no harm since `outputs` is already computed at that point.
Yes, I was trying to reproduce the RGNN performance on the MAG240M dataset, but in the pre-processing stage I got a segmentation fault on the line `del inputs` while generating author features. Since `del inputs` comes directly after `outputs = adj_t.matmul(inputs, reduce='mean').numpy()`, my guess is that `outputs`, the product of `adj_t` and `inputs`, is lazily computed, in which case `del inputs` would affect the numbers in `outputs` (of course, I could be totally wrong).
I also tried to get a backtrace with gdb; here is the backtrace log:
FYI, my system config is as follows:
python 3.8.8
torch 1.8.1
torch-geometric main branch
numpy 1.19.2
`matmul` is performed directly; it isn't a lazy op. I think it is safe to delete `inputs` afterwards. The segmentation fault might be due to a different reason.
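To illustrate the eager-evaluation point with a plain-NumPy stand-in (the real code uses a `torch_sparse` `SparseTensor`; this is just a sketch of the principle, not the actual rgnn.py code):

```python
import numpy as np

# Dense stand-in for outputs = adj_t.matmul(inputs, reduce='mean').numpy():
# the matmul is computed immediately, so `outputs` owns its own buffer
# and deleting `inputs` afterwards cannot change it.
adj = np.array([[0.5, 0.5],
                [1.0, 0.0]])
inputs = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
outputs = adj @ inputs       # eagerly computed, new array

snapshot = outputs.copy()
del inputs                   # freeing the operand...
assert np.array_equal(outputs, snapshot)  # ...leaves outputs untouched
```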
Any idea what might be the cause? Or would you mind sharing your exact env config, so that I can check whether it's a bug in the newest version of torch or numpy?
I'm not sure why it does not work for you, TBH. You said that it works for you using the `--in_memory` option, but there is no difference regarding pre-processing between these versions. You can also just use the fully pre-processed node-feature matrix from here.
My config is:
pytorch-lightning==1.2.0rc1
pytorch==1.7.1
torch-geometric==1.7.0
numpy==1.20.1
Piggybacking on this: I have also gotten segfaults/core dumps, but in my case when actually using the `--in-memory` option with the `gnn.py`-based models SAGE and GAT, i.e. if `--in-memory`, then fault. I will follow up with a trace/dump if I can.
I was wondering whether it might be because of torch>1.7, which @hengdashi and I both have. I may try downgrading my venv; it's just finicky to get PyG, PyTorch, OGB, and CUDA to all agree, so I stopped at the first working combination, which was torch==1.8+cu111.
env:
pip list
torch 1.8.0+cu111
torch-geometric 1.7.0
torch-scatter 2.0.6
torch-sparse 0.6.9
...
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
...
Python 3.6.10 |Anaconda, Inc.| (default, May 8 2020, 02:54:21)
[GCC 7.3.0] on linux
In the for loop below `print("generating author features...")` in rgnn.py, `del inputs` before `del outputs` will yield a segmentation fault if the `in_memory` option is not turned on.
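For what it's worth, here is a stand-alone sketch of the compute-then-write-to-disk pattern that, as I understand it, the non-`in_memory` path follows (the file name, shapes, and the `* 2.0` stand-in for the sparse matmul are all made up for illustration); this runs cleanly for me with either deletion order, which suggests the crash involves something beyond this minimal pattern:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for one iteration of the pre-processing loop:
# compute a block of features and write it into an on-disk memmap,
# then free the operands.
path = os.path.join(tempfile.mkdtemp(), "author_feat.npy")
feat = np.lib.format.open_memmap(path, mode="w+", dtype=np.float32,
                                 shape=(4, 3))

inputs = np.arange(12, dtype=np.float32).reshape(4, 3)
outputs = inputs * 2.0   # eagerly computed, owns its own buffer
feat[0:4] = outputs      # write the block into the memmap

del inputs               # either deletion order works in this sketch
del outputs
feat.flush()
```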