Open NiaJ3oE2LM opened 10 months ago
Hey, can you clarify the cat(dim=...) issue? What are the final shapes of the walks list? Do you have a small reproducible example that explains the behavior you observe?
Hello, yes, I can upload a synthetic example, but it will take me some time. In a few days I will also publish the project where I first encountered this behavior.
It would be somewhat easier for me to just be able to reproduce this on a small example. If that's possible, I would appreciate your effort :)
@NiaJ3oE2LM I was also facing the same issue. I decreased the learning rate, and it is working fine now. However, my dataset was different. You can try that.
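For anyone trying the learning-rate suggestion above, here is a minimal sketch of a training loop that uses a smaller learning rate and stops as soon as the loss becomes non-finite. The embedding model, the placeholder loss, and the value lr=1e-3 are assumptions for illustration only; they are not taken from the reporter's project or from node2vec.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in model: a small embedding table.
emb = torch.nn.Embedding(10, 4)
# Assumed learning rate; lower it further if the loss still diverges.
optimizer = torch.optim.Adam(emb.parameters(), lr=1e-3)

idx = torch.randint(0, 10, (32,))
for step in range(5):
    optimizer.zero_grad()
    loss = emb(idx).pow(2).mean()  # placeholder loss, not the node2vec loss
    # Fail fast on nan/inf so the offending step is easy to locate.
    if not torch.isfinite(loss):
        raise RuntimeError(f"non-finite loss at step {step}")
    loss.backward()
    optimizer.step()
```

If the check fires on the very first step, the learning rate is unlikely to be the cause and the input shapes are worth inspecting first.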
🐛 Describe the bug
When working on a graph classification task, I got 'nan' values in the loss computation from the standard node2vec model loader. The random walk sampling method was returning an elongated vector that broke the model's loss computation. I am using the DHFR graph collection loaded with the TUDataset class.
To solve this, I had to modify the module nn.models.node2vec.py and set dim=1 (instead of dim=0) on the torch.cat tensor returned by the methods pos_sample and neg_sample (currently lines 120 and 134).
Since these lines of node2vec.py are quite old, I fear this behavior is explained by an error on my side in feeding the data to the loader. If that is the case, I probably missed something in the documentation, and I would be grateful if you could point me in the right direction.
Possibly related discussion: #1437
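To make the dim=0 versus dim=1 question concrete, here is a minimal sketch of how the two choices change the shape of the concatenated tensor. It assumes the sampled walks form a 2-D tensor with one row per start node and that the list passed to torch.cat holds sliding windows of width context_size over those walks; the names rw, windows, and the sizes are hypothetical, and this is not a copy of the code in node2vec.py.

```python
import torch

batch_size, walk_length, context_size = 4, 5, 3

# Stand-in for sampled walks: one row per start node, walk_length + 1 nodes each.
rw = torch.arange(batch_size * (walk_length + 1)).view(batch_size, walk_length + 1)

# Sliding windows of width context_size over every walk.
num_windows = walk_length + 1 - context_size + 1
windows = [rw[:, j:j + context_size] for j in range(num_windows)]

# dim=0 stacks windows as extra rows: many short context windows.
stacked_rows = torch.cat(windows, dim=0)
# dim=1 places windows side by side: one long row per start node.
stacked_cols = torch.cat(windows, dim=1)

print(stacked_rows.shape)  # torch.Size([16, 3])
print(stacked_cols.shape)  # torch.Size([4, 12])
```

Printing these shapes for the real walks list, together with the batch size produced by the loader, is probably the quickest way to answer the question about final shapes asked in the thread.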
Versions
PyTorch version: 2.1.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
GCC version: (GCC) 13.2.1 20230801
Clang version: 16.0.6
CMake version: version 3.28.1
Libc version: glibc-2.38
Python version: 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801] (64-bit runtime)
Python platform: Linux-6.7.0-arch3-1-x86_64-with-glibc2.38
Is CUDA available: True
CUDA runtime version: 12.3.103
CUDA_MODULE_LOADING set to: LAZY
...
Nvidia driver version: 545.29.06
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.9.7
/usr/lib/libcudnn_adv_infer.so.8.9.7
/usr/lib/libcudnn_adv_train.so.8.9.7
/usr/lib/libcudnn_cnn_infer.so.8.9.7
/usr/lib/libcudnn_cnn_train.so.8.9.7
/usr/lib/libcudnn_ops_infer.so.8.9.7
/usr/lib/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
...
Versions of relevant libraries:
[pip3] numpy==1.26.2
[pip3] torch==2.1.1
[pip3] torch-cluster==1.6.3+pt21cu121
[pip3] torch_geometric==2.4.0
[pip3] triton==2.1.0
[conda] Could not collect