RuntimeError during default execution

AlexanderGri commented 6 years ago

Hello, thank you for your implementation!

I've just tried to run the default experiment with

python main.py --no-cuda --epochs 1

and ran into the following problem:

/opt/conda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
Prepare files
Define model
        Statistics
        Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "main.py", line 321, in <module>
    main()
  File "main.py", line 182, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "main.py", line 242, in train
    output = model(g, h, e)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 319, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/grishin/nmp_qc/models/MPNN.py", line 78, in forward
    m = self.m[0].forward(h[t], h_aux, e_aux)
  File "/data/grishin/nmp_qc/MessageFunction.py", line 43, in forward
    return self.m_function(h_v, h_w, e_vw, args)
  File "/data/grishin/nmp_qc/MessageFunction.py", line 175, in m_mpnn
    h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (25) must match the existing size (73) at non-singleton dimension 1

Am I doing something wrong? Thank you in advance.
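The shapes in the error message can be reproduced in isolation. This is a minimal sketch, not code from the repository: it assumes `h_v` has shape (batch, nodes, features) and `h_w` is a flattened (batch * nodes, features) view of it, which is consistent with the sizes 25 and 73 in the message. The batch size of 2 is arbitrary.

```python
import torch

# Hypothetical sizes read off the error message: 25 nodes per graph and a
# 73-dimensional hidden state. The batch size of 2 is an arbitrary assumption.
batch, n_nodes, hidden = 2, 25, 73

h_v = torch.randn(batch, n_nodes, hidden)  # assumed (batch, nodes, features)
h_w = h_v.view(-1, hidden)                 # assumed (batch * nodes, features)

# h_w[..., None] appends a trailing axis: shape (batch * nodes, hidden, 1).
# expand() can only stretch axes of size 1; axis 1 has size 73, so asking
# for 25 there raises the same RuntimeError as in the traceback.
try:
    h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1))
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

print(h_w[..., None].shape)  # torch.Size([50, 73, 1])
```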

priba commented 6 years ago

Hi,

Sorry for the long delay in answering. In my opinion, the errors you reported come from the PyTorch version. I got similar errors in another codebase I am working on after changing the PyTorch release, due to changes in the behaviour of "sum".

https://github.com/pytorch/pytorch/releases "All reduce functions such as sum and mean now default to squeezing the reduced dimension."

I suggest adding keepdim=False to the sum operations as a quick and easy fix for this problem.
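A quick check of the behaviour described in that release note, as a minimal sketch on an arbitrary tensor (not code from the repository). Under the new default, `keepdim=False` is already what `sum` does, so passing it explicitly leaves every shape unchanged; keeping the reduced dimension, as earlier releases did by default, takes `keepdim=True`.

```python
import torch

x = torch.ones(4, 25, 73)  # arbitrary example tensor

# Default behaviour: the reduced axis is squeezed away.
s_default = x.sum(1)
# Passing keepdim=False explicitly is identical to the default.
s_false = x.sum(1, keepdim=False)
# keepdim=True keeps the reduced axis with size 1.
s_true = x.sum(1, keepdim=True)

print(s_default.shape)  # torch.Size([4, 73])
print(s_false.shape)    # torch.Size([4, 73])
print(s_true.shape)     # torch.Size([4, 1, 73])
```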

In a few weeks, I will try to update the code for newer PyTorch versions.

ay27 commented 6 years ago

I tried to fix the problem and made some improvements, but I am not confident they are correct; could someone verify them?

josejimenezluna commented 6 years ago

Hello, @priba

To make things easier, which version of PyTorch are we supposed to be running?

adamxyang commented 5 years ago

Hello, @priba

Thanks for the implementation! I encountered the same issue. I experimented with PyTorch versions 0.2.0, 0.3.0 and 1.0.0, and I also added keepdim=False to all sum operations in datasets.utils.py and models.MPNN.py, but none of this worked.

(rdkit) Adams-MacBook-Pro-4:mpnn iron4dam$ python main.py --no-cuda
Prepare files
Define model
    Statistics
    Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "main.py", line 320, in <module>
    main()
  File "main.py", line 182, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "main.py", line 241, in train
    output = model(g, h, e)
  File "/Users/iron4dam/anaconda3/envs/rdkit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/models/MPNN.py", line 78, in forward
    m = self.m[0].forward(h[t], h_aux, e_aux)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/MessageFunction.py", line 43, in forward
    return self.m_function(h_v, h_w, e_vw, args)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/MessageFunction.py", line 174, in m_mpnn
    h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (25) must match the existing size (73) at non-singleton dimension 1. at /Users/soumith/minicondabuild3/conda-bld/pytorch_1512381214802/work/torch/lib/TH/generic/THTensor.c:309

rmrmg commented 5 years ago

@ay27 I've applied your patch and now get a different problem:

(nmpqc) rmrmg@kolos:/chematica/pka/nmpqc/nmp_qc$ LD_PRELOAD=$CONDA_PREFIX/lib/libstdc++.so python ./main.py --no-cuda
loaeed
Prepare files
Define model
    Statistics
    Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "./main.py", line 330, in <module>
    main()
  File "./main.py", line 191, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "./main.py", line 254, in train
    losses.update(train_loss.data[0], g.size(0))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
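The `IndexError` comes from indexing a 0-dim tensor. A minimal sketch, assuming the training loss is a scalar produced by a mean-reduced loss such as `mse_loss` (the repository's actual criterion may differ):

```python
import torch

pred = torch.randn(8, 1)
target = torch.randn(8, 1)
# A mean-reduced loss returns a 0-dim (scalar) tensor.
train_loss = torch.nn.functional.mse_loss(pred, target)

print(train_loss.dim())  # 0

# Indexing a 0-dim tensor raises IndexError, which is what
# `train_loss.data[0]` in the traceback does.
try:
    train_loss.data[0]
    index_failed = False
except IndexError:
    index_failed = True

# .item() converts a one-element tensor to a plain Python number.
loss_value = train_loss.item()
print(type(loss_value))  # <class 'float'>
```

Applied to the line in the traceback, a possible (untested) change would be `losses.update(train_loss.item(), g.size(0))`.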

wmmxk commented 5 years ago

I ran into the same error:

h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (24) must match the existing size (73) at non-singleton dimension 1

If this is due to a version update, could you tell me which version you were using? (I am using PyTorch 0.4.1.)

priba commented 5 years ago

At that time I was using PyTorch 0.3.0.

njwm commented 3 years ago

I made a small change like this:

h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()

It seems to work, but I am not sure about the results.

sthakurr commented 2 years ago

> I made a small change like this:
> h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()
> It seems to work, but I am not sure about the results.

@njwm I did the same to get past that error, and it worked (although another, similar error then came up in a .sum operation). Could you please verify whether it affected the results?

njwm commented 2 years ago

Perhaps it is better like this:

h_w_rows = h_w[:, None, :].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
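The two workarounds suggested in this thread can be compared on toy data. This is a sketch with hypothetical sizes, and it assumes, without checking the repository, that `m_mpnn` next flattens the expanded tensor into rows of length `features`. Only the second variant produces the target shape spelled out by the original `expand` call, (batch * nodes, nodes, features), with every slice a copy of one row of `h_w`; the first has the same number of elements but arranges them differently.

```python
import torch

# Hypothetical sizes: 2 graphs, 25 nodes each, 73 hidden features.
batch, n_nodes, hidden = 2, 25, 73
# arange gives every entry a distinct value, so comparisons are deterministic.
h_v = torch.arange(batch * n_nodes * hidden, dtype=torch.float32).view(batch, n_nodes, hidden)
h_w = h_v.view(-1, hidden)  # assumed (batch * nodes, features)

# First workaround: new axis last, stretched to n_nodes.
a = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()
# Second workaround: new axis in the middle, stretched to n_nodes.
b = h_w[:, None, :].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()

print(a.shape)  # torch.Size([50, 73, 25])
print(b.shape)  # torch.Size([50, 25, 73])

# Assumption (not verified against the repository): the next step
# flattens to one row of `hidden` features per entry.
rows_a = a.view(-1, hidden)
rows_b = b.view(-1, hidden)

# In b, each block of n_nodes rows repeats one full row of h_w.
print(torch.equal(rows_b[0], h_w[0]))  # True
# In a, a row holds repeated copies of single features instead.
print(torch.equal(rows_a[0], h_w[0]))  # False
```

Matching shapes does not by itself show that the model computes the intended messages; that still needs checking against the rest of `m_mpnn`.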

njwm commented 2 years ago

> > I made a small change like this:
> > h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()
> > It seems to work, but I am not sure about the results.
>
> @njwm I did the same to get past that error, and it worked (although another, similar error then came up in a .sum operation). Could you please verify whether it affected the results?

I don't think it makes sense; it just gets past that error.

priba / nmp_qc, issue #3