maxjcohen / transformer

Implementation of Transformer model (originally from Attention is All You Need) applied to Time Series.
https://timeseriestransformer.readthedocs.io/en/latest/
GNU General Public License v3.0

Getting training.ipynb to run #34

Closed c-penzo closed 3 years ago

c-penzo commented 3 years ago

Hi, first of all thank you very much for making your code available!

I am having problems getting the code to work with the Oze Challenge dataset. `npz_check(Path('datasets'), 'dataset')` succeeds in producing `datasets/dataset.npz` only if I use the file `ozechallenge_benchmark/labels.json`. If I use the file `transformer/labels.json` instead, I get this error: `KeyError: "['initial_temperature'] not in index"`

I then use the `datasets/dataset.npz` produced above in the notebook `transformer/training.ipynb`, with `DATASET_PATH = 'datasets/dataset.npz'`, `d_input = 27`, and `d_output = 8` (I did not change the `d_input` and `d_output` values from the ones found in your repo).

I added two prints in the `OzeDataset` class, so when I run the cell `ozeDataset = OzeDataset(DATASET_PATH)` I obtain these shapes for the `_x` and `_y` tensors: `_x.shape = torch.Size([7500, 18, 691])`, `_y.shape = torch.Size([7500, 8, 672])`

I had to change the line `dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (23000, 1000, 1000))` into `dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (5500, 1000, 1000))` to avoid an error from `random_split` (the split sizes must sum to the dataset length, here 7500).

But when I run the training cell, I get the following error: `RuntimeError: size mismatch, m1: [144 x 691], m2: [27 x 64] at /tmp/pip-req-build-as628lz5/aten/src/TH/generic/THTensorMath.cpp:41` (see the entire error at the end of this message).

What am I doing wrong? I also tried with `d_input = 37`, as you suggested in another post, but I get the same error: `RuntimeError: size mismatch, m1: [144 x 691], m2: [37 x 64] at /tmp/pip-req-build-as628lz5/aten/src/TH/generic/THTensorMath.cpp:41`

In the end I am not particularly interested in the Oze dataset; I would just like to be able to run your code so I can understand what input dimensions it needs, and make sure my own input matches them. So if it is easier for you, it would be sufficient for me to feed your model some tensors filled with random values and be able to make training.ipynb run.

Thank you for your help! Camilla


```
RuntimeError                              Traceback (most recent call last)
in
     12
     13 # Propagate input
---> 14 netout = net(x.to(device))
     15
     16 # Comupte loss

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/WORK/BVP/Transformers/transformer/tst/transformer.py in forward(self, x)
    119
    120         # Embeddin module
--> 121         encoding = self._embedding(x)
    122
    123         # Add position encoding

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
     89
     90     def forward(self, input: Tensor) -> Tensor:
---> 91         return F.linear(input, self.weight, self.bias)
     92
     93     def extra_repr(self) -> str:

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1674         ret = torch.addmm(bias, input, weight.t())
   1675     else:
-> 1676         output = input.matmul(weight.t())
   1677     if bias is not None:
   1678         output += bias

RuntimeError: size mismatch, m1: [144 x 691], m2: [37 x 64] at /tmp/pip-req-build-as628lz5/aten/src/TH/generic/THTensorMath.cpp:41
```
maxjcohen commented 3 years ago

Hi, these two repos are not kept perfectly in sync, as I have made modifications since the release of the challenge. Most differences can be handled through the label file and the input/output dimensions. For instance, `initial_temperature` is not part of the challenge.

In the benchmark's labels.json, X has a dimension of 8, so you should set `d_output = 8`. Z and R have a combined dimension of 37, so `d_input = 37`.
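To make the dimension bookkeeping concrete, here is a minimal sketch of deriving `d_input` and `d_output` from a label file. The key names (`Z`, `R`, `X`) match the groups mentioned above, but the column names are hypothetical stand-ins; the actual structure of the benchmark's labels.json may differ:

```python
# Hypothetical labels.json content -- column names are placeholders,
# only the group sizes (18 + 19 = 37 inputs, 8 outputs) matter here.
labels = {
    "Z": [f"command_{i}" for i in range(18)],       # commands/occupancy
    "R": [f"building_prop_{i}" for i in range(19)], # building properties
    "X": [f"observation_{i}" for i in range(8)],    # observed outputs
}

# Z and R are concatenated to form the model input.
d_input = len(labels["Z"]) + len(labels["R"])
d_output = len(labels["X"])

print(d_input, d_output)  # 37 8
```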

If the error persists, it could be caused by a mix-up in axis order, as I switched at some point to match PyTorch's official Transformer implementation, see #16. In that case, all you need to do is inspect the shapes of `_x` and `_y` and check that everything matches up. `rollaxis` might be of use.

c-penzo commented 3 years ago

Hi, thanks for your answer. As I said, I did try with `d_input = 37` and still get the same error. The dimensions are `_x.shape = torch.Size([7500, 18, 691])` and `_y.shape = torch.Size([7500, 8, 672])`. Are those what you would expect? Could you say which dimensions we would have to match? It is not clear to me.

maxjcohen commented 3 years ago

The time dimension should come second in your case; you can achieve that with the `rollaxis` function.
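For instance, with arrays shaped like the ones reported above (`(batch, features, time)`), a quick sketch of the swap using `np.rollaxis` (toy batch size instead of 7500 to keep it small):

```python
import numpy as np

# Toy arrays with the reported axis order (batch, features, time).
x = np.zeros((4, 18, 691))
y = np.zeros((4, 8, 672))

# Roll the time axis (axis 2) into position 1 -> (batch, time, features).
x = np.rollaxis(x, 2, 1)
y = np.rollaxis(y, 2, 1)

print(x.shape)  # (4, 691, 18)
print(y.shape)  # (4, 672, 8)
```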

Another problem is that your time dimension differs between `_x` (691) and `_y` (672, the correct number: 28 * 24 = 672 is the number of hours in a month). This problem most likely arose when you concatenated R to Z in order to create `_x`.

The model expects inputs and outputs with shapes:

```
_x.shape = torch.Size([7500, 672, 37])
_y.shape = torch.Size([7500, 672, 8])
```
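The traceback above points at the model's input embedding, and the `F.linear` call it ends in multiplies along the last axis. A minimal sketch of why the layout matters, using a bare `nn.Linear` as a stand-in for the model's embedding layer:

```python
import torch
import torch.nn as nn

d_input, d_model = 37, 64
embedding = nn.Linear(d_input, d_model)  # stand-in for the model's input embedding

# Expected layout: (batch, time, features), with features last and equal to d_input.
x_ok = torch.randn(2, 672, d_input)
out = embedding(x_ok)
print(out.shape)  # torch.Size([2, 672, 64])

# Time axis last instead: Linear sees 672 "features" and raises a size mismatch,
# just like the RuntimeError in the traceback.
x_bad = torch.randn(2, d_input, 672)
try:
    embedding(x_bad)
except RuntimeError:
    print("size mismatch")
```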
c-penzo commented 3 years ago

Actually it also needs the line `K = Z.shape[-1]` from commit 2c3e30c, along with the lines you suggested in #16.

Now training.ipynb runs and I get the right dimensions for _x and _y.

Thanks!!

maxjcohen commented 3 years ago

You're welcome! The problem comes from the difference between the dataset in the benchmark and this repo, so I don't think we should change the Dataset class here. On the other hand, if anyone is interested in making the modification on the challenge repo, feel free to open a PR.