openkim / kliff

KIM-based Learning-Integrated Fitting Framework for interatomic potentials.
https://kliff.readthedocs.io
GNU Lesser General Public License v2.1

Data type mismatch when trying to update parameters in the NN model #139

Closed yonatank93 closed 8 months ago

yonatank93 commented 8 months ago

I found a bug: if I update the NN model parameters using a numpy array and then try to compute the predictions, I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_3132639/3177518836.py in <module>
----> 1 preds_eng(par[129], 129)

/tmp/ipykernel_3132639/1411647374.py in preds_eng(params, idx)
      6     calc.update_model_params(opt_params)
      7     # Compute energy predictions
----> 8     preds_e_tensor = calc.compute(batch)["energy"]
      9     # Convert to numpy
     10     preds_eng = np.array([eng.detach().numpy() for eng in preds_e_tensor])

/data/yonatan/myproject/modules/kliff/kliff/calculators/calculator_torch.py in compute(self, batch)
    182             zeta_stacked.requires_grad_(True)
    183 
--> 184         energy_atom = self.model(zeta_stacked)
    185 
    186         # forces and stress

~/.local/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

/data/yonatan/myproject/modules/kliff/kliff/models/neural_network.py in forward(self, x)
     78         """
     79         for layer in self.layers:
---> 80             x = layer(x)
     81         return x
     82 

~/.local/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

~/.local/lib/python3.7/site-packages/torch/nn/modules/linear.py in forward(self, input)
    112 
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115 
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 must have the same dtype

I think the fix is to update this line so that the updated parameters are cast to the same dtype as the original parameters. Any thoughts? I can create a PR for this.
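The failure can be reproduced outside kliff with a minimal sketch (this is not kliff's code, just a standalone illustration of the same dtype mismatch): a float64 input hitting a float32 `torch.nn.Linear` raises the same `RuntimeError`, and casting the input fixes it.

```python
import numpy as np
import torch

layer = torch.nn.Linear(4, 1)  # parameters default to torch.float32

# numpy defaults to float64; from_numpy preserves that dtype
x64 = torch.from_numpy(np.random.randn(2, 4))

err = None
try:
    layer(x64)  # float64 input vs float32 weights
except RuntimeError as e:
    err = e
print(err)  # e.g. "mat1 and mat2 must have the same dtype"

# Casting the input to the layer's dtype fixes it
out = layer(x64.to(layer.weight.dtype))
print(out.dtype)  # torch.float32
```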

mjwen commented 8 months ago

By default, it is using torch.float32 via torch.Tensor here. Are you using float64?

Yes, a PR would be great!

yonatank93 commented 8 months ago

What I was trying to do was update the weights and biases using a parameter vector written as a numpy array, which I believe defaults to using float64.
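For context (a standalone illustration, not kliff code): numpy arrays default to float64, and the two common conversion routes behave differently — `torch.Tensor(...)` copies into torch's default dtype (float32 unless changed), while `torch.from_numpy(...)` preserves the numpy dtype.

```python
import numpy as np
import torch

p = np.random.randn(5)
print(p.dtype)  # float64, numpy's default

t = torch.Tensor(p)       # copies into torch's default dtype
print(t.dtype)            # torch.float32

t2 = torch.from_numpy(p)  # preserves the numpy dtype
print(t2.dtype)           # torch.float64
```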

mjwen commented 8 months ago

It is a bit strange -- the torch.Tensor used here will actually convert the param to float32.

Given that that function is called here, I would expect the original code to work.


I tested it by adding the below block

# sizes, _, _ = calc.get_size_opt_params()
p = np.random.randn(641)
print("@@ flag 1: p.dtype", p.dtype)
calc.update_model_params(p)

after line 199 in example_nn_Si.py, and everything works fine.

So, #141 may not be needed?

Are you doing something different?

yonatank93 commented 8 months ago

It is also possible that I got the error because I was calling low-level functions in kliff. I will send you the script that I used tomorrow.

yonatank93 commented 8 months ago

@mjwen I think I see what was wrong with my script. At the beginning of the script I added the following line; if I drop it, there is no problem at all:

torch.set_default_tensor_type(torch.DoubleTensor)

Additionally, there might be a dtype mismatch with the calculated fingerprints that I exported earlier. When I tried adding the line above to example_nn_Si.py itself, I didn't get any issue.
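A standalone sketch of why that one line breaks things (using `torch.set_default_dtype`, the non-deprecated equivalent of the `set_default_tensor_type(torch.DoubleTensor)` call above; this is an illustration, not kliff code): with the default dtype set to float64, a freshly built model gets float64 parameters, so any fingerprints that were saved as float32 no longer match.

```python
import numpy as np
import torch

# Equivalent effect to torch.set_default_tensor_type(torch.DoubleTensor)
torch.set_default_dtype(torch.float64)

layer = torch.nn.Linear(4, 1)  # now built with float64 parameters
print(layer.weight.dtype)      # torch.float64

# Fingerprints saved earlier as float32 mismatch the float64 model
zeta = torch.tensor(np.random.randn(2, 4), dtype=torch.float32)
err = None
try:
    layer(zeta)
except RuntimeError as e:
    err = e
print(err)
```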

mjwen commented 8 months ago

Yes, I believe so. Your saved fingerprints and the parameters in the model can be of different data types.

mjwen commented 8 months ago

Closing because of #141