/media/kosuke/SANDISK/hanging_points_net/checkpoints/gray/hpnet_latestmodel_20200812_2224.pt
The output of the network itself is NaN.
ipdb> hp_data
tensor([[[[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
[[0.3686, 0.3686, 0.3765, ..., 0.4275, 0.4275, 0.4275],
[0.3686, 0.3686, 0.3765, ..., 0.4314, 0.4314, 0.4314],
[0.3725, 0.3725, 0.3804, ..., 0.4392, 0.4392, 0.4392],
...,
[0.4000, 0.4000, 0.3961, ..., 0.4196, 0.4196, 0.4196],
[0.4039, 0.4039, 0.4000, ..., 0.4196, 0.4196, 0.4196],
[0.4039, 0.4039, 0.4000, ..., 0.4196, 0.4196, 0.4196]]]],
device='cuda:0')
ipdb> self.model(hp_data)
(tensor([[[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]]], device='cuda:0',
grad_fn=<ReluBackward1>), tensor([[nan, nan, nan, nan, nan]], device='cuda:0', grad_fn=<AddmmBackward>))
Save the previous model and optimizer?? https://qiita.com/syoamakase/items/a9b3146e09f9fcafbb66
if torch.isnan(loss):
    print('loss is nan!!')
    self.model = self.prev_model
    self.optimizer = torch.optim.Adam(
        self.prev_model.parameters(), lr=args.lr, betas=(0.9, 0.999),
        eps=1e-10, weight_decay=0, amsgrad=False)
    self.optimizer.load_state_dict(
        self.prev_optimizer.state_dict())
    continue
else:
    self.prev_model = copy.deepcopy(self.model)
    self.prev_optimizer = copy.deepcopy(self.optimizer)
The gradient is blowing up somewhere. Clip the large parts with this: https://pytorch.org/docs/master/generated/torch.nn.utils.clip_grad_norm_.html
torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2)
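As a minimal sketch of where the call would go (assuming torch is imported, loss was just computed, and the usual training loop with self.model and self.optimizer; max_norm is a placeholder to be picked from the gradient-norm statistics below):
max_norm = 1.0  # placeholder; choose from the gradient-norm statistics below
self.optimizer.zero_grad()
loss.backward()
# Clip the gradient norm in place before the optimizer step.
torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm)
self.optimizer.step()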
From https://discuss.pytorch.org/t/how-to-check-norm-of-gradients/13795/2:
Q: How do we choose the hyperparameter c? A: We can train our neural networks for some epochs and look at the statistics of the gradient norms. The average value of gradient norms is a good initial trial.
So a good choice for the clip value is the average gradient norm over a few epochs.
How to compute it: https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48
# Total L2 norm of all parameter gradients.
total_norm = 0.0
for p in self.model.parameters():
    if p.grad is not None:
        param_norm = p.grad.data.norm(2)
        total_norm += param_norm.item() ** 2
total_norm = total_norm ** (1. / 2)
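Incidentally, clip_grad_norm_ itself returns the total norm it computed (before clipping), so logging its return value each iteration is an easy way to gather those statistics:
total_norm = torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm)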
Even with clip_grad_norm it still goes to NaN... Per https://github.com/kosuke55/hanging_points_cnn/issues/1#issuecomment-673379924, lowering the batch size (64 -> 16) at least keeps training from stopping on NaN for now.
Is v_pred getting too small?
In [122]: v_pred = torch.Tensor([1e-30, 0, 0])
     ...: print(torch.norm(v_pred))
     ...: v_pred_n = v_pred / torch.norm(v_pred)
     ...: print(v_pred_n)
     ...:
tensor(0.)
tensor([inf, nan, nan])
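If that underflow is the cause, one possible guard (just a sketch, not the repo's actual code) is to clamp the norm with an epsilon before dividing, or use torch.nn.functional.normalize, which applies the same kind of eps guard:
import torch
import torch.nn.functional as F

v_pred = torch.Tensor([1e-30, 0, 0])
eps = 1e-12
# Clamping the norm avoids inf/nan when it underflows to 0 in float32.
v_pred_n = v_pred / torch.norm(v_pred).clamp(min=eps)
print(v_pred_n)  # tensor([1.0000e-18, 0.0000e+00, 0.0000e+00])
print(F.normalize(v_pred, p=2, dim=0, eps=eps))  # same result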