jiaxu-Zhu / DETR

MindSpore project of DETR

Question about exploding gradients #1

Open ChaselLau666 opened 6 months ago

ChaselLau666 commented 6 months ago

Thank you for your excellent work. I ran into a gradient explosion problem while training with this code. Here is my log:

epoch: 1 step: 1, loss is 9.914839744567871
epoch: 1 step: 2, loss is 11.918198585510254
epoch: 1 step: 3, loss is 10.370205350685865
epoch: 1 step: 4, loss is 12.24103601814176
epoch: 1 step: 5, loss is 9.714082717895508
epoch: 1 step: 6, loss is 9.692423820495605
Traceback (most recent call last):
  File "train.py", line 117, in <module>
    train(parse_args())
  File "train.py", line 113, in train
    model.train(args.epoch, dataset, callbacks=[ckpoint, LossMonitor()], dataset_sink_mode=False)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/train/model.py", line 1061, in train
    self._train(epoch,
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/train/model.py", line 113, in wrapper
    func(self, *args, **kwargs)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/train/model.py", line 613, in _train
    self._train_process(epoch, train_dataset, list_callback, cb_params, initial_epoch, valid_infos)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/train/model.py", line 914, in _train_process
    outputs = self._train_network(next_element)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 662, in __call__
    raise err
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 658, in __call__
    output = self._run_construct(args, kwargs)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 442, in _run_construct
    output = self.construct(*cast_inputs, **kwargs)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/wrap/cell_wrapper.py", line 422, in construct
    return self._no_sens_impl(inputs)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/wrap/cell_wrapper.py", line 437, in _no_sens_impl
    loss = self.network(inputs)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 662, in __call__
    raise err
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 658, in __call__
    output = self._run_construct(args, kwargs)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 442, in _run_construct
    output = self.construct(*cast_inputs, **kwargs)
  File "/home/xiangchengliu/Downloads/DETR/src/models/loss.py", line 59, in construct
    loss = self._loss_fn(output, tgt)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 662, in __call__
    raise err
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 658, in __call__
    output = self._run_construct(args, kwargs)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 442, in _run_construct
    output = self.construct(*cast_inputs, **kwargs)
  File "/home/xiangchengliu/Downloads/DETR/src/models/matcher.py", line 586, in construct
    indices = self.matcher(outputs_without_aux, targets)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 662, in __call__
    raise err
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 658, in __call__
    output = self._run_construct(args, kwargs)
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/mindspore/nn/cell.py", line 442, in _run_construct
    output = self.construct(*cast_inputs, **kwargs)
  File "/home/xiangchengliu/Downloads/DETR/src/models/matcher.py", line 233, in construct
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(c_split)]
  File "/home/xiangchengliu/Downloads/DETR/src/models/matcher.py", line 233, in <listcomp>
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(c_split)]
  File "/home/xiangchengliu/anaconda3/envs/ms/lib/python3.8/site-packages/scipy/optimize/_lsap.py", line 86, in linear_sum_assignment
    return _lsap_module.calculate_assignment(cost_matrix, maximize)
ValueError: matrix contains invalid numeric entries
Aborted (core dumped)
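The traceback ends inside SciPy's `linear_sum_assignment`, which raises this `ValueError` when the cost matrix contains NaN (or negative infinity). A minimal diagnostic sketch follows, assuming the cost matrix can be converted to a NumPy array before matching. The helper name `safe_linear_sum_assignment` and the `fill_value` parameter are hypothetical and are not part of this repository or of SciPy.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def safe_linear_sum_assignment(cost, fill_value=1e6):
    """Hypothetical helper: count and replace non-finite costs, then match.

    Returns (row_indices, col_indices, number_of_replaced_entries).
    """
    cost = np.asarray(cost, dtype=np.float64)
    bad = ~np.isfinite(cost)          # True where the entry is NaN or +/-inf
    n_bad = int(bad.sum())
    if n_bad:
        # Replace invalid entries with a large finite cost so they are
        # only chosen when no valid pairing is available.
        cost = np.where(bad, fill_value, cost)
    rows, cols = linear_sum_assignment(cost)
    return rows, cols, n_bad


# Toy cost matrix with one NaN, standing in for a matcher cost block.
cost = np.array([[1.0, np.nan],
                 [2.0, 0.5]])

try:
    linear_sum_assignment(cost)       # reproduces the error from the log
    raised = False
except ValueError:
    raised = True

rows, cols, n_bad = safe_linear_sum_assignment(cost)
print(raised, rows.tolist(), cols.tolist(), n_bad)
```

A guard like this only hides the symptom and is mainly useful for finding the first step at which NaN appears. If `n_bad` is nonzero, the model outputs or box coordinates that feed the cost matrix are probably already non-finite, and the cause (for example the learning rate or degenerate target boxes) would still need to be found.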

farshidrayhancv commented 6 months ago

Same issue here. Is there any update, please?