neurosim / DNN_NeuroSim_V1.3

Benchmark framework of compute-in-memory based accelerators for deep neural network (inference engine focused)

error when training #6

Closed zhangfeixiang222 closed 3 years ago

zhangfeixiang222 commented 3 years ago

Hi, I got an error after setting inference=1. Because of my torch version, I commented out @weak_script_method; I don't know whether that could lead to this bug. Thanks!

Traceback (most recent call last):
  File "/Ai-Data/home/users/zhangfeixiang/Desktop/DNN_NeuroSim_V2.1-master/Training_pytorch/train.py", line 159, in <module>
    loss.backward()
  File "/usr/local/anaconda3/envs/python36/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/anaconda3/envs/python36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/usr/local/anaconda3/envs/python36/lib/python3.6/site-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/Ai-Data/home/users/zhangfeixiang/Desktop/DNN_NeuroSim_V2.1-master/Training_pytorch/utee/wage_quantizer.py", line 417, in backward
    raise e
  File "/Ai-Data/home/users/zhangfeixiang/Desktop/DNN_NeuroSim_V2.1-master/Training_pytorch/utee/wage_quantizer.py", line 409, in backward
    grad_input = QE(grad_output, self.bits_E)
  File "/Ai-Data/home/users/zhangfeixiang/Desktop/DNN_NeuroSim_V2.1-master/Training_pytorch/utee/wage_quantizer.py", line 52, in QE
    assert max_entry != 0, "QE blow"
AssertionError: QE blow
Total Elapse: 5.91, Best Result: 0.000%

Error backward:

tensor(0., device='cuda:0') tensor(0., device='cuda:0')

neurosim commented 3 years ago

This problem is caused by the instability of the WAGE algorithm itself. Adjusting the parameter "grad_scale" may help.
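
To illustrate the failure mode in the traceback, here is a minimal pure-Python sketch of an error quantizer that normalises a gradient by its largest absolute entry, which is what the `assert max_entry != 0, "QE blow"` line suggests QE does. The names `shift` and `quantize_error`, and the rounding and clipping details, are assumptions made for illustration and are not the repository's actual code. It only shows that an all-zero gradient (as in the printed `tensor(0.)` values) trips the assertion; how "grad_scale" enters the real training loop is not modelled here.

```python
import math


def shift(x):
    # Round a positive value to the nearest power of two (assumed scale factor).
    return 2.0 ** round(math.log2(x))


def quantize_error(grad, bits):
    """Hypothetical stand-in for QE: normalise by the largest entry, then round."""
    max_entry = max(abs(g) for g in grad)
    # Same check as in the traceback: a gradient of all zeros cannot be normalised.
    assert max_entry != 0, "QE blow"
    scale = shift(max_entry)
    step = 2.0 ** (1 - bits)   # quantization step for a signed `bits`-bit value
    limit = 1.0 - step         # clip to the representable range
    quantized = []
    for g in grad:
        q = round(g / scale / step) * step
        quantized.append(min(max(q, -limit), limit))
    return quantized


if __name__ == "__main__":
    # A gradient with non-zero entries quantizes normally.
    print(quantize_error([0.5, -0.25], 8))

    # A gradient that has vanished to exactly zero reproduces the assertion.
    try:
        quantize_error([0.0, 0.0], 8)
    except AssertionError as err:
        print("AssertionError:", err)
```

If the real gradient reaching QE is exactly zero, printing its maximum absolute value just before the assertion (as the "Error backward" output above appears to do) is a quick way to confirm that this is the cause before tuning "grad_scale".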