Open zjreno opened 3 years ago
Hi, I have the same problem. What was your conclusion?
Hi, I have a bug related to this statement:

```
Traceback (most recent call last):
  File "train.py", line 340, in <module>
    train(args, device_id)
  File "train.py", line 272, in train
    trainer.train(train_iter_fct, args.train_steps)
  File "/root/code/BertSum/src/models/trainer.py", line 155, in train
    self._gradient_accumulation(
  File "/root/code/BertSum/src/models/trainer.py", line 326, in _gradient_accumulation
    loss.div(float(normalization)).backward()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

Does it have any relation to that statement? Or have you solved it? Pardon my poor English!
OK, I have already solved the problem. It was caused by using BCELoss: you should apply a sigmoid layer to the output before passing it to the loss.
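A minimal sketch of the fix, using hypothetical toy tensors rather than the actual BertSum shapes: `nn.BCELoss` expects probabilities in `[0, 1]`, so raw logits can trigger a device-side assert on CUDA. Either pass the logits through `torch.sigmoid` first, or use `nn.BCEWithLogitsLoss`, which fuses the sigmoid in and is numerically safer.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores from a model head (not the real BertSum outputs)
logits = torch.tensor([[2.0, -1.0, 0.5]])
target = torch.tensor([[1.0, 0.0, 1.0]])

# BCELoss needs inputs in [0, 1]: apply sigmoid before the loss
bce = nn.BCELoss()
loss_bce = bce(torch.sigmoid(logits), target)

# BCEWithLogitsLoss applies the sigmoid internally, so it takes raw logits
bce_logits = nn.BCEWithLogitsLoss()
loss_fused = bce_logits(logits, target)

# Both formulations compute the same loss value
print(torch.allclose(loss_bce, loss_fused))
```

Passing the raw `logits` straight into `bce(...)` here would raise an error on CPU and a device-side assert on CUDA, which matches the traceback above.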
In https://github.com/nlpyang/BertSum/blob/master/src/models/trainer.py#L325, after `sum()`, `loss.numel()` must be 1. What is the difference between `(loss / loss.numel()).backward()` and `loss.backward()`? So I guess `loss.numel()` may be meant to express `n_docs`? Can we replace `(loss / loss.numel())` with `loss / normalization`?
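A small sketch of the point in question, with made-up per-token losses standing in for the real ones in `trainer.py`: after `sum()` the loss is a 0-d tensor, so `loss.numel()` is 1 and dividing by it is a no-op, whereas dividing by a `normalization` count actually rescales the gradients.

```python
import torch

# Hypothetical per-token losses (stand-in for the real batch losses)
per_token_loss = torch.tensor([0.5, 1.5, 2.0], requires_grad=True)

loss = per_token_loss.sum()
n = loss.numel()  # sum() yields a 0-d tensor, so numel() is 1

# Dividing by numel() == 1 changes nothing: identical to loss.backward()
(loss / n).backward()
g_numel = per_token_loss.grad.clone()

# Dividing by a real normalization count rescales every gradient
per_token_loss.grad = None
normalization = 2.0  # e.g. number of documents/tokens in the batch
(per_token_loss.sum() / normalization).backward()
g_norm = per_token_loss.grad.clone()

print(n)        # 1
print(g_numel)  # tensor([1., 1., 1.])
print(g_norm)   # tensor([0.5000, 0.5000, 0.5000])
```

So `(loss / loss.numel())` is equivalent to plain `loss`, while `loss / normalization` is a genuinely different (and arguably intended) scaling.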