yanwii / seq2seq

A Chinese chatbot based on PyTorch, with an integrated beam search algorithm
Apache License 2.0
241 stars 87 forks

an illegal memory access was encountered #16

Open dingjibang opened 6 years ago

dingjibang commented 6 years ago

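I downloaded the project, filled in a few simple answers and questions, and ran a test. It ran and the results were quite good, so I prepared nearly 2 MB of answers and questions. The preprocessing stage passed, but as soon as training started I got the error below.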

THCudaCheck FAIL file=C:/new-builder_3/win-wheel/pytorch/aten/src/ATen/native/cuda/Embedding.cu line=247 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "seq2seq.py", line 436, in <module>
    seq.train()
  File "seq2seq.py", line 210, in train
    loss, logits = self.step(inputs, targets, self.max_length)
  File "seq2seq.py", line 265, in step
    loss.backward()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\autograd\__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at C:/new-builder_3/win-wheel/pytorch/aten/src/ATen/native/cuda/Embedding.cu:247

I'm completely lost. What should I do?

dingjibang commented 6 years ago

CUDA 9.0, Windows 10, Python 3.5.

yanwii commented 6 years ago

Running CUDA_LAUNCH_BLOCKING=1 python3 seq2seq.py train should show more information.
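The `VAR=1 command` form is POSIX shell syntax, and the reporter is on Windows 10. A portable alternative is to set the variable from Python before CUDA is initialized. The sketch below assumes it is placed at the very top of the entry script, before `import torch`; the helper name `enable_blocking_launches` is hypothetical and not part of this repository.

```python
import os


def enable_blocking_launches(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 so kernel launches run synchronously.

    Call this before CUDA is initialized; doing it before `import torch`
    is the safe choice.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_blocking_launches()
# import torch  # import torch only after the variable is set
```

With synchronous launches the Python traceback should stop at the operation that actually failed, rather than at a later synchronization point.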

dingjibang commented 6 years ago

The output above was already captured with blocking = 1. For comparison, here is the output with blocking = 0:

THCudaCheck FAIL file=c:\new-builder_3\win-wheel\pytorch\aten\src\thc\THCReduceAll.cuh line=317 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "seq2seq.py", line 436, in <module>
    seq.train()
  File "seq2seq.py", line 210, in train
    loss, logits = self.step(inputs, targets, self.max_length)
  File "seq2seq.py", line 266, in step
    torch.nn.utils.clip_grad_norm(self.encoder.parameters(), clip)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\nn\utils\clip_grad.py", line 51, in clip_grad_norm
    return clip_grad_norm_(parameters, max_norm, norm_type)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\nn\utils\clip_grad.py", line 32, in clip_grad_norm_
    param_norm = p.grad.data.norm(norm_type)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at c:\new-builder_3\win-wheel\pytorch\aten\src\thc\THCReduceAll.cuh:317

With 0 or 1 the messages are the same, no more and no less. Only the line where the error is reported differs.
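The thread never identifies the cause. One frequent source of an illegal memory access reported from Embedding.cu is a token id that falls outside the embedding table, which can happen when a larger dataset introduces ids the model's vocabulary size does not cover. Whether that applies here is an assumption, not a confirmed diagnosis. A minimal sketch of a pre-training check follows. The function name and the toy data are hypothetical, and the sequences are plain Python lists (for a tensor, `tensor.tolist()` gives the same shape of input).

```python
def find_out_of_range_ids(sequences, vocab_size):
    """Return (sequence_index, position, token_id) for ids outside [0, vocab_size)."""
    bad = []
    for seq_idx, seq in enumerate(sequences):
        for pos, token_id in enumerate(seq):
            if not 0 <= token_id < vocab_size:
                bad.append((seq_idx, pos, token_id))
    return bad


# Hypothetical toy data: a vocabulary of 5 tokens, so valid ids are 0..4.
sequences = [[0, 1, 4], [2, 5, 3], [-1, 0, 0]]
print(find_out_of_range_ids(sequences, vocab_size=5))
# prints [(1, 1, 5), (2, 0, -1)]
```

An empty result rules this cause out for the checked data. Any hit points at the vocabulary or preprocessing step rather than at CUDA itself.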

yanwii commented 6 years ago

Could you send me a copy of the data?

liutianling commented 5 years ago

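@dingjibang Did you solve this problem? I ran into it too. @yanwii However, after I turned off the GPU, training ran to epoch 4000 and then failed with what was indeed a segmentation fault.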
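For a segmentation fault on CPU, Python's standard `faulthandler` module can dump the Python-level traceback at the moment of the crash, which may show which line of the training loop was running. A minimal sketch, assuming it is placed at the top of the entry script; the log file name `crash_trace.log` is hypothetical.

```python
import faulthandler

# Hypothetical log file name; keep the handle open for the life of the process,
# because faulthandler writes to its file descriptor when the crash happens.
crash_log = open("crash_trace.log", "w")

# Dump the tracebacks of all threads on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL.
faulthandler.enable(file=crash_log, all_threads=True)
```

The same handler can be enabled without code changes by starting the interpreter with `python -X faulthandler`, in which case the traceback goes to standard error.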

ailovejinx commented 1 year ago

Hello, did you ever solve this problem? I've run into it too.