cublas runtime error: the GPU program failed to execute

meijieru / crnn.pytorch

Convolutional recurrent network in pytorch

MIT License


cublas runtime error: the GPU program failed to execute #28

Closed: tlatlbtle closed this issue 7 years ago

tlatlbtle commented 7 years ago

I ran crnn_main.py with this command:

python crnn_main.py --trainroot="/home/wangjianbo_i/OCR/data/IIIT5K/traindatalmdb" --valroot="/home/wangjianbo_i/OCR/data/IIIT5K/testdatalmdb" --cuda --alphabet="0123456789abcdefghijklmnopkrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,;:" --imgH=32

The error message is shown below. I also tried running it on my company's server to rule out an environment problem, but got the same result. How can I solve this? :-D

[Screenshot of the error message: 2017-06-07 16-23-38]

meijieru commented 7 years ago

It seems to be a PyTorch problem. You would be better off asking on the PyTorch forum.

tlatlbtle commented 7 years ago

☆⌒(*＾-゜)v THX!!

ahmedmazari-dhatim commented 7 years ago

@wjbKimberly, how can I run this code with alphabet = {abcdefghijklmnopkrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ} so that the program lowercases all the uppercase characters?

Thank you

lijian-1994 commented 5 years ago

@wjbKimberly I ran into the same error. How did you solve it? Thanks!

uzl commented 5 years ago

I fixed this problem by correcting my dataset labels. The training labels for my dataset were incorrect, which is why it failed at the cost.backward() step.

First, check that the labels in the cpu_texts variable are what you expect. It comes from cpu_images, cpu_texts = data (line 174 in train.py).
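
One way to do that check is sketched below. It is only an illustration: find_bad_labels is a hypothetical helper, not part of crnn.pytorch, and it assumes that an "incorrect" label is one that is still bytes or that contains characters missing from the alphabet passed via --alphabet. The sample labels are made up.

```python
def find_bad_labels(labels, alphabet):
    """Return (index, label, reason) for every label that looks suspicious.

    Hypothetical helper: it only checks the type of each label and whether
    all of its characters occur in the alphabet.
    """
    allowed = set(alphabet)
    problems = []
    for index, label in enumerate(labels):
        # Labels read from LMDB under Python 3 are bytes until decoded.
        if isinstance(label, bytes):
            problems.append((index, label, "label is bytes, not str"))
            continue
        unknown = sorted(set(label) - allowed)
        if unknown:
            reason = "characters not in alphabet: " + "".join(unknown)
            problems.append((index, label, reason))
    return problems


# The alphabet from the command at the top of this thread.
thread_alphabet = (
    "0123456789abcdefghijklmnopkrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ,;:"
)

# Made-up labels standing in for one batch of cpu_texts.
sample_texts = ["hello", "queen", b"raw"]

for index, label, reason in find_bad_labels(sample_texts, thread_alphabet):
    print(index, repr(label), reason)
```

Note that the alphabet in the original command contains "k" twice and no "q", so a label such as "queen" is flagged here. Whether that is related to the error reported in this issue has not been established.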

In my case the cause was an encoding problem, which I had to fix at line 61 of dataset.py:

label = txn.get(label_key.encode())
label = label.decode()
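
A small sketch of why the decode step matters under Python 3, where LMDB returns values as bytes. It assumes the label was previously converted with str(), which is a guess about the older code rather than something stated in this thread. decode_label is a hypothetical helper, and the sample value stands in for what txn.get() would return.

```python
def decode_label(raw, encoding="utf-8"):
    """Turn a raw LMDB value into a text label (hypothetical helper)."""
    if raw is None:
        # txn.get() returns None when the key is missing.
        raise KeyError("label key not found in the database")
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Stand-in for the bytes that txn.get(label_key.encode()) would return.
raw = "hello".encode()

# str() on bytes does not decode; it yields the literal text b'hello',
# so the label gains the extra characters b and '.
wrong = str(raw)
right = decode_label(raw)

print(repr(wrong), len(wrong))
print(repr(right), len(right))
```

With the str() conversion the label is three characters longer than the real text, so the encoded target no longer matches the image.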