jmiller656 / EDSR-Tensorflow

Tensorflow implementation of Enhanced Deep Residual Networks for Single Image Super-Resolution
MIT License
330 stars 107 forks

GPU-Util is zero #30

Open purse1996 opened 6 years ago

purse1996 commented 6 years ago

Thank you for sharing your code. I've run into a problem: when I train the model on my data on the GPU, memory usage is high but GPU utilization is always zero, so training is very slow. It confuses me a lot.

jmiller656 commented 5 years ago

Not sure what this means, I haven't experienced this error. Could you give me some steps to reproduce the error?

xinqingbuhao commented 5 years ago

[screenshot] I got the same issue, plus a resource-exhausted error.

xinqingbuhao commented 5 years ago

> Not sure what this means, I haven't experienced this error. Could you give me some steps to reproduce the error?

11%|███▍ | 1058/10000 [9:15:04<77:40:33, 31.27s/it]

Is this normal on a GPU? This speed seems a bit slow to me, and GPU-Util is zero, with 10000 images of 100×100, batch size 32, and a 32-layer net. Thanks!

jmiller656 commented 5 years ago

Hey there, I believe you're right that the timing you show is quite slow. There may be a problem with your CUDA setup, unrelated to the code here, which would probably explain the zero GPU utilization. Nothing in this project directly controls what runs on the GPU; that's handled behind the scenes by TensorFlow.

xinqingbuhao commented 5 years ago

> Hey there, I believe you are right that the timing you show is quite slow. I think there may be a problem with your cuda setup, unrelated to the code here. This would probably cause zero GPU utilization. Nothing in this project directly edits what is being used in the gpu. That is done behind the scenes by tensorflow

Thanks, haha. Could you tell me your TensorFlow, CUDA, and cuDNN versions? (I'm using a Python 2.7 environment.)

xinqingbuhao commented 5 years ago

I solved the issue. The cause was that the data wasn't ready when the GPU needed to compute. Now I write the data and labels out to files ahead of time, so the GPU can fetch a batch immediately during training. Although your code does decouple data processing from network training, the two still run serially rather than in parallel, so most of the time is spent on the CPU. That's tolerable when the dataset is small, but it becomes extremely inefficient once the data gets even slightly larger.
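The general fix described above (overlap CPU-side batch preparation with GPU-side compute instead of alternating between them) can be sketched with nothing but the Python standard library: a background thread keeps a small buffer of prepared batches full while the consumer trains. This is a minimal illustration of the idea, not the actual patch; the `prefetch` and `make_batches` names are hypothetical, and in real TensorFlow code the same effect comes from its input-pipeline/queueing machinery rather than a hand-rolled thread.

```python
import queue
import threading

def prefetch(batch_iter, buffer_size=4):
    """Run the (CPU-bound) batch producer in a background thread so the
    consumer (e.g. the training step) rarely waits for data preparation."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for batch in batch_iter:
            q.put(batch)  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            return
        yield batch

# Toy "preprocessing": each yielded list stands in for a prepared image batch.
def make_batches(n):
    for i in range(n):
        yield [i] * 4

batches = list(prefetch(make_batches(3)))
print(batches)  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

Because the queue is FIFO and bounded, batch order is preserved and memory stays capped at `buffer_size` batches; the training loop only stalls if the CPU truly cannot keep up.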

jmiller656 commented 5 years ago

I see, good catch. Do you have code for this? If so, please make a PR and I'll review it