training is slow and utilize only a few gpu resources

yscacaca / DeepSense

Deepsense: a unified deep learning framework for time-series mobile sensing data processing.

training is slow and utilize only a few gpu resources #5

Closed zhezh closed 7 years ago

zhezh commented 7 years ago

I tried to train this model with the data provided in the README, and I find the training procedure slow. It has already taken more than 2 hours to run about 700 iterations. Watching GPU and CPU usage, I see GPU usage at about 10%~20%, sometimes as low as 0%~3%, and CPU usage at about 50%, sometimes dropping to near 0. The CPU usage diagram is in the screenshot below.

[Screenshot from 2017-09-04 18-47-12: CPU usage diagram]

My laptop has an Intel i7 6700K and a GTX 1050 Ti. I run the code in PyCharm with Python 2.7 and TensorFlow 1.3.

yscacaca commented 7 years ago

I believe the I/O part is inefficient. You can try reading all the training data into memory if your machine allows it. That should boost the training speed.
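A minimal sketch of the in-memory approach, assuming the training data is a directory of small CSV files with one numeric sample per line. The directory layout, the file pattern, and the helper names `load_all_rows` and `iter_batches` are hypothetical and not taken from the DeepSense code; the snippet is written for Python 3. Every file is read once up front, so the training loop only slices a list that is already in RAM.

```python
import csv
import glob
import os
import random


def load_all_rows(data_dir, pattern="*.csv"):
    """Read every CSV file under data_dir once and keep all rows in RAM.

    Assumes every field is numeric (hypothetical layout, see lead-in).
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(data_dir, pattern))):
        with open(path, newline="") as f:
            for record in csv.reader(f):
                if record:  # skip blank lines
                    rows.append([float(v) for v in record])
    return rows


def iter_batches(rows, batch_size, shuffle=True, seed=None):
    """Yield batches from the in-memory rows without touching the disk."""
    order = list(range(len(rows)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [rows[i] for i in order[start:start + batch_size]]
```

Each batch from `iter_batches` could then be fed to the model in place of a file-based input queue. Whether the full dataset fits in RAM depends on the machine.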

zhezh commented 7 years ago

Yes, you are right! I copied all the data from the HDD to an SSD partition, and the training speed more than doubled.

I used glances to watch I/O usage; the I/O is about 4M, which is still low.

Maybe we can use TFRecord to put all the data into one file and then read it as a stream. That avoids reading a lot of small files when we lack the RAM to hold everything.
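The idea of packing many small files into one sequentially read file can be sketched with the standard library alone. This is not the TFRecord format and not the code linked in this thread: it is a simplified stand-in that writes length-prefixed records of 32-bit floats, while real TFRecord files also carry checksums and are read through TensorFlow's own APIs. The function names `pack_records` and `stream_records` are hypothetical.

```python
import struct


def pack_records(samples, out_path):
    """Write each sample (a list of floats) as one length-prefixed record."""
    with open(out_path, "wb") as out:
        for sample in samples:
            # Values are stored as little-endian float32.
            payload = struct.pack("<%df" % len(sample), *sample)
            out.write(struct.pack("<I", len(payload)))  # 4-byte length header
            out.write(payload)


def stream_records(path):
    """Yield samples one at a time; only one record is held in memory."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated record header")
            (size,) = struct.unpack("<I", header)
            payload = f.read(size)
            if len(payload) < size:
                raise ValueError("truncated record payload")
            yield list(struct.unpack("<%df" % (size // 4), payload))
```

Because the reader walks a single file front to back, the disk sees one sequential read instead of many small random ones, and memory use stays at one record regardless of dataset size.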

zhezh commented 7 years ago

@yscacaca Hey, I wrote a version that uses TFRecord to boost training speed. If you are interested, see this link

yscacaca commented 7 years ago

Cool, thank you @zhezh !

zhezh commented 7 years ago

@kaixinbuyu Maybe you could debug it to see what is actually passed to this function.

shamanez commented 7 years ago

@kaixinbuyu I have the same error too. Can you please tell me how you fixed it?