tensorflow / fold

Deep learning with dynamic computation graphs in TensorFlow
Apache License 2.0

GPU utilisation around 10% #64

Closed. leconteur closed this issue 7 years ago.

leconteur commented 7 years ago

When running fold code on a K40 card, I'm seeing around 10% utilization in nvidia-smi. I see the same behavior with the language id example. Is this kind of number to be expected, or is there a problem with my installation?

delesley commented 7 years ago

It depends on the model. With dynamic computation graphs, it's easy for single-threaded Python code to become the bottleneck and fail to keep a large GPU fed. Fold does support multi-processing to help with this, so make sure you have it turned on (see the sketch below). Depending on the order of operations, the big matrix multiplies may also end up waiting for data from a bunch of smaller copies, adds, etc., which have poor utilization. Another bottleneck is the protocol buffer deserializer, which is also CPU-bound; I'm afraid I don't have a good workaround for that yet.
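For reference, here is a minimal sketch of turning on Fold's multiprocessing when preparing loom inputs. It assumes the `td.Compiler.multiprocessing_pool` context manager and `build_loom_input_batched` method from the Fold blocks API, and uses a toy block and made-up data purely for illustration; adapt it to your own model and input pipeline.

```python
import tensorflow_fold as td

# Toy block for illustration only: embed a length-3 vector with a
# fully connected layer. Replace with your real model.
root_block = td.Vector(3) >> td.Function(td.FC(32))
compiler = td.Compiler.create(root_block)

# Placeholder data; in practice these would be your training examples.
examples = [[1.0, 2.0, 3.0]] * 1000

# Build loom inputs in a pool of worker processes so that
# single-threaded Python input construction doesn't starve the GPU.
with compiler.multiprocessing_pool():
    batched_inputs = list(
        compiler.build_loom_input_batched(examples, batch_size=128))

# Each element of batched_inputs can then be fed to session.run as
#   {compiler.loom_input_tensor: batch}
# inside the training loop.
```

The batch size and number of examples above are arbitrary; the point is simply that the expensive input-building step runs in worker processes instead of the main Python thread.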


leconteur commented 7 years ago

I found that I hadn't increased the size of my model enough after the initial development phase. Thank you for your reply.