Closed — leconteur closed this issue 7 years ago
It depends on the model. With dynamic computation graphs, it's easy for single-threaded python code to become the bottleneck, and fail to keep a large GPU fed. Fold does support multi-processing to help with this -- make sure you have it turned on. Depending on the order of operations, the big matrix-multiplies may also be waiting for data from a bunch of smaller copies/adds/etc. that have poor utilization. Another bottleneck is the protocol buffer deserializer, which is also CPU bound; I'm afraid I don't have a good workaround for that yet.
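To illustrate the first point, here is a generic sketch (not the Fold API itself) of using a process pool to overlap CPU-bound per-example work with GPU consumption. The names `preprocess` and `batches` are hypothetical stand-ins for the kind of single-threaded Python work (graph building, protocol-buffer deserialization) that can starve a large GPU:

```python
from multiprocessing import Pool

def preprocess(example):
    # Stand-in for CPU-bound work such as parsing a serialized proto
    # and building the per-example computation graph.
    return [x * x for x in example]

def batches(examples, pool, batch_size=4):
    # Spread preprocessing across worker processes so that finished
    # batches are ready when the GPU asks for them, instead of the
    # GPU idling behind a single Python thread.
    results = pool.imap(preprocess, examples, chunksize=batch_size)
    batch = []
    for item in results:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

if __name__ == "__main__":
    data = [[1, 2, 3]] * 8
    with Pool(processes=2) as pool:
        for batch in batches(data, pool):
            pass  # feed each preprocessed batch to the training step
```

The same idea is what Fold's multi-processing support provides internally; the sketch only shows why parallelizing the Python side matters for utilization.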
On Tue, May 23, 2017 at 8:16 AM, Olivier Gagnon notifications@github.com wrote:
When running Fold code on a K40 card, I'm seeing 10% utilization in nvidia-smi. This behavior is also observed in the language id example. Is this kind of number to be expected, or is there a problem in my installation?
— View this thread on GitHub: https://github.com/tensorflow/fold/issues/64
-- DeLesley Hutchins | Software Engineer | delesley@google.com | 505-206-0315
I found that I hadn't increased the size of my model enough after the initial development phase. Thank you for your reply.