kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Verbose flag turns on profiling #4211

Open hugovbraun opened 4 years ago

hugovbraun commented 4 years ago

Using --verbose=1 (or higher) turns on some profiling. If a GPU is used, this adds a host sync after each nnet3 op: https://github.com/kaldi-asr/kaldi/blob/ff4cb55a977c443c2407091a6c17921395600994/src/cudamatrix/cu-device.cc#L497

Adding that many syncs reduces performance drastically. I understand the logic behind the sync (we want to measure the time the GPU spends on each task); however, a profiling tool that changes the profile this much is not ideal.

Would it be possible to either:

  1. Add a special flag to explicitly turn on profiling. We might want to gather more information about the running program with --verbose without seeing a large perf drop.
  2. Remove that sync altogether (and use profiling tools such as Nsight Compute).
  3. If neither 1 nor 2 is possible, just move the sync to a higher verbose level, e.g. 2 or 3?
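
To make option 1 concrete, here is a minimal, self-contained sketch of a profiling switch that is independent of the verbose level. It is not Kaldi code: `g_verbose_level`, `g_profiling_enabled`, `FakeDeviceSync`, `Profiler` and `RunOps` are all hypothetical names, and the device sync is simulated with a counter so the example builds without CUDA.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical globals; neither name is taken from Kaldi.
int g_verbose_level = 0;           // would control optional log output only
bool g_profiling_enabled = false;  // controls per-op timing, and therefore the sync

int g_sync_count = 0;  // number of simulated host syncs so far

// Stand-in for a blocking device sync (assumption: the real code would
// call something like cudaStreamSynchronize here).
void FakeDeviceSync() { ++g_sync_count; }

class Profiler {
 public:
  using Clock = std::chrono::steady_clock;

  // Accumulates elapsed time for op_name, but only if profiling was
  // requested explicitly. The verbose level is deliberately not consulted.
  void Accumulate(const std::string &op_name, Clock::time_point start) {
    if (!g_profiling_enabled) return;
    FakeDeviceSync();  // wait for the (simulated) device work before timing
    std::chrono::duration<double> elapsed = Clock::now() - start;
    seconds_[op_name] += elapsed.count();
  }

  // Number of distinct op names that have timing entries.
  std::size_t NumOpsTimed() const { return seconds_.size(); }

 private:
  std::map<std::string, double> seconds_;
};

// Runs num_ops simulated ops and returns how many syncs they triggered.
int RunOps(Profiler *profiler, int num_ops) {
  assert(profiler != nullptr && num_ops >= 0);
  int syncs_before = g_sync_count;
  for (int i = 0; i < num_ops; ++i) {
    Profiler::Clock::time_point start = Profiler::Clock::now();
    // ... an op would be launched here ...
    profiler->Accumulate("op" + std::to_string(i % 2), start);
  }
  return g_sync_count - syncs_before;
}
```

With this split, raising `g_verbose_level` alone never triggers a sync; only setting `g_profiling_enabled` does, at the cost of one sync per op.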

danpovey commented 4 years ago

Mm, increasing the level at which the sync is done would invalidate the diagnostics that are currently printed every 10 iterations in nnet3 training. It's not trivial to fix, because if we moved it to --verbose=2 we'd have to check whether anything super-verbose happens at --verbose=2. Now, it could be that not many people look at those statistics.

Is there stuff in GPU decoding that you need to debug with verbose=1?

hugovbraun commented 4 years ago

This is mostly for optional output: things that are useful to print sometimes. Currently we do it with KALDI_LOGs that we comment/uncomment and then recompile. Verbose levels could help. However, that's just for convenience, so if it's complicated we can just leave it the way it is.
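
For illustration, the kind of verbosity-gated optional output meant here could look like the sketch below. It is a standalone example, not Kaldi's logging code: `SKETCH_VLOG`, `g_log_verbosity`, `g_log_sink`, `ExpensiveSummary` and `DecodeStep` are hypothetical names, and the messages go to a string stream rather than stderr so the behavior is easy to check. Kaldi's own `KALDI_VLOG` macro serves the same general purpose.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical verbosity setting and log sink; not Kaldi names.
int g_log_verbosity = 0;
std::ostringstream g_log_sink;

// Emits the streamed message only when level <= g_log_verbosity.
// The if/else form keeps the macro usable as a single statement, and the
// streamed expressions are not evaluated when the level is too high.
#define SKETCH_VLOG(level) \
  if ((level) > g_log_verbosity) {} else g_log_sink << "VLOG[" << (level) << "] "

// Pretend this is costly to compute; counts how often it is called.
int ExpensiveSummary(int *calls) {
  assert(calls != nullptr);
  ++*calls;
  return 42;
}

// One step of some processing loop with optional diagnostic output.
void DecodeStep(int *calls) {
  SKETCH_VLOG(1) << "summary=" << ExpensiveSummary(calls) << '\n';
  SKETCH_VLOG(3) << "very detailed output\n";
}
```

The messages stay in the source and are switched on at run time, so no commenting, uncommenting or recompiling is needed. This only helps in practice if raising the verbose level does not also enable the per-op sync.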