Open hugovbraun opened 4 years ago
Mm, raising the verbose level at which the sync is done would invalidate the diagnostics that are currently done every 10 iterations in nnet3 training. It's not trivial to fix, because if we move it to --verbose=2 we'd have to check whether anything super-verbose happens at verbose level 2. Then again, it could be that not many people look at those statistics.
Is there stuff in GPU decoding that you need to debug with verbose=1?
This is mostly for optional output: things that are useful to print sometimes. Currently we do it with KALDI_LOG statements that we comment or uncomment and then recompile. Verbose levels could help. However, that's just for convenience, so if it's complicated we can leave it the way it is.
Using --verbose=1 (or higher) turns on some profiling. If a GPU is used, this adds a host sync after each nnet3 op: https://github.com/kaldi-asr/kaldi/blob/ff4cb55a977c443c2407091a6c17921395600994/src/cudamatrix/cu-device.cc#L497
Adding that many syncs reduces performance drastically. I understand the logic behind the sync (we want to measure the time the GPU spends on that task); however, a profiling tool that changes the profile this much is not ideal.
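To make the trade-off concrete, here is a minimal sketch of gating the per-op sync on its own threshold rather than on verbose >= 1. It is not Kaldi code: `ProfilerConfig`, `FakeDevice`, `TimedOp` and the `sync_min_verbose` setting are hypothetical names invented for illustration, and `FakeDevice::Synchronize` only counts calls where a real build would block on the GPU.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical settings, not Kaldi options.
struct ProfilerConfig {
  int verbose_level = 0;     // what --verbose=N would set
  int sync_min_verbose = 1;  // lowest verbose level that forces a host sync
};

// Stand-in for the GPU: counts syncs instead of blocking on a device.
struct FakeDevice {
  int sync_count = 0;
  void Synchronize() { ++sync_count; }
};

// Runs one op and returns the elapsed host time in seconds.
// The sync happens only when verbose_level reaches sync_min_verbose, so
// without it the timing covers just the (asynchronous) launch of the op.
double TimedOp(const ProfilerConfig &cfg, FakeDevice &dev,
               const std::function<void()> &op) {
  auto start = std::chrono::steady_clock::now();
  op();
  if (cfg.verbose_level >= cfg.sync_min_verbose) {
    dev.Synchronize();
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

With `sync_min_verbose` set to 2 in this sketch, running at verbose level 1 would keep the optional log output without paying for a sync after every op, at the cost of per-op timings that no longer reflect GPU time.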
Would it be possible to either: