czy97 closed this issue 2 years ago
When I run the callhome recipe using the default config file, the GPU utilization is extremely low (less than 10%). Is this normal? Could this be caused by the PIT loss calculation? I found that the code computes the PIT loss serially.
It could be related to the PIT loss calculation, but probably isn't.
The current setting (conf/train.yaml) assumes that the user has a relatively old/weak GPU with a small amount of memory. If you are using a GPU with large memory, you can decrease batchsize_per_gpu (in conf/train.yaml) and simultaneously increase batchsize (in conf/train.yaml) to put as much training data onto the GPU as you can, while always making sure that batchsize * batchsize_per_gpu remains 1024, e.g. batchsize: 512, batchsize_per_gpu: 2. By doing so, you should be able to increase GPU utilization.
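If it helps, here is a minimal sketch of how you could check that constraint against conf/train.yaml (illustrative only, not part of the recipe; it assumes PyYAML is installed and the key names above):

```python
# Illustrative sanity check for the constraint discussed above; not part of
# the recipe. Assumes PyYAML and the batchsize / batchsize_per_gpu keys in
# conf/train.yaml.
import yaml

with open("conf/train.yaml") as f:
    conf = yaml.safe_load(f)

product = conf["batchsize"] * conf["batchsize_per_gpu"]
assert product == 1024, f"batchsize * batchsize_per_gpu = {product}, expected 1024"
```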
Thanks for the comment. By the way, could you upload the log of the callhome recipe, if possible? I can't reproduce the results you listed and I want to find the reason.
In addition, I find that the speaker loss and PIT loss calculations strongly influence the training speed. I updated the calculation here; you can check it.
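Roughly, the idea is to evaluate the BCE for all speaker permutations in one batched call instead of looping over the permutations in Python. A minimal sketch of that idea, assuming PyTorch logits and 0/1 speaker-activity labels for a single chunk (simplified, not the exact code in the update):

```python
# Simplified sketch: evaluate the BCE under every speaker permutation with one
# batched call rather than a Python loop. `pred` holds logits and `label`
# holds 0/1 activity targets for a single chunk.
from itertools import permutations

import torch
import torch.nn.functional as F

def batch_pit_loss(pred: torch.Tensor, label: torch.Tensor):
    """pred, label: (T, S) tensors for one chunk with S speakers."""
    n_spk = label.shape[1]
    perms = list(permutations(range(n_spk)))                       # S! permutations
    label_perms = torch.stack([label[:, list(p)] for p in perms])  # (S!, T, S)
    pred_rep = pred.unsqueeze(0).expand_as(label_perms)            # (S!, T, S)
    # One BCE call covering all permutations, averaged over time and speakers
    losses = F.binary_cross_entropy_with_logits(
        pred_rep, label_perms, reduction="none").mean(dim=(1, 2))  # (S!,)
    min_loss, min_idx = losses.min(dim=0)
    return min_loss, perms[int(min_idx)]
```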
Thanks for the feedback. We uploaded the log files in https://github.com/nttcslab-sp/EEND-vector-clustering/blob/main/egs/callhome/v1/Log.tar.gz
Sorry to bother you again. I find that my reproduction achieves results similar to yours when the number of speakers is small, but the results get worse when there are more speakers. Can you give me some advice? Thanks.
| Spk# | Spk2 | Spk3 | Spk4 | Spk5 | Spk6 | Spk_all |
| --- | --- | --- | --- | --- | --- | --- |
| Yours | 7.96 | 11.93 | 16.38 | 21.21 | 23.10 | 12.49 |
| Mine | 7.96 | 12.69 | 19.45 | 29.01 | 32.34 | 14.13 |
Hello, can you upload the loss log for mini_librispeech?
Hi, kli017! We have an excerpt of the validation loss transition for mini_librispeech in https://github.com/nttcslab-sp/EEND-vector-clustering/blob/main/egs/mini_librispeech/v1/RESULT.md. Is this sufficient for your purpose, or do you need the entire training log? We may need a couple of days to reproduce the log (since we first need to restore the experimental conditions we used before).
@nttcslab-sp-admin Hi, thanks for the quick reply! I checked the training log in RESULT.md and found that my mean loss is much higher than yours. I trained the model for 10 epochs and the loss only decreased from 0.6628450117613139 to 0.6555626417461194, and the DERs for nspk0 and nspk1 are 48.71 and 52.89, respectively. Have you changed any parameters or anything else in the recipe?
Hi, @kli017! It looks like you are using 8 GPUs. Could you try using only one GPU, i.e., CUDA_VISIBLE_DEVICES=0, and rerun the recipe? With that many GPUs (which effectively changes the batchsize, etc.), our preset hyper-parameters are simply far from optimal, I guess.
@czy97 We previously suggested changing the batchsize to speed up your training, such that batchsize * batchsize_per_gpu remains 1024 (more strictly, batchsize * batchsize_per_gpu * num_GPUs remains 1024), but it turned out that we cannot reproduce the same/similar results that way with, e.g., batchsize: 512, batchsize_per_gpu: 2. We sometimes got a very bad result, as you did. We are looking into this issue.

In the meantime, we found that if you change chunk_size: 150 to chunk_size: 500 in conf/train.yaml and use 1 GPU for the training, you can speed up the training and obtain an OK-ish result (something like 12.98% DER for the unknown Spk# condition). But this is not a final report; we'll get back to you once we find what the real problem is and a solution for it.
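As a rough, back-of-envelope illustration of why the larger chunk_size keeps a single GPU busier (the batchsize value below is only a hypothetical example; take the real one from conf/train.yaml):

```python
# Rough illustration only. Each training example is a chunk of chunk_size
# frames, so a larger chunk_size gives every forward/backward pass more frames
# to process per batch. The batchsize here is a hypothetical example value.
batchsize = 32  # hypothetical; use the value from conf/train.yaml
for chunk_size in (150, 500):
    print(f"chunk_size={chunk_size}: {chunk_size * batchsize} frames per batch")
```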
Thanks for the reply. Looking forward to the final solution.
@nttcslab-sp-admin Yes, I was training with 8 GPUs. So the current code does not support multi-GPU training? I also tried 2 speakers without overlap for 100 epochs; the mean loss is around 0.46 and the DER is 45.19. I ran inference on the dev set, cut the audio according to the RTTM, and found the results really bad; some results even contained only 1 speaker. I checked EEND (https://github.com/hitachi-speech/EEND/issues/4), and they said they never validated on less than 100 hours of data. So I am confused about what leads to the problem: multiple GPUs, or the training set being too small?
@kli017 Well, the code supports multi-GPU training in the sense that it does not crash. But if you use N GPUs, the actual batchsize the program uses is going to be N times bigger, and in that case our preset hyper-parameters such as batchsize, warmup steps, etc. will no longer be optimal, and you will sometimes get quite bad results. Our current preset hyper-parameters assume 1-GPU training. This issue is actually related to the one czy97 raised. We are trying to find a solution and good hyper-parameter settings for the case where you use a large batchsize and multiple GPUs (to speed up training), based on the current code.
Understood. Thanks, I'll try with 1 GPU first and look forward to the solution!
We tried several multi-GPU configurations but could not find one that closely matches or surpasses our 1-GPU training results. However, since the purpose of this repository is to reproduce the CALLHOME results of our paper (which we can do with 1-GPU training in this repo), let me close this issue.
Hello, do you have any recommendations on how to speed up training on a single GPU? I am trying a larger chunk_size now. @nttcslab-sp-admin