Closed Maqingyang closed 5 years ago
It seems that the high CPU occupancy comes from the dataloader: if I use a constant input_batch instead of the dataloader, it doesn't occupy nearly as much CPU.
input_batch.json contains one pre-stored batch from dataloader.
Maxing out the CPU is generally a good thing, because it means the input pipeline is efficient. You can reduce the num_workers option to use fewer processes if this is an issue on your system, but that will consequently reduce training speed.
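To make the num_workers trade-off concrete, here is a minimal sketch (the dataset and tensor shapes are illustrative, not from this repo): lowering num_workers reduces CPU fan-out, raising it increases input-pipeline throughput.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative in-memory dataset standing in for the repo's real dataset.
dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.zeros(64))

# num_workers=0 loads batches in the main process (lowest CPU usage, useful
# for diagnosing); raising it to e.g. 4 spawns worker processes that prefetch
# batches faster at the cost of more CPU.
loader = DataLoader(dataset, batch_size=8, num_workers=0)

for images, labels in loader:
    pass  # training/eval step would go here
```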
Thanks for your advice. I am carefully looking for the CPU-heavy operation. I found that crop() in imutils.py is very CPU-intensive. In fact, if I don't use crop in base_dataset.py, CPU occupancy drops to 1/4 of what it was before! That's very strange. I am checking further; I don't know how it behaves on your machine.
I located the problem in imutils.py, in the function transfrom():
if invert:
    t = np.linalg.inv(t)
This inverse operation is very CPU-intensive. Could it be avoided, or moved to the GPU?
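For reference, one way the call could be avoided entirely: if the crop transform is pure scale plus translation (a plausible but unverified assumption about imutils.py), its inverse has a cheap closed form and no general matrix inversion is needed. A sketch under that assumption (the helper name is ours, not from the repo):

```python
import numpy as np

def invert_scale_translation(t: np.ndarray) -> np.ndarray:
    """Closed-form inverse of a homogeneous transform whose linear part
    is diagonal (pure scale + translation), e.g. [[s, 0, tx],
    [0, s, ty], [0, 0, 1]]. Assumption: this matches the crop
    transform's structure; it does NOT hold for rotations/shears.
    """
    n = t.shape[0] - 1
    scale = np.diag(t)[:n]
    inv = np.eye(n + 1)
    inv[np.arange(n), np.arange(n)] = 1.0 / scale  # invert the scales
    inv[:n, n] = -t[:n, n] / scale                 # undo the translation
    return inv
```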
I found a simple solution, which is magically effective.
if invert:
    # t = np.linalg.inv(t)  # original numpy inverse (CPU-heavy)
    t_torch = torch.from_numpy(t)
    t_torch = torch.inverse(t_torch)
    t = t_torch.numpy()
If anyone has the same problem, maybe give this a try.
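A small self-contained version of the workaround above, wrapped in a helper so it's easy to drop in and verify (the name fast_inv is illustrative, not from the repo):

```python
import numpy as np
import torch

def fast_inv(t: np.ndarray) -> np.ndarray:
    """Invert a small matrix by round-tripping through torch.

    Same trick as the snippet above: compute the inverse with
    torch.inverse instead of np.linalg.inv, then convert back to
    a numpy array for the rest of the pipeline.
    """
    return torch.inverse(torch.from_numpy(t)).numpy()

m = np.eye(4) * 2.0
print(fast_inv(m))  # diagonal matrix with 0.5 on the diagonal
```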
I will look into it. I'm surprised that numpy is so slow at the simple task of inverting a 4x4 matrix.
When I run either the training or the evaluation code, CPU occupancy is very high, and it's hard to run multiple tasks on a multi-GPU machine due to the CPU bottleneck. Did you encounter this problem in your training? Or could you give me some hints about which part of your implementation may be CPU-intensive? Thank you very much! For example, I attach some screenshots of CPU occupancy. My CPU is an Intel i9-9900K. For simplicity, I ran the evaluation code, i.e. eval.py. [screenshots: CPU occupancy for one task vs. multiple tasks on a two-GPU machine]