liutinglt / CE2P


GPU memory usage keeps increasing #8

Open · zed0630 opened this issue 5 years ago

zed0630 commented 5 years ago

I am trying to reproduce the reported results, but whenever I run the training script, GPU memory usage keeps increasing and the program freezes once it runs out of GPU memory after several iterations. I have tried different PyTorch versions (both 0.3.1 and 0.4.1) and the problem remains. The CUDA version is 9.0, and the only modification I made is resizing the images to 473x473 in dataset.py, following https://github.com/liutinglt/CE2P/issues/5. What could be causing this behavior? Could you share the detailed settings of your experiments?
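
For reference, a minimal sketch of the kind of resizing change described above, assuming the dataset reads images and labels with OpenCV; the function name and constant are illustrative and not taken from the repo's dataset.py:

```python
import cv2

CROP_SIZE = (473, 473)  # (width, height) expected by the network

def resize_pair(image, label, size=CROP_SIZE):
    """Resize an RGB image and its parsing label map to a fixed size.

    Bilinear interpolation is fine for the image, but the label map must use
    nearest-neighbor so that class indices are not blended together.
    """
    image = cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
    label = cv2.resize(label, size, interpolation=cv2.INTER_NEAREST)
    return image, label
```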

liutinglt commented 5 years ago

@zed0630 Sorry, there is a bug that hasn't been fixed yet. For now, you can use the command `nvidia-smi topo -m` to print the GPUDirect communication matrix, as shown below. Since our GPUs 0-4 and 5-9 share the same connection type (PIX), we train on either GPUs 0-4 or 5-9. Please check the topology on your own machine and make sure you only use GPUs with the same connection type.

[Screenshot: output of `nvidia-smi topo -m` showing the GPU connection matrix]
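
One way to restrict training to a set of GPUs that share the same connection type is to set `CUDA_VISIBLE_DEVICES` before CUDA is initialized. A minimal sketch, assuming GPUs 0-3 are the ones reported with the same connection (e.g. PIX) on your machine:

```python
import os

# Select GPUs that share the same connection type in `nvidia-smi topo -m`
# (here assumed to be 0-3). This must be set before PyTorch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch

print(torch.cuda.device_count())  # should report 4 visible devices
```

The equivalent from the shell is prefixing the training command with `CUDA_VISIBLE_DEVICES=0,1,2,3`.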

zed0630 commented 5 years ago

@liutinglt Thank you. I can now run the code with 4 GPUs, but I had to reduce the batch size to 32 due to limited GPU memory.

rxqy commented 5 years ago

Hi @zed0630, I followed your advice and resized the images and labels to (473, 473) in the dataloader. How is the performance? Can you achieve a similar result with the batch size set to 32?

zed0630 commented 5 years ago

Hi @rxqy, I think I have reproduced a comparable result by following the instructions in the repo. There are only two differences in my experiments: the batch size is reduced to 32 due to limited GPU memory, and I generate the edge labels with my own implementation rather than downloading the files provided in the repo.
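
For context, one possible way to derive a binary edge map from a parsing label map. This is only a sketch of the general idea, not zed0630's implementation or the repo's script; it assumes the label is a 2-D array of class indices at the original image resolution:

```python
import numpy as np

def label_to_edge(label, radius=1):
    """Derive a binary edge map from a 2-D parsing label map.

    A pixel is marked as an edge pixel if any neighbor within `radius`
    has a different class index. The edge map keeps the original label
    resolution; it can later be resized (nearest-neighbor) to 473x473.
    """
    label = label.astype(np.int32)
    h, w = label.shape
    edge = np.zeros((h, w), dtype=np.uint8)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            # Shifted copy of the label; -1 marks pixels with no neighbor.
            shifted = np.full((h, w), -1, dtype=np.int32)
            ys, ye = max(dy, 0), h + min(dy, 0)
            xs, xe = max(dx, 0), w + min(dx, 0)
            shifted[ys:ye, xs:xe] = label[ys - dy:ye - dy, xs - dx:xe - dx]
            edge[(shifted != -1) & (shifted != label)] = 1
    return edge
```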

rxqy commented 5 years ago

@zed0630 So you generated edge labels with the same shape as the original images? Many thanks!

zed0630 commented 5 years ago

@rxqy You're right. The edge labels have the same shape as the original images, and I resize them to 473x473 during training.