thomashopkins32 / HuBMAP

Hacking the Human Vasculature (Kaggle Competition)
Apache License 2.0

Inspect memory requirements of UNet architecture #2

Closed: thomashopkins32 closed this issue 1 year ago

thomashopkins32 commented 1 year ago

At the moment I can only fit a batch size of 4 on my GPU (8 GB of memory).

Analyzing the GPU memory requirements will make it easier to choose the best way to train the model. A good approach would be to call get_model_gpu_memory after each layer in the network; I need to know how the memory requirements change throughout a forward pass.

I should also look into other methods (or packages) that can do this work for me.

thomashopkins32 commented 1 year ago

There exists torch.cuda.max_memory_allocated(device=None). We should test this with different batch sizes and add it as a debug option.
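A minimal sketch of what that debug option could look like (the helper names `bytes_to_mb` and `log_max_memory` are mine, not from the repo); it follows the same MB convention as the measurements later in this thread, i.e. bytes divided by 1e6:

```python
import torch

def bytes_to_mb(n_bytes: int) -> float:
    # "MB" here means 10^6 bytes, matching the numbers logged in this issue.
    return n_bytes / 1e6

def log_max_memory(tag: str, device: int = 0) -> float:
    """Hypothetical debug helper: print and return peak allocated CUDA memory in MB."""
    mb = bytes_to_mb(torch.cuda.max_memory_allocated(device))
    print(f"{tag}: {mb} MB")
    return mb
```

Calling `torch.cuda.reset_peak_memory_stats()` between measurements would let us attribute the peak to a specific phase (forward vs. backward) rather than to the whole run so far.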

thomashopkins32 commented 1 year ago

This issue can be marked resolved when I create a function that takes a model and an example input and reports the maximum GPU memory usage over one epoch of training.

I'm thinking I should write train_one_epoch or train(epochs=1) first though.
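A rough sketch of such a function, under the assumption that a single training step is representative of the per-epoch peak (`peak_train_memory_mb` is a hypothetical name, and the `.sum()` loss is a stand-in for the real task loss); it falls back to 0.0 when CUDA is unavailable:

```python
import torch

def peak_train_memory_mb(model, example_input, optimizer, device="cuda"):
    """Hypothetical helper: run one training step, return peak CUDA memory in MB."""
    use_cuda = device.startswith("cuda") and torch.cuda.is_available()
    target = device if use_cuda else "cpu"
    if use_cuda:
        torch.cuda.reset_peak_memory_stats(target)
    model.to(target)
    x = example_input.to(target)
    loss = model(x).sum()  # placeholder loss; real code would use the task loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return torch.cuda.max_memory_allocated(target) / 1e6 if use_cuda else 0.0
```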

thomashopkins32 commented 1 year ago

Wrote the training and validation scripts. Now I need to figure out how to measure GPU memory usage throughout execution.

thomashopkins32 commented 1 year ago

With normal training, we can use a batch size of 4 on my 3070 GPU.

Starting memory: 0.0
After model sent to cuda: 126.178304
Step 0
After forward pass: 5616.571903999999
Memory used by forward pass: 5490.393599999999
After backward pass: 252.036608
After optimizer step: 505.062912
Step 1
After forward pass: 5995.456
Memory used by forward pass: 5490.393088
After backward pass: 505.062912
After optimizer step: 505.062912
Step 2
After forward pass: 5995.456
Memory used by forward pass: 5490.393088
After backward pass: 505.062912
After optimizer step: 505.062912

With torch.cuda.amp.autocast and torch.cuda.amp.GradScaler we can use a batch size of 8 on my 3070 GPU.

Starting memory: 0.0
After model sent to cuda: 126.178304
Step 0
After forward pass: 5931.460096
Memory used by forward pass: 5805.281792
After backward pass: 253.741568
After optimizer step: 253.74208
Step 1
After forward pass: 6060.071424
Memory used by forward pass: 5806.329855999999
After backward pass: 253.741568
After optimizer step: 253.74208
Step 2
After forward pass: 6060.071424
Memory used by forward pass: 5806.329855999999
After backward pass: 253.741568
After optimizer step: 253.74208

All numbers are in MB.
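The mixed-precision setup used for the second run above can be sketched as follows (the model, data, and loss here are placeholders, not code from this repo; the scaler and autocast are disabled automatically when CUDA is unavailable):

```python
import torch

use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

model = torch.nn.Linear(8, 1).to(device)        # stand-in for the UNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(8, 8, device=device)            # stand-in for a real batch
y = torch.randn(8, 1, device=device)

optimizer.zero_grad(set_to_none=True)
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = criterion(model(x), y)               # forward runs in float16 where safe
scaler.scale(loss).backward()                   # scale loss to avoid fp16 underflow
scaler.step(optimizer)                          # unscales grads, then optimizer.step()
scaler.update()
```

The memory saving comes from activations being stored in float16 during the forward pass, which is why the forward-pass footprint at batch size 8 (~5.8 GB) is only slightly larger than the full-precision footprint at batch size 4 (~5.5 GB).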

thomashopkins32 commented 1 year ago

For future reference: