libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Unexpected GPU Memory Allocation Issue #210

Closed ByungKwanLee closed 2 years ago

ByungKwanLee commented 2 years ago

I am having an unexpected GPU memory allocation issue.

If I run code for training CIFAR-10 based on FFCV with 5 GPUs (ids 0, 1, 2, 3, 4), four unexpected GPU memory allocations appear on GPU 0.

[screenshot of GPU memory usage]

In addition, if I run the code with 4 GPUs (ids 0, 1, 2, 3), three unexpected GPU memory allocations appear on GPU 0.

[screenshot of GPU memory usage]

After debugging to investigate the problem, I found that the unexpected GPU allocation happens in the following line:

for batch_idx, (inputs, outputs) in enumerate(trainloader):

Therefore, I guess the problem comes from my modifications to the data loader code, shown below.

However, I have not been able to figure it out.

from typing import List

import torch

from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.pipeline.operation import Operation
from ffcv.transforms import (RandomHorizontalFlip, RandomTranslate, Squeeze,
                             ToDevice, ToTensor, ToTorchImage)

# dataset, img_size, mean, train_batch_size, test_batch_size and num_workers
# are defined elsewhere in my training script.
gpu = torch.cuda.current_device()
paths = {
    'train': f'../ffcv_data/{dataset}/{dataset}_train.beton',
    'test': f'../ffcv_data/{dataset}/{dataset}_test.beton'
}

loaders = {}
for name in ['train', 'test']:
    image_pipeline: List[Operation] = [SimpleRGBImageDecoder()]
    label_pipeline: List[Operation] = [IntDecoder(), ToTensor(), ToDevice(f'cuda:{gpu}'), Squeeze()]
    if name == 'train':
        image_pipeline.extend([
            RandomHorizontalFlip(),
            RandomTranslate(padding=int(img_size / 8.), fill=tuple(map(int, mean))),
        ])
    image_pipeline.extend([
        ToTensor(),
        ToDevice(f'cuda:{gpu}', non_blocking=True),
        ToTorchImage(),
        Normalize_and_Convert(torch.float16, True)  # custom op, defined below
    ])

    ordering = OrderOption.RANDOM if name == 'train' else OrderOption.SEQUENTIAL

    loaders[name] = Loader(paths[name],
                           batch_size=train_batch_size if name == 'train' else test_batch_size,
                           num_workers=num_workers, order=ordering, drop_last=(name == 'train'),
                           pipelines={'image': image_pipeline, 'label': label_pipeline})

from dataclasses import replace
from typing import Callable, Optional, Tuple

import torch

from ffcv.pipeline.allocation_query import AllocationQuery
from ffcv.pipeline.operation import Operation
from ffcv.pipeline.state import State


class Normalize_and_Convert(Operation):
    def __init__(self, target_dtype, target_norm_bool):
        super().__init__()
        self.target_dtype = target_dtype
        self.target_norm_bool = target_norm_bool

    def generate_code(self) -> Callable:
        # Runs after ToTensor/ToTorchImage, so `inp` is already a torch tensor.
        def convert(inp, dst):
            if self.target_norm_bool:
                inp = inp / 255.0
            return inp.type(self.target_dtype)

        convert.is_parallel = True

        return convert

    def declare_state_and_memory(self, previous_state: State) -> Tuple[State, Optional[AllocationQuery]]:
        # Only the dtype of the pipeline state changes; no extra memory is requested.
        return replace(previous_state, dtype=self.target_dtype), None
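
For context, torch.cuda.current_device() returns the currently selected device index, which is 0 in every process unless a device has been selected explicitly, so all of the ToDevice(f'cuda:{gpu}') targets above would then point at cuda:0. A minimal sketch, assuming a one-process-per-GPU setup and a hypothetical local_rank variable supplied by the launcher, of pinning the device before building the pipelines:

import torch

# Hypothetical per-process rank; in practice it comes from the launcher
# (e.g. an environment variable). Without set_device, current_device()
# stays at 0 in every process.
local_rank = 0
torch.cuda.set_device(local_rank)

gpu = torch.cuda.current_device()  # now the GPU selected for this process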

I hope it is a trivial problem.

ByungKwanLee commented 2 years ago

I updated ffcv to the latest version and passed a gpu_id argument to torch.load, i.e. torch.load(checkpoint_path, map_location=gpu_id).

Then the issue was resolved!

afzalxo commented 2 years ago

Hi @ByungKwanLee, I am facing the same issue of unbalanced memory allocation. Which torch.load are you referring to here? I can't find it anywhere inside the authors' implementation.

ByungKwanLee commented 2 years ago

First, download ffcv version 1.0.0 or 0.4.0 and copy all the files from the downloaded ffcv folder into the path: /home/$username/anaconda3/envs/$env_name(ex:ffcv)/lib/$python_version/site-packages/ffcv

Second, in my code I need to load pre-trained weights, so I use torch.load. If I do not specify which GPU the checkpoint parameters should be mapped to, i.e. I call torch.load(checkpoint_path) without map_location, the checkpoint parameters end up on another GPU id. But once I use torch.load(checkpoint_path, map_location=gpu_id), the problem is solved.
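
A minimal sketch of that map_location fix, with checkpoint_path as a placeholder for whatever the training script actually loads; a device string is used here since map_location accepts either a string or a torch.device:

import torch

checkpoint_path = 'pretrained.pth'  # placeholder path
gpu_id = torch.cuda.current_device()

# Without map_location, torch.load restores each tensor onto the device it was
# saved from (often cuda:0), so every process adds an allocation on GPU 0.
# Mapping explicitly keeps the checkpoint on this process's own GPU.
state_dict = torch.load(checkpoint_path, map_location=f'cuda:{gpu_id}')

# Loading onto the CPU first also avoids the stray GPU allocation:
# state_dict = torch.load(checkpoint_path, map_location='cpu')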

afzalxo commented 2 years ago

Thanks for the prompt response. I'll try this, although I don't have a pretrained model. I'll just update to v1.0.0 and see if it works.