libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Warning while training model with DDP #177

Open AmmaraRazzaq opened 2 years ago

AmmaraRazzaq commented 2 years ago

Hi, I am getting the following warning when training the model with the FFCV dataloader + DDP.

[W reducer.cpp:362] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.

The same code works fine with the PyTorch dataloader + DDP.

AmmaraRazzaq commented 2 years ago

I think this warning was occurring because I was not putting the tensors on the GPU in the image and label pipelines; instead, I was putting them on the GPU in the train and val loops. However, now only the image tensors are moved to the GPU; the label tensors are not.

from typing import List

import torch as ch
from ffcv.fields.decoders import NDArrayDecoder, SimpleRGBImageDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.pipeline.operation import Operation
from ffcv.transforms import Convert, ToDevice, ToTensor, ToTorchImage

loaders = {}
for name in ['train', 'val']:
    # Decode the label array and move it to the GPU
    label_pipeline: List[Operation] = [NDArrayDecoder(), ToDevice(ch.device('cuda:0'))]
    # Decode images, normalize, convert to float tensors on the GPU
    # (Normalize() is assumed to be a custom transform defined elsewhere)
    image_pipeline: List[Operation] = [SimpleRGBImageDecoder(), Normalize(), ToTensor(), Convert(ch.float32), ToDevice(ch.device('cuda:0')), ToTorchImage()]
    # Create loaders
    loaders[name] = Loader(
        paths[f'{name}_beton_path'],
        batch_size=14,
        num_workers=6,
        order=OrderOption.RANDOM if name == 'train' else OrderOption.SEQUENTIAL,
        # distributed=(name == 'train'),
        # seed=0,
        drop_last=(name == 'train'),
        pipelines={
            'image': image_pipeline,
            'label': label_pipeline
        }
    )
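One likely reason the labels stay on the CPU, going by FFCV's documented label pipelines: ToDevice() operates on torch tensors, so a ToTensor() step conventionally comes before it. A minimal sketch of that ordering (using the imports above):

label_pipeline: List[Operation] = [
    NDArrayDecoder(),
    ToTensor(),                     # convert the decoded ndarray to a torch tensor first
    ToDevice(ch.device('cuda:0')),  # then move it to the GPU
]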
AmmaraRazzaq commented 2 years ago

Resolved: the PyTorch dataset class should be given an array as input, and I was passing a list for the labels. Even though NDArrayField and NDArrayDecoder() appeared to work fine, no further changes could be applied to the labels after decoding.
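In code, the fix amounts to something like this (a sketch; labels_list stands in for whatever list the dataset was returning):

import numpy as np
from ffcv.fields import NDArrayField

# Labels must be a fixed-shape numpy array, not a Python list
labels = np.asarray(labels_list, dtype=np.float32)

# The field's shape/dtype must match each label entry exactly
label_field = NDArrayField(shape=labels.shape[1:], dtype=np.dtype('float32'))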

sachitkuhar commented 2 years ago

Hi @AmmaraRazzaq, I am facing the same error. I could not understand your last comment. Would you mind explaining it in a bit more detail? Thanks!

AmmaraRazzaq commented 2 years ago

Even after successfully moving the tensors to the GPU, the warning still persists:

[W reducer.cpp:362] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance. grad.sizes() = [2304, 576, 1, 1], strides() = [576, 1, 576, 576] bucket_view.sizes() = [2304, 576, 1, 1], strides() = [576, 1, 1, 1] (function operator())

AmmaraRazzaq commented 2 years ago

Finally figured it out. This warning occurs because ToTorchImage() returns tensors in channels_last memory format. If the input tensor to a model is in channels_last memory format, the model must also support that format, or it will give the warning about grad strides not matching. The model can be converted to channels_last with model = model.to(memory_format=torch.channels_last), as explained here in detail. Alternatively, the channels_last parameter can be set to False in ToTorchImage(channels_last=False); the transform then returns the tensor in contiguous memory format, and there is no need to convert the model to channels_last.
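Both options, as a minimal sketch (the resnet here is illustrative, not necessarily the actual model):

import torch
import torchvision.models as models
from ffcv.transforms import ToTorchImage

# Option 1: keep ToTorchImage()'s default channels_last output and
# convert the model's parameters to the same memory format.
model = models.resnet101(num_classes=5).cuda()
model = model.to(memory_format=torch.channels_last)

# Option 2: ask ToTorchImage for contiguous (NCHW) tensors instead;
# the model then stays in the default memory format.
to_image = ToTorchImage(channels_last=False)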

I have found that the contiguous memory format is much faster; the channels_last memory format makes training slower than with the PyTorch dataloader.
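For anyone who wants to compare the two formats on their own hardware, a rough micro-benchmark sketch (the model and tensor sizes are illustrative):

import time
import torch
import torchvision.models as models

def bench(channels_last: bool, iters: int = 20) -> float:
    model = models.resnet101(num_classes=5).cuda()
    x = torch.randn(14, 3, 512, 512, device='cuda')
    if channels_last:
        model = model.to(memory_format=torch.channels_last)
        x = x.to(memory_format=torch.channels_last)
    opt = torch.optim.SGD(model.parameters(), lr=2e-3, momentum=0.9)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        opt.zero_grad()
        model(x).sum().backward()  # dummy loss, just to time forward + backward
        opt.step()
    torch.cuda.synchronize()
    return (time.time() - start) / iters  # seconds per iteration

print('contiguous   :', bench(False))
print('channels_last:', bench(True))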

GuillaumeLeclerc commented 2 years ago

@AmmaraRazzaq What GPU are you using? Newer GPUs should be at least 10% faster with channels_last.

AmmaraRazzaq commented 2 years ago

Hi @GuillaumeLeclerc, I am using a Tesla V100-SXM2-32GB.

GuillaumeLeclerc commented 2 years ago

I have a V100 handy. Do you mind sharing a sample of your code that is faster with channels_last=False so I can investigate?

AmmaraRazzaq commented 2 years ago

Hi @GuillaumeLeclerc, thank you for offering to help. Here is the link to the code: https://github.com/AmmaraRazzaq/image_classification/blob/main/sample_code.py

GuillaumeLeclerc commented 2 years ago

Sorry for the delay. Can you give me exactly the parameters you are using (and which dataset)? Thank you!

AmmaraRazzaq commented 2 years ago

Hi @GuillaumeLeclerc, I can't share much detail with you, as this is a research project that is still in the development phase and has not been made open source yet. Could you let me know whether the parameters, the nature of the dataset, or the model architecture can affect the speed of model training?

GuillaumeLeclerc commented 2 years ago

There are many very important factors, including:

AmmaraRazzaq commented 2 years ago

Hi @GuillaumeLeclerc, apologies for the late reply.

I am sharing the dataset files and sample code. I am working with the CheXpert dataset; the beton file for all the images is 165 GB, so I have created a beton file with 1000 images (~1.5 GB). The images are resized to 512x512, normalized to the range [-1, 1], and written to the beton file in 'raw' format. It is a multilabel classification problem with 5 labels per image.
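For context, one plausible way such a file is written with FFCV's DatasetWriter; this is a sketch under assumptions (illustrative path and dataset variable, uint8 images with normalization applied in the loading pipeline as in the earlier snippet):

import numpy as np
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, NDArrayField

# dataset[i] is assumed to return (image, label), where label is a
# float32 array of shape (5,) holding the 5 per-image labels
writer = DatasetWriter('chexpert_1000.beton', {
    'image': RGBImageField(write_mode='raw'),
    'label': NDArrayField(shape=(5,), dtype=np.dtype('float32')),
})
writer.from_indexed_dataset(dataset)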

Dataset files: https://github.com/AmmaraRazzaq/image_classification/tree/master/betonfiles
Code: https://github.com/AmmaraRazzaq/image_classification/blob/master/pyfiles/sample_code.py

I am using the resnet101 architecture with lr=2e-3, bs=24, gpus=4 (DDP training), an SGD optimizer with weight_decay=0 and momentum=0.9, and num_workers=6 in the dataloader.
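Put together as code, that configuration looks roughly like this (a sketch: the DDP wiring is abbreviated, and BCEWithLogitsLoss is an assumption for the multilabel setup):

import torch
import torchvision.models as models

model = models.resnet101(num_classes=5).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=2e-3,
                            momentum=0.9, weight_decay=0)
criterion = torch.nn.BCEWithLogitsLoss()  # multilabel: 5 independent labels
# With 4 GPUs, each rank would then wrap the model, e.g.:
# model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])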