nodefluxio / vortex

A Deep Learning Model Development Framework for Computer Vision

[FEATURE] Add option to disable automatic image padding on dataloader #48

Closed: alphinside closed this issue 4 years ago

alphinside commented 4 years ago

Is your feature request related to a problem? Please describe.

In issue #43, DETR training uses its own mechanism to pad the images and build the masks. Having an option to disable automatic padding would let us follow that mechanism.

Describe the solution you'd like

Still looking for a way for the developer to pass this information to the 'create_dataset' function.

alifahrri commented 4 years ago

Do you mean image padding or batch padding? Doesn't the torch dataloader's padding behaviour depend on collate_fn? @alphinside

alphinside commented 4 years ago

Image padding. In the current implementation the default behavior is automatic padding to square, so collate_fn doesn't need to resize or pad anything before stacking the image data with torch.
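
To illustrate why uniform shapes matter for stacking, here is a minimal pure-Python sketch. It is not vortex's implementation: `pad_to_square` and `resize_nearest` are hypothetical helpers, images are plain nested lists rather than tensors, and the resize step after padding is an assumption added so that every image ends up with the same side length.

```python
def pad_to_square(image, fill=0):
    """Pad a 2-D image (a list of equal-length rows) to a square with `fill`."""
    height, width = len(image), len(image[0])
    side = max(height, width)
    # Widen every existing row, then append whole rows of `fill` at the bottom.
    padded = [list(row) + [fill] * (side - width) for row in image]
    padded += [[fill] * side for _ in range(side - height)]
    return padded


def resize_nearest(image, size):
    """Nearest-neighbour resize of a 2-D image to size x size (assumed step)."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


wide = [[1, 2, 3]]   # 1 x 3 image
tall = [[4], [5]]    # 2 x 1 image
batch = [resize_nearest(pad_to_square(img), 2) for img in (wide, tall)]
shapes = {(len(img), len(img[0])) for img in batch}
print(shapes)  # a single shape means the batch can be stacked directly
```

Once every sample has the same shape, a default collate step can stack them without any extra work, which is what disabling the automatic padding would give up.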

alifahrri commented 4 years ago

Isn't image padding performed by augments/standard_augments in the wrapper, not in the dataloader?

triwahyuu commented 4 years ago

It is in the dataset wrapper, actually. Perhaps you mean create_dataloader or create_dataset?

alphinside commented 4 years ago

Yes, correct. However, note that we also have DALILoader, in which the augmentation is not done on the dataset object.

alphinside commented 4 years ago

Note that in PytorchDataLoader the augmentations are in the dataset object, while in DALIDataLoader the augmentations are in the pipeline.

triwahyuu commented 4 years ago

so it's in create_dataloader, right?

alphinside commented 4 years ago

> Note that in PytorchDataLoader the augmentations are in the dataset object, while in DALIDataLoader the augmentations are in the pipeline.

But after looking at it more closely, yes, we can put the arguments in the dataset object.

alphinside commented 4 years ago

> so it's in create_dataloader, right?

Basically yes, because create_dataset is also invoked in create_dataloader, and disable_autopad needs to be forwarded to create_dataset.

alifahrri commented 4 years ago

Well, I think it should be in create_dataset; putting image padding in create_dataloader doesn't seem right.

alphinside commented 4 years ago

Anyway, what has been discussed up to this point is implementation detail, which is quite straightforward. The real question is where this disable_auto_pad value should come from. My proposal is to put it in preprocess_args when the developer develops the model, so the developer is fully aware that they are disabling autopad and must implement their own mechanism in collate_fn. cc @alifahrri @triwahyuu
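
For reference, a custom collate mechanism of the kind mentioned here could look roughly like the sketch below. It follows the DETR convention of padding every image to the largest height and width in the batch and returning a mask that is True on padded pixels. The function name `pad_and_mask_collate` is hypothetical, and images are plain nested lists instead of tensors to keep the example self-contained.

```python
def pad_and_mask_collate(images, fill=0):
    """Pad 2-D images to the batch-wide max size and build padding masks.

    Returns (batch, masks); a mask entry is True where the pixel is padding.
    """
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    batch, masks = [], []
    for img in images:
        h, w = len(img), len(img[0])
        # Pad on the right and at the bottom so the content stays top-left.
        padded = [list(row) + [fill] * (max_w - w) for row in img]
        padded += [[fill] * max_w for _ in range(max_h - h)]
        mask = [[r >= h or c >= w for c in range(max_w)] for r in range(max_h)]
        batch.append(padded)
        masks.append(mask)
    return batch, masks


batch, masks = pad_and_mask_collate([[[1, 2, 3]], [[4], [5]]])
```

With automatic square padding left on, the dataset would already have altered the images before this function sees them, which is why the developer needs a way to turn it off.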

alphinside commented 4 years ago

> Well, I think it should be in create_dataset; putting image padding in create_dataloader doesn't seem right.

But how do we forward the information to create_dataset, considering it is called inside create_dataloader?

triwahyuu commented 4 years ago

I'm fine with your proposal

alifahrri commented 4 years ago

> > Well, I think it should be in create_dataset; putting image padding in create_dataloader doesn't seem right.
>
> But how do we forward the information to create_dataset, considering it is called inside create_dataloader?

Wait, it turns out we have both create_dataloader and create_loader.

But yeah, preprocess_args is visible to the dataset.

alphinside commented 4 years ago

Actually, preprocess_args is not modified by model_components, so my previous proposal is invalid. I'm still looking for a way to forward information from the developer to the create_dataset function.

alifahrri commented 4 years ago

it is not returned, but it is mutable btw

alphinside commented 4 years ago

> it is not returned, but it is mutable btw

Yeah, you're correct.
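
The point about mutability can be shown with a small sketch. The function `create_model_components` and the keys used below are hypothetical stand-ins, not vortex's actual API; the only thing being demonstrated is that a dict passed into a function can be modified in place, so the caller sees the change even though the dict is never returned.

```python
def create_model_components(preprocess_args):
    """Hypothetical model factory that tweaks its preprocess_args in place."""
    # The dict object is shared with the caller, so this write is visible there.
    preprocess_args["disable_image_auto_padding"] = True
    return {"network": "toy-model"}  # preprocess_args itself is not returned


preprocess_args = {"input_size": 640}
components = create_model_components(preprocess_args)
print(preprocess_args)
```

After the call, the caller's `preprocess_args` carries the new key, so it could be handed on to the dataset creation step.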

triwahyuu commented 4 years ago

so it is still possible right?

alphinside commented 4 years ago

> so it is still possible right?

Yes, it's possible, but I think we need to add args to create_dataloader and make some modifications to model_components. Currently there's no way to forward information from the developer's side while keeping the arguments sensible. To achieve this, we need to move create_dataset and create_collater out of create_dataloader; from that point we can modify it.
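
One way to picture that refactor is the sketch below, where the caller builds the dataset and the collater itself and only then hands both to the loader factory. Everything here is a simplified, hypothetical stand-in (the `ToyDataset` class, the reduced signatures, the placeholder data format); it only illustrates the call order, not vortex's real functions.

```python
class ToyDataset:
    """Hypothetical dataset that records whether auto padding is disabled."""

    def __init__(self, samples, disable_image_auto_padding=False):
        self.samples = samples
        self.disable_image_auto_padding = disable_image_auto_padding
        self.data_format = {"image": "toy"}  # placeholder value


def create_dataset(samples, disable_image_auto_padding=False):
    return ToyDataset(samples, disable_image_auto_padding)


def create_collater(dataformat):
    # A trivial collater; a real one would pad and stack using `dataformat`.
    return lambda batch: list(batch)


def create_dataloader(dataset, collate_fn, batch_size=2):
    """Only batches and collates; no longer builds the dataset or collater."""
    for start in range(0, len(dataset.samples), batch_size):
        yield collate_fn(dataset.samples[start:start + batch_size])


# The developer decides about padding up front, so the dataset is built once.
dataset = create_dataset(["a", "b", "c"], disable_image_auto_padding=True)
collate_fn = create_collater(dataformat=dataset.data_format)
batches = list(create_dataloader(dataset, collate_fn))
```

The trade-off is that callers now have three steps to perform instead of one, in exchange for full control over the dataset arguments.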

alphinside commented 4 years ago

So I think this is my workaround proposal:

    # First pass: build the dataset with the default (auto padding) behavior,
    # because the collater needs dataset.data_format.
    dataset = create_dataset(dataset_config=dataset_config,
                             stage=stage,
                             preprocess_config=preprocess_config,
                             wrapper_format=wrapper_format[dataloader_module])
    if isinstance(collate_fn, str):
        try:
            collater_args = dataloader_config.collater.args
        except AttributeError:
            # No collater args in the config; fall back to an empty dict.
            collater_args = {}
        collater_args['dataformat'] = dataset.data_format
        collate_fn = create_collater(collate_fn, **collater_args)

        # Re-initialize dataset (Temp workaround): the collater declares
        # whether automatic image padding should be disabled.
        if hasattr(collate_fn, 'disable_image_auto_padding'):
            dataset = create_dataset(dataset_config=dataset_config,
                                     stage=stage,
                                     preprocess_config=preprocess_config,
                                     wrapper_format=wrapper_format[dataloader_module],
                                     disable_image_auto_padding=collate_fn.disable_image_auto_padding)
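
A runnable toy version of this two-pass pattern is sketched below to make its cost visible. All classes and the `build` helper are hypothetical stand-ins. One deliberate difference from the snippet above: the sketch uses `getattr` with a default of False, so the dataset is rebuilt only when the collater actually asks for padding to be disabled, not merely when the attribute exists.

```python
class ToyDataset:
    """Hypothetical dataset that counts how many times it is constructed."""

    built = 0

    def __init__(self, disable_image_auto_padding=False):
        ToyDataset.built += 1
        self.disable_image_auto_padding = disable_image_auto_padding
        self.data_format = "toy-format"  # placeholder value


class PlainCollater:
    def __init__(self, dataformat):
        self.dataformat = dataformat


class MaskingCollater(PlainCollater):
    # The collater declares that it handles padding on its own.
    disable_image_auto_padding = True


def build(collater_cls):
    dataset = ToyDataset()  # first pass, needed for data_format
    collate_fn = collater_cls(dataformat=dataset.data_format)
    if getattr(collate_fn, "disable_image_auto_padding", False):
        # Second pass: rebuild the dataset with auto padding turned off.
        dataset = ToyDataset(disable_image_auto_padding=True)
    return dataset, collate_fn
```

The dataset is constructed twice whenever the flag is set, which is acceptable for a temporary workaround but could be expensive if dataset construction is slow.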