seefun / TorchUtils

(WIP) TorchUtils is a pytorch library with several useful tools and training tricks.
What is the format of the `dataset` in the given example of Data Augmentation? #5

Closed Yihang-Li closed 3 years ago

Yihang-Li commented 3 years ago

In the Data Augmentation example, what exactly is the expected format of `dataset`, the first argument to tu.dataset.MixupDataset?

import albumentations
from albumentations import pytorch as AT
import torch_utils as tu  # TorchUtils

# IMAGE_SIZE and dataset are defined elsewhere; dataset is the object this question is about
train_transform = albumentations.Compose([
    albumentations.Resize(IMAGE_SIZE, IMAGE_SIZE),
    albumentations.HorizontalFlip(p=0.5),
    tu.dataset.randAugment(image_size=IMAGE_SIZE, N=2, M=12, p=0.9, mode='all', cut_out=False),
    albumentations.Normalize(),
    albumentations.Cutout(num_holes=8, max_h_size=IMAGE_SIZE//8, max_w_size=IMAGE_SIZE//8, fill_value=0, p=0.25),
    AT.ToTensorV2(),
    ])

mixup_dataset = tu.dataset.MixupDataset(dataset, alpha=1.0, prob=0.1, mixup_to_cutmix=0.3)
# prob=0.1 with mixup_to_cutmix=0.3: 0.07 mixup and 0.03 cutmix

Reminder: image_size=IMAGE_SIZE in tu.dataset.randAugment is redundant.
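
The 0.07 / 0.03 split in the code comment can be reproduced with a little arithmetic. The sketch below assumes that mixup_to_cutmix is the fraction of mixed samples that use CutMix instead of Mixup, which is what the comment implies; the helper name split_mix_probabilities is hypothetical and not part of TorchUtils.

```python
def split_mix_probabilities(prob, mixup_to_cutmix):
    """Split an overall mixing probability into (mixup, cutmix) shares.

    Assumption: `mixup_to_cutmix` is the fraction of mixed samples that
    use CutMix rather than Mixup. This mirrors the comment in the example,
    not necessarily the library's internal logic.
    """
    cutmix_p = prob * mixup_to_cutmix
    mixup_p = prob - cutmix_p
    return mixup_p, cutmix_p


mixup_p, cutmix_p = split_mix_probabilities(prob=0.1, mixup_to_cutmix=0.3)
print(round(mixup_p, 2), round(cutmix_p, 2))  # 0.07 0.03
```

Under this reading, the remaining 90% of samples are returned unmixed.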

seefun commented 3 years ago

In the latest version, redundant parameters have been removed.

https://github.com/seefun/TorchUtils/blob/4573472c22fc540ed52ccb2e3e6690f43784472d/torch_utils/dataset/common_aug.py#L7-L16

The dataset passed to MixupDataset should have the following format:

In the Dataset class, __getitem__ should return an image tensor (after to_tensor, shape (c, h, w)) and a label tensor/array/list (one-hot or soft label, shape (num_classes,)).

For example, __getitem__ returns (img, label), where img has shape (3, 224, 224) and label is a list such as [0, 1, 0, 0, 0], or [0.02, 0.92, 0.02, 0.02, 0.02] after label smoothing.
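
A minimal sketch of a dataset in this format might look like the following. The names ToyOneHotDataset and smooth_one_hot are hypothetical, the images are random stand-ins for real loaded and transformed images, and the smoothing value eps=0.1 is an assumption chosen because it reproduces the soft label quoted above for 5 classes (0.9 + 0.1/5 = 0.92, and 0.1/5 = 0.02 elsewhere).

```python
import torch
from torch.utils.data import Dataset


def smooth_one_hot(class_index, num_classes, eps=0.0):
    """Return a float label of shape (num_classes,).

    eps=0.0 gives a plain one-hot vector; eps>0 applies label smoothing
    by spreading eps uniformly over all classes.
    """
    label = torch.full((num_classes,), eps / num_classes, dtype=torch.float32)
    label[class_index] += 1.0 - eps
    return label


class ToyOneHotDataset(Dataset):
    """Hypothetical dataset returning (img, label) in the described format."""

    def __init__(self, num_samples=8, num_classes=5, image_size=224, eps=0.0):
        self.num_samples = num_samples
        self.num_classes = num_classes
        self.image_size = image_size
        self.eps = eps

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        # Stand-in for "load image, apply transform, convert to tensor".
        img = torch.rand(3, self.image_size, self.image_size)  # shape (c, h, w)
        class_index = index % self.num_classes  # fake integer class id
        label = smooth_one_hot(class_index, self.num_classes, self.eps)
        return img, label


dataset = ToyOneHotDataset(eps=0.1)
img, label = dataset[1]
print(img.shape)  # torch.Size([3, 224, 224])
print(label)      # roughly [0.02, 0.92, 0.02, 0.02, 0.02]
```

Returning the label as a vector rather than an integer class id matters here because mixing two samples has to blend their labels as well as their images.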

Finally, thanks for asking. This repo is a work in progress. I will add more documentation and examples later.

Yihang-Li commented 3 years ago

Thanks for your guidance ~

By the way, would it be convenient for you to provide a full deep learning script that uses this repo? I would really like to learn from it. Thank you again, and have a good day~

seefun commented 3 years ago

Here is an example that uses this repo: https://github.com/seefun/TorchUtils/blob/master/examples/kaggle_leaves_classification.ipynb It is a training Jupyter notebook for this dataset: https://www.kaggle.com/c/classify-leaves With this training pipeline, we can usually get a strong baseline score on Kaggle. Hope this helps. GLHF~

Yihang-Li commented 3 years ago

Thanks soooo much~