nshaud / DeepHyperX

Deep learning toolbox based on PyTorch for hyperspectral data classification.

how to apply data augmentation? #6

Closed jingyao16 closed 4 years ago

jingyao16 commented 4 years ago

how to apply data augmentation?

nshaud commented 4 years ago

See here for the options regarding data augmentation.

jingyao16 commented 4 years ago

Thanks for your reply. Yes, I have seen those options and flags in the HyperX class. The awkward part is that I am a rookie and don't know how to invoke the __getitem__ method on an instance such as train_dataset. Could you please show how to get an augmented dataset before sending it into the DataLoader?
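For context on the question: in PyTorch you never call __getitem__ directly; square-bracket indexing on a Dataset instance invokes it for you. A minimal sketch (ToyDataset is a hypothetical stand-in for HyperX, which returns (patch, label) pairs):

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Minimal stand-in for HyperX: returns (sample, label) pairs."""
    def __init__(self, n):
        self.data = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # Square-bracket indexing on the instance calls this method.
        return self.data[i], int(i % 2)

ds = ToyDataset(4)
sample, label = ds[2]        # equivalent to ds.__getitem__(2)
print(sample.item(), label)  # -> 2.0 0
```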

nshaud commented 4 years ago

@YunyanYao Data augmentation is done on-the-fly when training, I am not sure that I understand correctly what you wish to achieve.
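To illustrate what "on-the-fly" means here (this is a simplified sketch, not the actual HyperX code): the random transform is applied inside __getitem__ each time a sample is read, so every epoch sees different variants and no enlarged copy of the dataset is ever stored.

```python
import torch
from torch.utils.data import Dataset

class AugmentedDataset(Dataset):
    """Illustrative only: applies a random flip each time an item is read."""
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        patch = self.data[i]
        # On-the-fly augmentation: a fresh random transform per access.
        if torch.rand(1).item() < 0.5:
            patch = torch.flip(patch, dims=[-1])
        return patch

data = [torch.tensor([[1., 2., 3.]]) for _ in range(2)]
ds = AugmentedDataset(data)
patch = ds[0]  # either [[1, 2, 3]] or its flipped version [[3, 2, 1]]
```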

jingyao16 commented 4 years ago

@nshaud First, thanks for your code; my colleague and I will cite your GitHub link in our submitted paper. Let me detail my situation. Previously, we used MatConvNet to build our deep model and manually augmented the training data, applying five transformations to each sample, which gave us a new training set six times the original size. I am wondering how I can modify your code to achieve the same purpose: specifically, to wrap 'train_dataset' with 'DataLoader' and get a new 'train_loader' with a bigger size.

nshaud commented 4 years ago

> I am wondering how can I modify your code to achieve the same purpose. Specifically, after wrap 'train_dataset' by 'DataLoader', and get a new 'train_loader' with bigger size.

So you actually want to perform offline data augmentation. This is not the typical use case for our code, but you can cherry-pick the functions that you need.

For example (not actual code but it should give you the idea):

from datasets import HyperX

my_dataset = HyperX(my_data, my_gt, radiation_augmentation=True, mixture_augmentation=True, ...)
n_samples = len(my_dataset)
augmented_dataset = []
for i in range(6 * n_samples):
    augmented_dataset.append(my_dataset[i % n_samples].numpy())

# augmented_dataset now has 6 times more samples than the original dataset after data augmentation
...
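A runnable version of the sketch above, with a toy in-memory dataset standing in for HyperX (the get_augmented helper and the additive-noise "augmentation" are illustrative assumptions, not DeepHyperX code). Six passes over the base samples build an offline-augmented set six times larger, which is then wrapped in a DataLoader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for a dataset whose reads apply random augmentation;
# here the "augmentation" is just small additive noise.
base = torch.arange(10, dtype=torch.float32).reshape(5, 2)
labels = torch.tensor([0, 1, 0, 1, 0])

def get_augmented(i):
    # Each access returns a freshly perturbed copy of sample i.
    return base[i] + 0.01 * torch.randn(2), labels[i]

n_samples = len(base)
samples, targets = [], []
for i in range(6 * n_samples):  # 6 passes -> 6x larger dataset
    x, y = get_augmented(i % n_samples)
    samples.append(x)
    targets.append(y)

augmented = TensorDataset(torch.stack(samples), torch.stack(targets))
loader = DataLoader(augmented, batch_size=4, shuffle=True)
print(len(augmented))  # -> 30
```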