Hi @GuanlinLee, from the provided scripts I notice that the transform operations are not exactly the same. Specifically, I think the second script has an additional torchvision.transforms.Resize((32,32)) at the beginning of its training transform, which the first script does not.
Could you try again with the second script updated to something like:
```python
...
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
...
```
Hi, I think the cause of this phenomenon is that in data_aug() all inputs within a batch receive the same transformation, whereas when the transforms run through the dataloader they are applied to each instance individually. So I think transforms on batched tensors should behave consistently with that, i.e., each instance should get its own random transformation.
Hi @GuanlinLee, I think they are not exactly the same transformation. The first script does the following:
```python
transform = transforms.Compose([transforms.RandomCrop(32, padding=4),
                                transforms.RandomHorizontalFlip(),
                                transforms.ToTensor()])
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
```
The transforms applied to the train images: RandomCrop -> RandomHorizontalFlip -> ToTensor.
While the second script does the following:
```python
transform_test = transforms.Compose([torchvision.transforms.Resize((32,32)),
                                     transforms.ToTensor()])
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_test)
...
def data_aug(image):
    image = transforms.RandomCrop(32, padding=4).forward(image)
    image = transforms.RandomHorizontalFlip().forward(image)
    return image
...
x_train, y_train = data_aug(input).cuda(), target.cuda()
```
The transforms applied to the train images: Resize -> ToTensor -> RandomCrop -> RandomHorizontalFlip.
To summarize, the transforms applied to the train images are:
First script: RandomCrop -> RandomHorizontalFlip -> ToTensor
Second script: Resize -> ToTensor -> RandomCrop -> RandomHorizontalFlip
We have an extra Resize in the second script, and my guess is that this is the reason the results differ.
Hi @YosuaMichael, I know what you mean. However, I have tested this already: even if I use only ToTensor in the dataset transform and then apply RandomCrop and RandomHorizontalFlip afterwards, the results are the same as with the second script.
I see, so even if you remove the Resize from the second script, the resulting accuracy stays the same as before? In that case I am not sure what the root problem is and it may need further investigation. I will try to reproduce your result first and let you know if I find something.
Hi @GuanlinLee, I tried to reproduce the problem using your code, but the script you provided can't be run as-is: I ran into several errors (undefined variables, import errors, etc.). Could you provide a minimal sample script that runs without errors? For instance, I think the PGD function can be removed from the minimal example.
Hi @YosuaMichael, I have uploaded my code to GitHub: https://github.com/GuanlinLee/PGD_Demo. If you run into any problems while running it, please let me know. Thanks!
@GuanlinLee your repo is several hundred lines long, which is far too long for us to investigate. This is why Yosua asked you to provide a minimal example that reproduces the problem; ideally it should be only a few lines of code that clearly reproduce the issue. Without this, we won't be able to help, I'm afraid.
@datumbox the bug happens during the training process, so I need to provide the full training and evaluation code for you to check and reproduce. Also, I find the bug only appears when using adversarial training, as the issue says.
@GuanlinLee I understand. The problem is that if the code that reproduces the issue is 800 lines long, it's going to be very hard for us to review and debug. I appreciate that the problem is complex because it involves training models, but to make your issue more actionable it would help if you debug further and provide a minimal example.
Hi @datumbox, I have just trimmed my repo; the code is now only about 100 lines. The ResNet is the official model adapted for CIFAR-10, so the kernel size of the first conv layer is 3 instead of 7. I hope the current version helps you debug.
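For reference, the standard CIFAR-10 adaptation of the official ResNet looks roughly like this. This is only a sketch; the repo has the exact version, and dropping the initial max-pool is part of the common adaptation rather than something stated above:
```python
import torch.nn as nn
from torchvision.models import resnet18

# Standard CIFAR-10 adaptation of the official ResNet: a 3x3 stem
# convolution instead of the 7x7 one used for ImageNet. Removing the
# initial max-pool is also common for 32x32 inputs (an assumption here,
# not stated in the comment above).
model = resnet18(num_classes=10)
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
model.maxpool = nn.Identity()
```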
Thanks for the changes, I will try it out and see if I can reproduce the differences that you specify.
Hi @GuanlinLee, I think the difference in training accuracy is caused by randomness. When I run with --aug=0 and --aug=1 they do produce different results, but the difference is not big (roughly similar to running the same aug twice).
Then I verified that both transform methods produce the same output using the following script:
```python
from torchvision import datasets
import torchvision.transforms as transforms
import torch
import random

def set_seed(seed=0):
    torch.manual_seed(seed)
    random.seed(seed)

# Method 1: augmentation runs inside the dataset transform, on the PIL image.
transform_1 = transforms.Compose([transforms.RandomCrop(32, padding=4),
                                  transforms.RandomHorizontalFlip(),
                                  transforms.ToTensor()])
# Method 2: only ToTensor runs inside the dataset transform; the
# augmentation is applied afterwards, on the tensor.
transform_2 = transforms.Compose([transforms.ToTensor()])

def data_aug(image):
    image = transforms.RandomCrop(32, padding=4)(image)
    image = transforms.RandomHorizontalFlip()(image)
    return image

trainset_1 = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_1)
trainset_2 = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_2)

def compare_data(n):
    # Reseed before each access: the transforms run lazily inside
    # __getitem__, so both methods consume the same random numbers.
    set_seed()
    x_1, y_1 = trainset_1[n]  # get data using method 1
    set_seed()
    x_2, y_2 = trainset_2[n]  # get data using method 2
    x_2 = data_aug(x_2)
    return torch.allclose(x_1, x_2)

is_all_true = True
for i in range(len(trainset_1)):
    if not compare_data(i):
        print(f"[n={i}] returned False!")
        is_all_true = False
        break
if is_all_true:
    print("All data is the same!")
```
And indeed this script prints All data is the same!, which means that whether you use method 1 or method 2, they produce very close results, which is what we expect. (Reseeding just before indexing each dataset works because the transforms run lazily inside the dataset's __getitem__, so both methods draw the same random numbers.)
@YosuaMichael thanks for your verification. Have you tried running more training epochs? And if possible, please show me the training accuracy on both adversarial examples and clean data. The difference between aug_1 and aug_2 gets much bigger as the number of training epochs increases, and I do not know the reason. I run my experiments under the same random seed.
Hi @GuanlinLee, I have experimented by running each variant multiple times, making the runs as deterministic as possible (setting the seed in multiple places), and indeed the second method consistently has around 2-4% higher accuracy after 100 epochs.
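For reference, the seeding I used is roughly the following (a sketch; exactly which seeds matter depends on the script, e.g. whether numpy is used anywhere):
```python
import random
import numpy as np
import torch

def make_deterministic(seed=0):
    # Seed every RNG the training loop may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducibility in cuDNN.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```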
After more investigation and thinking, I think I know why they differ. The second method applies the augmentation (RandomCrop + HorizontalFlip) at the batch level, so all images in the same batch get exactly the same randomness; in the case of RandomCrop, they are all cropped at the same location.
The first method, on the other hand, applies the augmentation at the image level, so every image in the batch gets its own augmentation randomness; in the case of RandomCrop, each image in the batch is cropped at a different location.
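You can see this directly with RandomHorizontalFlip, and the same logic applies to the crop location sampled by RandomCrop. This is just an illustrative snippet on random tensors:
```python
import torch
import torchvision.transforms as transforms

torch.manual_seed(0)
flip = transforms.RandomHorizontalFlip(p=0.5)
batch = torch.rand(8, 3, 32, 32)  # stand-in for a batch of images

# Batch-level call: the flip decision is sampled once for the whole
# batch, so either every image is flipped or none is.
batch_out = flip(batch)
print([not torch.equal(a, b) for a, b in zip(batch_out, batch)])
# -> all True or all False

# Image-level calls: the flip decision is sampled per image, so (with
# high probability) only a random subset of the batch is flipped.
per_image_out = [flip(img) for img in batch]
print([not torch.equal(a, b) for a, b in zip(per_image_out, batch)])
# -> a mix of True and False
```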
I am not really sure why the second method yields higher training accuracy. My current hypothesis is that the data becomes somewhat easier (less augmentation randomness) and hence easier to converge on. However, I suspect that if you measure test accuracy in the long run, the advantage may not hold.
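If you want the second method to behave like the first, one option (a sketch, not tested against your repo) is to apply the augmentation to each image of the batch separately rather than to the whole batch at once:
```python
import torch
import torchvision.transforms as transforms

crop = transforms.RandomCrop(32, padding=4)
flip = transforms.RandomHorizontalFlip()

def data_aug(batch):
    # Calling the transforms once per image means each image gets its
    # own crop location and flip decision, matching the per-instance
    # behaviour of augmenting inside the dataset transform.
    return torch.stack([flip(crop(img)) for img in batch])
```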
Overall I think this is not a bug in torchvision, but rather unexpected behaviour in the implementation. I will close the issue now, since I think we know the cause of the problem and it is not a bug in torchvision. However, feel free to reopen if you think there is a different explanation.
🐛 Describe the bug
When the data augmentation is applied on PIL Images and ToTensor is used afterwards, the training code is like this:
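In outline (reconstructed from the snippets quoted in the discussion above; the full script is in the linked repo):
```python
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.RandomCrop(32, padding=4),
                                transforms.RandomHorizontalFlip(),
                                transforms.ToTensor()])
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
```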
When ToTensor is first used to convert the PIL Images to tensors and the data augmentation is then applied on the tensors, the training code is like this:
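In outline (again reconstructed from the snippets quoted above):
```python
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor()])
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

def data_aug(image):
    image = transforms.RandomCrop(32, padding=4)(image)
    image = transforms.RandomHorizontalFlip()(image)
    return image

# inside the training loop, applied to a whole batch at once:
# x_train, y_train = data_aug(input).cuda(), target.cuda()
```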
Clearly, these two versions of the training code would be expected to produce similar results on the train set. However, I find that the second one causes a significant decrease. Here are the training results:
For First Version:
For Second Version:
I saved some images before and after data augmentation:
[Image grid: the first two rows are images before data augmentation; rows 3-4 are images from the Second Version code.]
[Image grid: images from the First Version code.]
So, what makes these two ways of applying data augmentation give different results? There may be a bug somewhere, but I cannot find one in the source code.
Versions
cudatoolkit 10.1.243 h6bb024c_0
cudnn 7.6.5 cuda10.1_0
numpy 1.19.5 pypi_0 pypi
opencv-python 4.5.5.64 pypi_0 pypi
pillow 8.4.0 py37h5aabda8_0
python 3.7.1 h0371630_7
torch 1.10.2 pypi_0 pypi
torchvision 0.11.3 pypi_0 pypi