takerum / vat_tf

Virtual adversarial training with Tensorflow
MIT License

PyTorch implementation #5

Open reachablesa opened 6 years ago

reachablesa commented 6 years ago

Hi,

I am trying to implement your code in PyTorch.

I believe I implemented the VAT loss accurately, but I cannot get the same performance, probably because I used a different ConvNet. When I tried to replicate your ConvNet, namely "conv-large", the network did not work at all. I am copying my PyTorch code for conv-large below, and I would appreciate any feedback on what might be wrong.

Also, in the paper you refer to "Temporal Ensembling for Semi-Supervised Learning" for the network used in the experiments, but that paper adds Gaussian noise in the first layer, while I could not find any such noise in your implementation.
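For reference, in that paper this is plain additive Gaussian noise on the input, applied only during training. A minimal sketch of such a layer, separate from the conv_large code below (GaussianNoise is just an illustrative name, and sigma = 0.15 is an assumption taken from the Temporal Ensembling setup):

    import torch
    import torch.nn as nn

    class GaussianNoise(nn.Module):
        # additive input noise, active only in training mode
        def __init__(self, sigma=0.15):
            super(GaussianNoise, self).__init__()
            self.sigma = sigma

        def forward(self, x):
            if self.training:
                x = x + self.sigma * torch.randn_like(x)
            return x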

    import torch.nn as nn
    import torch.nn.functional as F

    class conv_large(nn.Module):
        def __init__(self):
            super(conv_large, self).__init__()

            self.lr = nn.LeakyReLU(0.1)
            self.mp2_2 = nn.MaxPool2d(2, stride=2, padding=0)
            self.drop = nn.Dropout(p=0.5)

            self.bn128 = nn.BatchNorm2d(128, affine=True)
            self.bn256 = nn.BatchNorm2d(256, affine=True)
            self.bn512 = nn.BatchNorm2d(512, affine=True)

            self.conv3_128_3_1 = nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1)
            self.conv128_128_3_1 = nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1)
            self.conv128_256_3_1 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
            self.conv256_256_3_1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)

            self.conv256_512_3_1 = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=0)
            self.conv512_256_1_1 = nn.Conv2d(512, 256, kernel_size=1, stride=1, padding=0)
            self.conv256_128_1_1 = nn.Conv2d(256, 128, kernel_size=1, stride=1, padding=0)

            self.avg = nn.AvgPool2d(6, ceil_mode=True)  # global average pooling
            self.fc = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv3_128_3_1(x)
            x = self.bn128(x)
            x = self.lr(x)

            x = self.conv128_128_3_1(x)
            x = self.bn128(x)
            x = self.lr(x)

            x = self.conv128_128_3_1(x)
            x = self.bn128(x)
            x = self.lr(x)

            x = self.mp2_2(x)
            x = self.drop(x)

            x = self.conv128_256_3_1(x)
            x = self.bn256(x)
            x = self.lr(x)

            x = self.conv256_256_3_1(x)
            x = self.bn256(x)
            x = self.lr(x)

            x = self.conv256_256_3_1(x)
            x = self.bn256(x)
            x = self.lr(x)

            x = self.mp2_2(x)
            x = self.drop(x)

            x = self.conv256_512_3_1(x)
            x = self.bn512(x)
            x = self.lr(x)

            x = self.conv512_256_1_1(x)
            x = self.bn256(x)
            x = self.lr(x)

            x = self.conv256_128_1_1(x)
            x = self.bn128(x)
            x = self.lr(x)

            x = self.avg(x)
            x = x.view(x.size(0), -1)
            x = self.fc(x)

            return x
takerum commented 6 years ago

Hi,

You use the same instance of nn.BatchNorm2d across different layers. I am not very familiar with the PyTorch implementation of BN, but I think you should use a different instance of BN for each layer.
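For what it is worth, one way to guarantee that is to wrap conv + BN + activation into a small module, so every layer automatically gets its own BatchNorm2d with its own parameters and running statistics. A rough sketch (conv_block and the layer names are just illustrative, not code from this repo):

    import torch.nn as nn

    class conv_block(nn.Module):
        # conv -> BN -> LeakyReLU, with a fresh BatchNorm2d per block
        def __init__(self, c_in, c_out, k, pad):
            super(conv_block, self).__init__()
            self.conv = nn.Conv2d(c_in, c_out, kernel_size=k, stride=1, padding=pad)
            self.bn = nn.BatchNorm2d(c_out)   # not shared with any other layer
            self.lr = nn.LeakyReLU(0.1)

        def forward(self, x):
            return self.lr(self.bn(self.conv(x)))

    # e.g. the first two layers of conv-large would become:
    # self.block1 = conv_block(3, 128, k=3, pad=1)
    # self.block2 = conv_block(128, 128, k=3, pad=1)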

reachablesa commented 6 years ago

That seems to be the mistake. Thanks for your reply.

YanLiang0813 commented 6 years ago

@reachablesa Hi, I want to implement this code in PyTorch, but when I compute r_vadv it is always 0. Could you show me your PyTorch code? Thanks!
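For context, r_vadv in VAT is obtained by a power-iteration step on the KL divergence between the predictions on the clean input and on a slightly perturbed input. A rough sketch of that computation in PyTorch (xi and eps follow the notation of the VAT paper but their values here are placeholders, and vat_perturbation / _l2_normalize are hypothetical helper names, not code from this repo). One common reason r_vadv comes out as exactly zero is that the random direction d is never marked with requires_grad, so no gradient flows back to it:

    import torch
    import torch.nn.functional as F

    def _l2_normalize(d):
        # normalize each sample's perturbation to unit L2 norm (assumes NCHW input)
        d_flat = d.view(d.size(0), -1)
        return d / (d_flat.norm(dim=1).view(-1, 1, 1, 1) + 1e-8)

    def vat_perturbation(model, x, xi=1e-6, eps=8.0):
        with torch.no_grad():
            p = F.softmax(model(x), dim=1)      # prediction on the clean input

        d = _l2_normalize(torch.randn_like(x))
        d.requires_grad_()                      # without this, d.grad stays None/zero
        p_hat = F.log_softmax(model(x + xi * d), dim=1)
        adv_distance = F.kl_div(p_hat, p, reduction='batchmean')
        adv_distance.backward()

        return (eps * _l2_normalize(d.grad)).detach()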