zhanghang1989 / PyTorch-Encoding

A CV toolkit for my papers.
https://hangzhang.org/PyTorch-Encoding/
MIT License

How can I use the Encoding Layer directly? Installing with pip fails for me #384

Closed RSMung closed 3 years ago

RSMung commented 3 years ago

Hello, my environment is Python 3.7, PyTorch 1.7, torchvision 0.8.1. Running pip install git+https://github.com/zhanghang1989/PyTorch-Encoding/ produces a large number of errors that I cannot resolve, so I am asking for help. I tried copying the Encoding Layer related code directly, but the code under encoding/lib/ cannot be imported. What should I do?

In addition, I implemented the Encoding module with plain PyTorch matrix operations, but the values blow up (NaN) while computing eik. Do you have any suggestions for fixing this? My code is below.

Any guidance would be greatly appreciated, thank you very much.

class CodeBookBlock(nn.Module):
    def __init__(self, in_channels, c2, out_channels):
        super(CodeBookBlock, self).__init__()
        self.c2 = c2
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, c2, kernel_size=1),
            nn.BatchNorm2d(c2),
            nn.LeakyReLU()
        )
        self.codebook = nn.Parameter(torch.Tensor(c2, Config.K), requires_grad=True)
        self.scale = nn.Parameter(torch.Tensor(Config.K), requires_grad=True)
        self.dp = nn.Dropout(0.5)  # BatchNorm2d cannot be used here, otherwise every value after the fc becomes NaN
        self.relu = nn.ReLU6()
        self.leakyRelu = nn.LeakyReLU()
        self.fc = nn.Linear(self.c2, out_channels)
        self.sigmoid = nn.Sigmoid()
        self.init_params()  # initialize the parameters
        torch.autograd.set_detect_anomaly(True)

    def init_params(self):
        std1 = 1. / ((Config.K * self.c2) ** (1 / 2))
        self.codebook.data.uniform_(-std1, std1)
        self.scale.data.uniform_(-1, 0)

    def forward(self, z):
        """
        :param z: (Batch, c, h, w)
        :return: (Batch, out_channels)
        """
        batch, c, h, w = z.shape
        N = h * w
        z1 = self.conv1(z)
        z1 = z1.flatten(start_dim=2, end_dim=-1)  # Batch, c2, N
        # -------------- compute the scaling factor gama --------------
        # --- prepare the feature tensor z1
        z1 = z1.unsqueeze(2)  # Batch, c2, 1, N
        z1 = z1.repeat(1, 1, Config.K, 1)  # Batch, c2, K, h*w
        # swap the K and N (= h*w) dimensions of z1
        z1 = z1.transpose(2, 3)  # Batch, c2, N, K
        # --- prepare the codebook
        d = self.codebook.unsqueeze(1)  # c2, 1, K
        d = d.repeat(1, N, 1)  # c2, N, K
        d = d.unsqueeze(0)  # 1, c2, N, K
        d = d.repeat(batch, 1, 1, 1)  # batch, c2, N, K
        # --- compute rik
        rik = z1 - d  # batch, c2, N, K
        # --- compute the numerator
        rik = torch.pow(torch.abs(rik), 2)  # squared magnitude of rik   batch, c2, N, K
        # expand scale from (K,) to batch, c2, N, K
        scale = self.scale.repeat(N, 1)  # N, K
        scale = scale.unsqueeze(0).unsqueeze(0)  # 1, 1, N, K
        scale = scale.repeat(batch, self.c2, 1, 1)  # batch, c2, N, K
        # obtain the numerator
        # Using exp() here makes the numerator very large and the later variables become NaN;
        # dropping it entirely lets Rei become 0 and breaks the division below, so I replaced
        # it with adding a constant or LeakyReLU.
        numerator = self.leakyRelu(-scale * rik)  # batch, c2, N, K
        Rei = numerator.sum(3)  # denominator of the eik formula   batch, c2, N
        # --- compute eik; this must come after Rei has been computed
        numerator = numerator * rik  # batch, c2, N, K
        # expand Rei from batch, c2, N to batch, c2, K, N
        Rei = Rei.unsqueeze(2)  # batch, c2, 1, N
        Rei = Rei.repeat(1, 1, Config.K, 1)  # batch, c2, K, N
        # swap the K and N dimensions of Rei
        Rei = Rei.transpose(2, 3)  # batch, c2, N, K
        # obtain eik
        eik = numerator / Rei  # batch, c2, N, K
        # obtain ek
        ek = eik.sum(2)  # batch, c2, K
        # obtain e
        e = ek.sum(2)  # batch, c2
        e = self.dp(e)
        e = self.relu(e)
        e = self.fc(e)
        gama = self.sigmoid(e)
        return gama  # batch, out_channels
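
One way to avoid the NaN without giving up the exponential: compute the assignment weights with torch.softmax over the K dimension. Softmax subtracts the per-row maximum before exponentiating, so it cannot overflow. A minimal sketch in the shape convention used above (rik already holding |r_ik|^2, as in my code); this is only an illustration, not the official implementation:

import torch

def assignment_and_aggregate(rik, scale):
    # rik:   (batch, c2, N, K)  squared residual magnitudes |r_ik|^2
    # scale: (K,)               learnable smoothing factors s_k (broadcast over the other dims)
    # softmax over K is numerically stable: it subtracts the row max before exp()
    aik = torch.softmax(-scale * rik, dim=3)   # (batch, c2, N, K), each row sums to 1
    ek = (aik * rik).sum(2)                    # (batch, c2, K)  aggregate over the N positions
    return ek
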
zhanghang1989 commented 3 years ago

If you only need the Encoding Layer, there is no need to install this toolkit. A pure Python version is available here: https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/ops/encoding.py#L6
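
For anyone who only needs the layer, copying that single file is enough. A hedged usage sketch (constructor arguments and output shape are how I read the linked file, so double-check against the source; the values below are illustrative):

import torch
from mmseg.ops import Encoding  # or import Encoding from a local copy of the linked encoding.py

# channels = feature dimension D, num_codes = number of codewords K
enc = Encoding(channels=512, num_codes=32)
x = torch.randn(2, 512, 16, 16)   # NCHW feature map
encoded_feat = enc(x)             # expected shape: (2, 32, 512) = (batch, num_codes, channels)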

RSMung commented 3 years ago

Thank you for your reply, I really appreciate it!

RSMung commented 3 years ago

Hello, I am now using the source code from the link you gave. I would like to confirm whether my understanding is correct: the output of this Encoding Layer (after the mean) is a (batch, channel) matrix. Is the operation I apply below correct? I also noticed that one of the norm_layer instances in your code is SyncBatchNorm; since I am training on a single GPU, can I replace it with BatchNorm?

The definition of fc in __init__:
self.fc = nn.Sequential(
            nn.Linear(channels, channels),
            nn.Sigmoid()
        )

The operations applied to the Encoding Layer output (encoded_feat) in forward:

        # relu, then average   bt, K, c ---> bt, c
        encoded_feat = F.relu(encoded_feat)
        encoded_feat = encoded_feat.mean(1)  # bt, c
        # fc
        gamma = self.fc(encoded_feat)
        gamma = gamma.view(batch_size, self.channels, 1, 1)  # bt, c, 1, 1
        # channel-wise multiplication
        output = F.relu(x + x * gamma)  # bt, c, h, w
zhanghang1989 commented 3 years ago

On a single GPU you can use regular BatchNorm.

RSMung commented 3 years ago

Okay, thank you!

QY1994-0919 commented 2 years ago

@RSMung Hi, may I ask: did you directly use the source code the author linked above?

RSMung commented 2 years ago

@QY1994-0919 Yes, but I keep feeling that the linked code differs quite a bit from the description in the paper. What do you think? In this repo the key encoding operations are implemented in CUDA, so I cannot compare them against the PyTorch version above to check whether there is really a gap.

QY1994-0919 commented 2 years ago

@RSMung Yes, especially the eik computation. One more thing I am not clear about: the codewords in the code correspond to the visual centers proposed in the paper, but the operations on the codewords mainly come from scaled_l2, and I do not fully understand them. Do you have any insight? Could you explain it to me? Thanks.

zhanghang1989 commented 2 years ago

I took a look at the mmsegmentation implementation and it is fine. scaled_l2 outputs Equation 3 from the paper, s_k|r_ik|^2, i.e. a_ik before the softmax: https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/ops/encoding.py#L34 After that you just aggregate the residuals.

The implementation at the time the paper was published was written in (Lua) Torch; see: https://github.com/zhanghang1989/Torch-Encoding-Layer/blob/master/layers/aggregate.lua https://github.com/zhanghang1989/Torch-Encoding-Layer/blob/master/layers/encoding.lua

Later this repo switched to PyTorch; at the time PyTorch did not support many operators, so I wrote my own CUDA implementation.
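
For readers who cannot easily follow the CUDA kernels, the two steps can be paraphrased in a few lines of plain PyTorch. This is only a sketch of what scaled_l2 and the residual aggregation compute (shapes follow the mmsegmentation layout as I read it), not the reference implementation:

import torch
import torch.nn.functional as F

def scaled_l2(x, codewords, scale):
    # x:         (batch, N, D)  flattened features x_i
    # codewords: (K, D)         codewords d_k
    # scale:     (K,)           smoothing factors s_k
    r = x.unsqueeze(2) - codewords.view(1, 1, *codewords.shape)  # (batch, N, K, D)  residuals r_ik
    return scale * r.pow(2).sum(dim=3)                           # (batch, N, K)     s_k * |r_ik|^2

def aggregate(assignment_weights, x, codewords):
    # assignment_weights: (batch, N, K)  a_ik = softmax over K of scaled_l2
    r = x.unsqueeze(2) - codewords.view(1, 1, *codewords.shape)  # (batch, N, K, D)
    return (assignment_weights.unsqueeze(3) * r).sum(dim=1)      # (batch, K, D)     e_k

# usage: a_ik first, then the aggregated residuals e_k
x = torch.randn(2, 64, 512)                 # batch of 2, N = 64 positions, D = 512
codewords = torch.randn(32, 512)            # K = 32
scale = torch.empty(32).uniform_(-1, 0)
e = aggregate(F.softmax(scaled_l2(x, codewords, scale), dim=2), x, codewords)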

QY1994-0919 commented 2 years ago

@zhanghang1989 Thank you for your reply, it helps a lot. There is still one thing I do not quite understand: where does the value ncodes=32 in Encoding(D=in_channels, K=ncodes) come from? Thanks.

class EncModule(nn.Module):
    def __init__(self, in_channels, nclass, ncodes=32, se_loss=True, norm_layer=None):
        super(EncModule, self).__init__()
        self.se_loss = se_loss
        self.encoding = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 1, bias=False),
            norm_layer(in_channels),
            nn.ReLU(inplace=True),
            Encoding(D=in_channels, K=ncodes),
            norm_layer(ncodes),
            nn.ReLU(inplace=True),
            Mean(dim=1))
        self.fc = nn.Sequential(
            nn.Linear(in_channels, in_channels),
            nn.Sigmoid())
        if self.se_loss:
            self.selayer = nn.Linear(in_channels, nclass)

zhanghang1989 commented 2 years ago

ncodes=32 is the number of visual centers; it was chosen empirically at the time.

QY1994-0919 commented 2 years ago

@zhanghang1989 Got it, thank you very much.

QY1994-0919 commented 2 years ago

@zhanghang1989

class EncModule(nn.Module):
    def __init__(self, c1, nc, num_codes, se_loss=True):
        super(EncModule, self).__init__()
        self.num_codes = num_codes
        self.se_loss = se_loss
        self.encoding = nn.Sequential(
            nn.Conv2d(c1, c1, 1, bias=False),
            nn.BatchNorm2d(c1),
            nn.ReLU(inplace=True),
            Encoding(c1=c1, num_codes=num_codes),
            nn.BatchNorm1d(num_codes),
            nn.ReLU(inplace=True),
            Mean(dim=1))
        self.fc = nn.Sequential(nn.Linear(c1, c1), nn.Sigmoid())
        if self.se_loss:
            self.selayer = nn.Linear(c1, nc)

    def forward(self, x):
        en = self.encoding(x)
        print(en.size(), "en_farword")
        b, c1, _, _ = x.size()
        gam = self.fc(en)
        print(gam.size(), "gamm")
        y = gam.view(b, c1, 1, 1)
        print(y.size(),  "yyy")
        x = F.relu_(x + x * y)
        print(x.size(), "Fxxx")
        if self.se_loss:
            x = self.selayer(x)
        return x

Hello, I am using the COCO dataset here, with nc=80, c1=256 and x.size() = [1, 256, 8, 8]. Going through the self.se_loss branch always raises RuntimeError: mat1 and mat2 shapes cannot be multiplied (2048x8 and 256x80). How should I fix this? Is self.se_loss required here, and what is it mainly for?

zhanghang1989 commented 2 years ago

You can turn se_loss off; it is the semantic embedding loss from the paper, and it may help a little when there are many classes. That error looks like the channels are not set up correctly.
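
For what it is worth, the mat1/mat2 error comes from feeding the 4-D feature map x (size [1, 256, 8, 8]) into self.selayer, which is an nn.Linear(c1, nc): the linear layer then sees 8 as its input dimension (hence 2048x8 vs 256x80). In the original EncModule the SE branch consumes the encoded vector en of shape (batch, c1) and is returned as an extra output rather than replacing x. A hedged sketch of forward along those lines (a drop-in for the method above, assuming the same members):

    def forward(self, x):
        en = self.encoding(x)             # (batch, c1) encoded feature
        b, c1, _, _ = x.size()
        gam = self.fc(en)                 # (batch, c1) channel-wise gates
        y = gam.view(b, c1, 1, 1)
        out = F.relu_(x + x * y)          # (batch, c1, h, w) re-weighted feature map
        if self.se_loss:
            # the SE branch takes the (batch, c1) vector, not the 4-D feature map
            return out, self.selayer(en)  # (batch, nc) class logits for the SE loss
        return out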

QY1994-0919 commented 2 years ago

@zhanghang1989 Yes, I have fixed it. Thank you.

zhanghang1989 commented 2 years ago

@QY1994-0919 You're welcome. By the way, this work is fairly early by now; if you are doing research, I would suggest following more recent work, which makes it easier to get results. For segmentation, for example, take a look at MaskFormer (V2 is about to be released).