msxie92 / ScreenStyle

Implementation for "Manga Filling Style Conversion with Screentone Variational Autoencoder" (SIGGRAPH ASIA 2020 issue)

Some problems in code #3

Closed TsXor closed 1 year ago

TsXor commented 2 years ago

Your code requires the image to be grayscale and to have dimensions divisible by 16, but there is no sanitization after reading the image, so the code is very likely to throw an error. We can work around this by reading the image with PIL and padding it:

    import numpy as np
    from PIL import Image

    def PILopen(path, astype='L'):
        # open with PIL and convert to grayscale ('L') in one step
        return np.asarray(Image.open(str(path)).convert(astype))

    def modpad(img, mods):
        # pad bottom/right so each dimension becomes a multiple of its modulus
        imgy, imgx = img.shape
        mody, modx = mods
        pady = -imgy % mody  # 0 when already divisible (mody-imgy%mody would pad a full modulus)
        padx = -imgx % modx
        return np.pad(img, ((0, pady), (0, padx)), 'linear_ramp')

    image = PILopen('path/to/image')
    impad = modpad(image, (16, 16))
    # ... run the model on impad here ...
    sizey, sizex = image.shape
    result = impad[0:sizey, 0:sizex]  # crop back to the original size

Further reading of the code shows that padding is done internally:

    def npad(self, im, pad=128):
        h,w = im.shape[-2:]
        hp = h //pad*pad+pad
        wp = w //pad*pad+pad
        return F.pad(im, (0, wp-w, 0, hp-h), mode='constant',value=1)

but it is only applied in dec(); there is no padding in enc().
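A sketch of a workaround for the missing enc() padding — a hypothetical wrapper, assuming (as ScreenVAE's architecture suggests) that enc() preserves spatial resolution, and using replicate padding rather than the constant white of npad(), per a later finding in this thread:

```python
import torch
import torch.nn.functional as F

def padded_enc(model, im, pad=16):
    # hypothetical wrapper: pad before enc() the same way npad() does for
    # dec(), then crop the output back to the input's spatial size
    h, w = im.shape[-2:]
    hp = h // pad * pad + pad
    wp = w // pad * pad + pad
    im = F.pad(im, (0, wp - w, 0, hp - h), mode='replicate')
    return model.enc(im)[..., :h, :w]
```

With an identity enc(), a 30x50 input comes back as 30x50 regardless of the internal padding.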

TsXor commented 2 years ago

The prerequisites of ScreenVAE say: "CPU or NVIDIA GPU + CUDA CuDNN". But in ScreenVAE/models/networks.py it is asserted that CUDA is configured. The assertion only fires when gpu_ids are passed, but either way the code does not actually run on CPU.
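A device-agnostic setup would sidestep the assertion; a minimal sketch (not the repo's actual code, with a stand-in layer for the real network):

```python
import torch

# fall back to CPU when CUDA is unavailable instead of asserting on it
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.nn.Linear(8, 8).to(device)   # stand-in for the real network
x = torch.randn(1, 8, device=device)
y = model(x)                               # runs on GPU or CPU alike
```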

TsXor commented 2 years ago

I bet you don't f**king know why Python has a builtin called "import". You wrote ScreenVAE, partially copied it into visual.py and edit.py, copied it into BidirectionalTranslation, and then copied it into your new manga inpainting project. So what will you do when you want to include manga inpainting in something else? Copy it again?!
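For the record, the alternative being asked for is one shared package imported everywhere; a hypothetical layout (directory and module names are illustrative, not the repo's actual structure):

```
screenstyle/
    screenvae/                    # the one shared implementation
        __init__.py
        networks.py
    BidirectionalTranslation/
        ...                       # from screenvae.networks import ScreenVAE
    manga_inpainting/
        ...                       # from screenvae.networks import ScreenVAE
```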

TsXor commented 2 years ago

https://github.com/msxie92/ScreenStyle/blob/36211a97831fe03ede758506a1230907f4ff685c/BidirectionalTranslation/models/cycle_ganstft_model.py#L59 — and here is the forward method of SVAE (yes, your copied version, BRUH). (Disclaimer: "copy" means he chose to copy the code instead of turning it into a module.)

    def forward(self, x, line=None, screen=True, rep=False):
        if screen:
            if line is None:
                line = torch.zeros_like(x)
            else:
                line = torch.sign(line)
                x = torch.clamp(x + (1-line),-1,1)
            gaborfeat = torch.cat([x, line], 1)
            inter = self.enc(gaborfeat)
            scr, logvar = torch.split(inter, (self.inc, self.inc), dim=1)
            if rep:
                return scr
            recons = self.dec(scr)
            return recons, scr, logvar
        else:
            recons = self.dec(x)
            return recons

and this is the original ScreenVAE:

    def forward(self, x, line=None, screen=False, rep=False):
        if line is None:
            line = torch.ones_like(x)
        else:
            line = torch.sign(line)
            x = torch.clamp(x + (1-line),-1,1)
        if not screen:
            input = torch.cat([x, line], 1)
            # input = F.pad(input, (32, 32, 32, 32), mode='constant',value=1)
            inter = self.enc(input)
            scr, logvar = torch.split(inter, (self.outc, self.outc), dim=1)#[:,:,32:-32,32:-32]
            # scr = scr*torch.sign(line+1)
            if rep:
                return scr
            recons = self.dec(scr)
            return recons, scr#, logvar
        else:
            h,w = x.shape[-2:]
            x = self.npad(x)
            recons = self.dec(x)[:,:,:h,:w]
            recons = (recons+1)*(line+1)/2-1
            return torch.clamp(recons,-1,1)

I'm vomiting.

Also, I wonder how you kept track of whether you were in encoding mode or decoding mode while writing these.
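Part of the confusion is that the `screen` flag selects decoding in one copy and encoding in the other. An explicit mode argument would make the call sites self-documenting; a hypothetical sketch (not the repo's actual API):

```python
from enum import Enum

class VAEMode(Enum):
    ENCODE = 'encode'
    DECODE = 'decode'

def run_svae(model, x, mode: VAEMode):
    # dispatch on an explicit mode instead of an overloaded boolean flag
    if mode is VAEMode.ENCODE:
        return model.enc(x)
    if mode is VAEMode.DECODE:
        return model.dec(x)
    raise ValueError(f'unknown mode: {mode!r}')
```

A caller reading `run_svae(net, x, VAEMode.DECODE)` cannot mistake which branch runs, unlike `forward(x, screen=True)` whose meaning flips between the two copies.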

TsXor commented 2 years ago

According to the paper, the discriminator of the adversarial loss is also "learnable", so is there any checkpoint of the discriminator available?

TsXor commented 1 year ago

https://github.com/JulianJuaner/MangaScreenTone/blob/c1621ab956d1b309cfc959a3bf88b0a48e0dfbf4/vae/networks1.py Is this the "gabor wavelet" in question?

TsXor commented 1 year ago

https://github.com/msxie92/ScreenStyle/blob/36211a97831fe03ede758506a1230907f4ff685c/BidirectionalTranslation/models/networks.py#L1294-L1303 Then why do you name it define_SVAE? Do you want it to define something else???

TsXor commented 1 year ago

I spent an hour trying to figure out why I couldn't reproduce BidirectionalTranslation; only after pulling intermediate results with pdb did I find that the ScreenVAE output was wrong. Which means the author modified his ScreenVAE after copying it. Un-f**king-believable.

Update after more debugging: my padding method was wrong; it should be replicate, not constant 1. Great, turns out dinner was me all along.

But that still doesn't change the blood-pressure-raising fact that the author copied ScreenVAE and then modified it. Did you know? inc and outc are swapped between the two versions! (The copied version's outc is the original's inc, and vice versa.) Yes, he copied it, and, as if to show off, he modified it too. My blood pressure has never been this high in my life.

Python's philosophy is not to reinvent the wheel. This author built the wheel, copy-pasted it elsewhere, and then modified it, with no way to see what exactly was changed (the edits have no git history, and everything is pasted into a single damn file, so even with git history you couldn't track it).
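The padding difference at issue here, sketched on toy tensors (not the repo's code): replicate padding continues the edge pixels, whereas constant 1 injects an artificial white border that distorts the VAE output near the edges.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 30, 50)   # toy grayscale batch
pad = (0, 14, 0, 2)            # (left, right, top, bottom): 50 -> 64, 30 -> 32

white  = F.pad(x, pad, mode='constant', value=1)  # artificial white frame
repeat = F.pad(x, pad, mode='replicate')          # continue the edge pixels

# both keep the original content intact in the top-left region;
# only the border content differs
```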

Can it run? Yes. How do you run it? From the terminal. Is there any way to call it at the Python level? No. Can you call it from Python at all? Sure: glue-language Python hands you subprocess, say thank you, Python. Is it a hassle? Of course it is. You have to write intermediate files to disk so the shell script can read their paths, and you have to control exactly when those files get deleted, neither too early nor too late. Is it worth it? Have you ever considered that, if this script could simply be imported from Python, you could pass your variables in directly, replacing a slow write-then-read of files with passing a pointer?
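The file-shuffling overhead described above, sketched against what an importable API would allow (the CLI script name is hypothetical):

```python
import os
import tempfile
import numpy as np

def roundtrip_via_files(img: np.ndarray) -> np.ndarray:
    # what a shell-only interface forces: serialize, hand off a path, read back
    with tempfile.TemporaryDirectory() as tmp:   # plus managing deletion timing
        path = os.path.join(tmp, 'in.npy')
        np.save(path, img)
        # a subprocess would run here, e.g.
        #   subprocess.run(['python', 'screenvae_cli.py', path, out_path])  # hypothetical
        return np.load(path)

def roundtrip_via_import(img: np.ndarray) -> np.ndarray:
    # what an importable module allows: hand over the array itself, no disk I/O
    return img
```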

I now have every reason to suspect the author withheld the training source code for the same kind of reason (even though this part was released, and in the end it doesn't actually look that badly written): the code is such a mess that it possibly can't even run outside the environment on his boss's (i.e. advisor's) workstation/server.

TsXor commented 1 year ago

Input: 809_input.png from examples. Output of the original repo's script: input_001_color - 副本 (these "colored jaggies" are most likely JPEG artifacts from the line images in examples being stored as JPG). Output of my fork's script: 809_input.

Using pdb to capture the intermediate screenmap tensor produced by my fork's script, then setting an rpdb breakpoint in the original repo's script and substituting in that captured screenmap: input_001_color.

The reverse injection of the above: 809_input.

To correct my earlier claim: both sides are at fault here, though ScreenVAE's fault is clearly the bigger one.

msxie92 commented 1 year ago

First of all, I apologize for the inconvenience this code has caused you, and I agree that many problems remain in it. I never learned Python systematically from start to finish or studied all of its mechanisms, so there are places where I used it improperly. Thank you for these comments; I will keep them in mind going forward.