zyxElsa / CAST_pytorch

Official implementation of the paper "Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning" (SIGGRAPH 2022)
Apache License 2.0

How do I increase the image resolution? #3

Closed: mrakgr closed this issue 2 years ago

mrakgr commented 2 years ago

I've gotten the code to run without a problem, but the image produced is only 256 x 256, even though it looks fine. For this to really be useful, I'd want to make 1k images. Would that be possible? And how much memory would that require?

mrakgr commented 2 years ago
        parser.add_argument('--load_size', type=int, default=512+128+32, help='scale images to this size')
        parser.add_argument('--crop_size', type=int, default=512+128, help='then crop to this size')

I've changed the defaults to this and it works. I really like how the image looks.
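For reference, these two flags implement the default resize_and_crop preprocessing: the image is scaled to a load_size square and then a crop_size crop is taken out of it, so load_size has to stay >= crop_size. A minimal sketch of that pipeline (PIL only, with a center crop; the repo's actual transform lives in its data code and presumably uses a random crop during training):

    from PIL import Image

    img = Image.new('RGB', (1920, 1080))             # stand-in input image
    img = img.resize((672, 672), Image.BICUBIC)      # load_size = 512+128+32
    off = (672 - 640) // 2                           # crop_size = 512+128
    img = img.crop((off, off, off + 640, off + 640))
    print(img.size)                                  # (640, 640)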

I can't go beyond this on my 4 GB GTX 970 without running out of memory. But since I am just running it in forward mode, I think the CPU would be a viable choice here. My main system has 8 GB... which I am not sure would be enough for 1k if I assume memory consumption scales linearly with the pixel count.
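Rough arithmetic behind that worry (a sketch, assuming activation memory really does scale linearly with pixel count and that the 640 crop is already near the 4 GB limit):

    limit_gb = 4.0                        # the 640x640 crop just about fits here
    scale = (1024 / 640) ** 2             # pixel-count ratio going to 1024x1024
    print(f'{limit_gb * scale:.1f} GB')   # ~10.2 GB, over an 8 GB budget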

I was going to ask how to get it to run in CPU mode, but now that I think about it more deeply: why isn't 4 GB enough if it is just the forward pass? I've read the paper, and unlike vanilla neural style transfer it is not supposed to be doing any optimization at test time; it just passes the image through the generator, right? Is it possible that it is accidentally keeping the backward-pass intermediates in memory at test time?
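That suspicion is easy to demonstrate in isolation: when gradients are enabled, autograd keeps intermediate activations alive for a potential backward pass even if backward() is never called. A small self-contained sketch (illustrative layers, not CAST's actual networks):

    import torch
    import torch.nn as nn

    net = nn.Sequential(*[nn.Conv2d(16, 16, 3, padding=1) for _ in range(8)])
    x = torch.randn(1, 16, 256, 256)

    y = net(x)               # grad mode: intermediates are retained for backward
    print(y.requires_grad)   # True -> an autograd graph is being kept around

    with torch.no_grad():
        y = net(x)           # inference: intermediates are freed layer by layer
    print(y.requires_grad)   # False -> no graph, much lower peak memory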

mrakgr commented 2 years ago
parser.add_argument('--gpu_ids', type=str, default='0', help='gpu ids: e.g. 0  0,1,2, 0,2. use -1 for CPU')

I've tried setting the above to -1.

Traceback (most recent call last):
  File "test.py", line 57, in <module>
    model.parallelize()
  File "E:\CAST_pytorch\models\base_model.py", line 102, in parallelize
    setattr(self, 'net' + name, torch.nn.DataParallel(net, self.opt.gpu_ids))
  File "C:\Users\Marko\anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 134, in __init__
    output_device = device_ids[0]
IndexError: list index out of range

I get this error. I've checked whether the forward pass uses no_grad, and it seems it does, but maybe the problem is in the data-dependent initialization.
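As for the traceback itself, one possible fix, sketched against the lines shown above (and assuming the usual CUT/CycleGAN-style loop over self.model_names in base_model.py, which is an assumption, not verified repo code): only wrap the nets in DataParallel when GPU ids were actually given.

    def parallelize(self):
        for name in self.model_names:
            if isinstance(name, str):
                net = getattr(self, 'net' + name)
                if len(self.opt.gpu_ids) > 0:  # --gpu_ids -1 -> leave net on CPU
                    setattr(self, 'net' + name,
                            torch.nn.DataParallel(net, self.opt.gpu_ids))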

mrakgr commented 2 years ago
    def data_dependent_initialize(self, data):
        """
        The feature network netF is defined in terms of the shape of the intermediate, extracted
        features of the encoder portion of netG. Because of this, the weights of netF are
        initialized at the first feedforward pass with some input images.
        Please also see PatchSampleF.create_mlp(), which is called at the first forward() call.
        """
        self.set_input(data)
        bs_per_gpu = self.real_A.size(0) // max(len(self.opt.gpu_ids), 1)
        self.real_A = self.real_A[:bs_per_gpu]
        self.real_B = self.real_B[:bs_per_gpu]
        self.forward()  # compute fake images: G(A)

Yeah, the problem is here: this runs with gradient tracking enabled. I don't understand what it is trying to do. Would it be fine to wrap the call in a no_grad block? I'll give it a try.

Also...

        if i == 0:
            model.data_dependent_initialize(data)
            model.setup(opt)               # regular setup: load and print networks; create schedulers
            model.parallelize()
            if opt.eval:
                model.eval()

Is it really fine that this is done only once, on the first data point? If I passed in multiple content and multiple style images, would this still work as intended?
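For what it's worth, the lazy-init pattern the docstring describes only depends on feature shapes, not on the particular images, which is why one batch can suffice. An illustrative stand-in for that pattern (not the repo's actual PatchSampleF):

    import torch.nn as nn

    class LazySampler(nn.Module):
        def __init__(self):
            super().__init__()
            self.mlps = None                  # built on the first forward call

        def forward(self, feats):
            if self.mlps is None:
                # Sizes come from channel counts only, so initializing on the
                # first batch generalizes to any later content/style pair fed
                # through the same architecture.
                self.mlps = nn.ModuleList(
                    nn.Linear(f.shape[1], 256) for f in feats)
            return [m(f.flatten(2).mean(-1)) for m, f in zip(self.mlps, feats)]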

mrakgr commented 2 years ago
    def data_dependent_initialize(self, data):
        """
        The feature network netF is defined in terms of the shape of the intermediate, extracted
        features of the encoder portion of netG. Because of this, the weights of netF are
        initialized at the first feedforward pass with some input images.
        Please also see PatchSampleF.create_mlp(), which is called at the first forward() call.
        """
        self.set_input(data)
        bs_per_gpu = self.real_A.size(0) // max(len(self.opt.gpu_ids), 1)
        self.real_A = self.real_A[:bs_per_gpu]
        self.real_B = self.real_B[:bs_per_gpu]
        with torch.no_grad():
            self.forward()  # compute fake images: G(A)

It worked!

        parser.add_argument('--load_size', type=int, default=1024+256+128+32, help='scale images to this size')
        parser.add_argument('--crop_size', type=int, default=1024+256+128, help='then crop to this size')

With the latest change I can go up to 1408. Here is a sample: butterfly + Starry Night, with the images taken from the NNST repo. I'll play around with it for a bit. It might be worth resolving why CPU mode is not working; if that could be done, anybody could run this model on high-res images. Even though I have a 4 GB GPU, memory fragmentation means the system cannot actually allocate more than 3 GB.

[image: C1_fake_B]

mrakgr commented 2 years ago

Also, another problem: I want to pass in 1920 x 1080 images and not have them squished into a square. Is there a way to maintain the aspect ratio of the original image?
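One aspect-preserving option (a sketch with PIL; the --preprocess help string quoted later in this thread also lists a built-in scale_width mode, and the built-in flag turns out to be the real answer): scale the width to a target and derive the height from it.

    from PIL import Image

    def scale_width(img, target_w):
        w, h = img.size
        return img.resize((target_w, round(h * target_w / w)), Image.BICUBIC)

    print(scale_width(Image.new('RGB', (1920, 1080)), 1024).size)  # (1024, 576)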

mrakgr commented 2 years ago

I managed to optimize it a bit more.

    # encode() rewritten so that only the current feature map is kept,
    # instead of all four intermediate stages:
    def encode(self, input):
        r = input
        for i in range(4):
            func = getattr(self, 'enc_{:d}'.format(i + 1))
            r = func(r)
        return r

    def forward(self, content, style, encoded_only=False):
        style_feats = self.encode(style)
        content_feats = self.encode(content)
        if encoded_only:
            return content_feats, style_feats
        else:
            adain_feat = self.adain(content_feats, style_feats)
            return adain_feat

    # ...and the model-level forward, with the unused B->A path commented out:
    def forward(self):
        """Run forward pass; called by both functions <optimize_parameters> and <test>."""

        self.real_A_feat = self.netAE(self.real_A, self.real_B)  # G_A(A)
        # self.real_B_feat = self.netAE(self.real_B, self.real_A)  # G_A(A)
        # self.fake_A = self.netDec_A(self.real_B_feat)
        self.fake_B = self.netDec_B(self.real_A_feat)

If the intermediates are not held in memory and the forward pass is trimmed as above, the limit on my GPU goes up to 1728 x 1728, a 23% increase on both axes. I'll investigate how to make it run on the CPU tomorrow. As an aside, it is possible to comment out data_dependent_initialize completely.

Now that I've studied this in depth, one thing I found really surprising is that this architecture has two decoders. I can't see any mention of that in the paper and am wondering why that is.

mrakgr commented 2 years ago
    for i, data in enumerate(dataset):
        if i == 0:
        #     model.data_dependent_initialize(data)
            model.setup(opt)               # regular setup: load and print networks; create schedulers
        #     model.parallelize()
        #     if opt.eval:
        #         model.eval()

Getting it to run on the CPU isn't hard; one just has to comment out parallelize. Now I can get 2048 x 2048 out without issue.
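A less invasive variant of the same change (a sketch): keep the block from test.py but guard the GPU-only call, so --gpu_ids -1 works without commenting anything out.

    for i, data in enumerate(dataset):
        if i == 0:
            model.setup(opt)             # regular setup: load and print networks
            if len(opt.gpu_ids) > 0:     # skip DataParallel wrapping on CPU
                model.parallelize()
            if opt.eval:
                model.eval()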

mrakgr commented 2 years ago
parser.add_argument('--preprocess', type=str, default='none', help='scaling and cropping of images at load time [resize_and_crop | crop | scale_width | scale_width_and_crop | none]')

To get it to stop messing with the aspect ratio, set the preprocess option to none (previously it was resize_and_crop). With this I know all that I need to know about controlling the resolution, so let me close this issue here.