jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Possibility to assemble large images? #36

Open FlorianWendelborn opened 9 years ago

FlorianWendelborn commented 9 years ago

As I do not know how the program works, I'm not sure if what I'm suggesting is possible.

As far as I can tell, the biggest problem with using "normal GPUs" for this program is that they only have around 2-4 GiB of VRAM. Is it possible to modify the program so that it splits the image into multiple parts, iterates over each of them, then merges the parts back together while adjusting the borders a little, and splits them up again? This way it might be possible to render nearly any image size on the GPU, which would drastically improve the speed of rendering large images (GPU instead of CPU) and allow people without 32-core servers with 240 GiB of RAM to render images suitable for wallpapers and prints.

If this merging and splitting is done every n iterations, it should be possible to end up with tiles that combine into a single coherent image, shouldn't it?

Hypfer commented 9 years ago

My first solution would be to tile the image with ImageMagick, run neural-style on every tile, and then combine the tiles again. I think the style image should also be tiled, but I'm not entirely sure about that.

Nonetheless, this should most likely be handled by the code of neural-style and not some bash voodoo around it.
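
For what it's worth, the bash-voodoo version without overlap is roughly this (file names and the 2x2 grid are placeholders; expect visible seams after styling each tile separately):

  # cut input.png into a 2x2 grid of equal tiles
  convert input.png -crop 2x2@ +repage +adjoin tile_%d.png

  # ... run neural-style on each tile ...

  # reassemble the styled tiles without any spacing
  montage tile_*.png -tile 2x2 -geometry +0+0 recombined.png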

jcjohnson commented 9 years ago

In cnn-vis I used a tiling strategy similar to that described by @dodekeract to generate high-resolution images, so such a thing is certainly possible. In my experience there, to get nice contiguous images out from a tiling strategy you need to use overlapping tiles, and alternate between optimization steps on each tile.

Unfortunately, such a tiling strategy tends to give ugly discontinuities at the borders between regions, and also significantly increases code complexity. However if this is something that a lot of people want I'd consider working on it.

FlorianWendelborn commented 9 years ago

I'd definitely :+1: this. These kinds of images can get very beautiful, and there is no way I can afford to render a fitting wallpaper for my setup without tiling (5760x1080 with 3GB VRAM and 16 GB RAM would never work). I've already thought about trying to create a pull request or fork which implements this, but I've never worked with Lua, CUDA, or neural networks, so I'm really not sure if that's the right place to start. If it's not that much work, you should definitely implement this, since low-resolution output or absurdly high memory usage is one of the biggest issues of this project / method.

Another strategy I could imagine for this is:

However, as my knowledge about neural networks is quite limited, I (again) don't know whether this approach could work, or even work better than the one you're using in cnn-vis.

jcjohnson commented 9 years ago

Ok, I'll think about it. No promises since this is a pretty nontrivial change.

In the meantime, you should be able to stretch your 3GB farther using the strategies here; certainly not to 5760x1080, but using cuDNN and ADAM, generating a 920x690 image takes just under 3GB of GPU memory.
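
Concretely, that's an invocation along these lines (paths are placeholders):

  th neural_style.lua -content_image content.jpg -style_image style.jpg \
    -image_size 920 -backend cudnn -optimizer adam -output_image out.png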

dasantonym commented 9 years ago

What about CUDA-aware MPI? Could the data be held in a unified virtual address space to scale up the maximum possible resolution? I'd like to look deeper into that stuff, but I'm a newbie to CUDA and Torch, so I'm not sure if this is the right idea to pursue.

Here's MPI for Torch: https://github.com/sixin-zh/mpiT

tindoductran commented 9 years ago

Just wondering... (I am still going through the installation steps.) What is the maximum possible image size right now? Or is it dependent upon other factors like RAM size and such?

jcjohnson commented 9 years ago

It's mostly bottlenecked by your available memory.

With a 12GB Titan X, using the cudnn backend and ADAM optimizer, the max image size I can run is about 1700x1000 - however I have yet to get good results at this resolution, because it takes a long time to try out different hyperparameters.

tindoductran commented 9 years ago

Is that the output size? Like, is it much slower if I use 2000x2000 input files?

jcjohnson commented 9 years ago

Yes, output size is the big constraint. Input size doesn't really matter, as the input images will be resized before the output is generated.

rainerkohlberger commented 9 years ago

Hi, how much GPU memory would be needed for a 1280x720px output? Would 6GB be enough?

jcjohnson commented 9 years ago

6GB may just be enough, but you'll have to use ADAM and cuDNN.

rainerkohlberger commented 9 years ago

Thank you! And how much RAM do you think 1920x1080px would need? 12GB might not be enough if you're maxing out at 1700x1000px?

jcjohnson commented 9 years ago

I made this at 1920x1010, and it used around 11GB:

[image]

I'm currently running a 4k display off the Titan X, and X server + compiz is taking around 1GB of GPU memory; running the display off a different GPU and devoting the entire 12GB to neural-style should be enough for 1920x1080.
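
If you have a second GPU, nvidia-smi will show what the display is holding, and the -gpu flag picks the device neural-style runs on; for example (device index and paths are placeholders):

  # see which processes (e.g. Xorg / compiz) are holding GPU memory
  nvidia-smi

  # run neural-style on a device the display isn't using (ids are zero-indexed)
  th neural_style.lua -gpu 1 -backend cudnn -optimizer adam \
    -content_image content.jpg -style_image style.jpg -output_image out.png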

hughperkins commented 9 years ago

Hmmm, I just tried monkey-hacking nn.Sequential a bit. Seems like we only do forward prop, no backprop. Is that right?

Insert at line 12 of neural_style.lua:

-- override nn.Sequential:updateOutput with an instrumented copy, to log each forward pass
function nn.Sequential:updateOutput(input)
   print('update output')
   local currentOutput = input
   for i=1,#self.modules do
      currentOutput = self.modules[i]:updateOutput(currentOutput)
   end
   self.output = currentOutput
   return currentOutput
end

-- override nn.Sequential:updateGradInput likewise, to check whether it ever gets called
function nn.Sequential:updateGradInput(input, gradOutput)
   print('my updategradinput')
   local currentGradOutput = gradOutput
   local currentModule = self.modules[#self.modules]
   for i=#self.modules-1,1,-1 do
      local previousModule = self.modules[i]
      currentGradOutput = currentModule:updateGradInput(previousModule.output, currentGradOutput)
      currentModule = previousModule
   end
   currentGradOutput = currentModule:updateGradInput(input, currentGradOutput)
   self.gradInput = currentGradOutput
   return currentGradOutput
end

Edit: ah, I see, we are calling Sequential.backward, and Sequential.backward just directly calls module.backward; it doesn't first call Sequential.updateGradInput.

hughperkins commented 9 years ago

I managed to lower memory usage on CUDA from around ~630MB to around 570MB by shuffling the weights back and forth between GPU and main memory. The 'real' time reported by time doubled from 18 seconds to 34 seconds. This is for 50 iterations at size 200. It's not very impressive, but maybe it throws out some ideas that someone can evolve to reduce memory usage a bit more while increasing the runtime a little less?

What I did was add the following at line 12 of neural_style.lua:

-- forward pass: keep the conv-layer weights in main memory, moving each layer's
-- weights onto the GPU only while that layer runs
function nn.Sequential:updateOutput(input)
   print('update output')
   local currentOutput = input
   for i=1,#self.modules do
      local module = self.modules[i]
      if torch.type(module) == 'nn.SpatialConvolutionMM' then
         module.weight = module.weight:cuda()   -- weights onto the GPU for this layer
      end
      currentOutput = module:updateOutput(currentOutput)
      if torch.type(module) == 'nn.SpatialConvolutionMM' then
         module.weight = module.weight:float()  -- and back to main memory afterwards
      end
      collectgarbage()
   end
   self.output = currentOutput
   return currentOutput
end

-- backward pass: same shuffling, for both the weights and the weight gradients
function nn.Sequential:backward(input, gradOutput, scale)
   print('backward')
   scale = scale or 1
   local currentGradOutput = gradOutput
   local currentModule = self.modules[#self.modules]
   for i=#self.modules-1,1,-1 do
      local previousModule = self.modules[i]
      if torch.type(currentModule) == 'nn.SpatialConvolutionMM' then
         currentModule.weight = currentModule.weight:cuda()
         currentModule.gradWeight = currentModule.gradWeight:cuda()
         collectgarbage()
      end
      currentGradOutput = currentModule:backward(previousModule.output, currentGradOutput, scale)
      if torch.type(currentModule) == 'nn.SpatialConvolutionMM' then
         currentModule.weight = currentModule.weight:float()
         currentModule.gradWeight = currentModule.gradWeight:float()
         collectgarbage()
      end
      currentModule = previousModule
   end
   currentGradOutput = currentModule:backward(input, currentGradOutput, scale)
   self.gradInput = currentGradOutput
   return currentGradOutput
end

Add the following just after the line cnn = nil, which comes right after the comment "We don't need the base CNN anymore, so clean it up to save memory":

  -- start with all conv-layer weights and weight gradients in main (CPU) memory
  for i=1,#net.modules do
    local module = net.modules[i]
    print('module', module)
    if torch.type(module) == 'nn.SpatialConvolutionMM' then
       module.gradWeight = module.gradWeight:float()
       module.weight = module.weight:float()
    end
  end

jcjohnson commented 9 years ago

@hughperkins Neat trick! Since we just use convolution, pooling, and relu layers, the weights will be pretty small; I'd guess that you'd see a bigger memory savings if you also shuffled activations out to main memory.

hughperkins commented 9 years ago

Yes, maybe :-) I'm not planning on going further with this particularly; just throwing the idea out there in case someone is sufficiently motivated to take it further. Per my understanding, you yourself have a Titan X, which is sufficiently large, and your main issue is training time rather than memory?

zenpoet commented 9 years ago

Hi, awesome program (cf. my avatar). I was playing with texture transfers several years ago, but this is the first time my wife said "wow"!

Looking at your "1920x1010 / 11GB stanford + starry night", I was wondering whether there are limits to scalability in terms of image quality. For example, is it a matter of adjusting various parameters to get the Starry Night pattern to not repeat as often?

jcjohnson commented 9 years ago

@zenpoet Yep, you're right; as you move to bigger images it becomes tough to get features at the correct scale. The underlying neural network is fully convolutional, so it can be used with images of any size, but each neuron has a fixed effective receptive field in the input image no matter how large that image is. This means that the absolute scale of texture in the generated image will match the absolute scale of the style image, which can lead to repetitive patterns when generating large images.

One workaround is to rescale the style image relative to the generated image before extracting style features; you can do this with the -style_scale flag. A value of 1.0 means that the style image will be resized to the size of the output image before extracting style features; smaller values will cause the style image to be smaller than the generated image, and larger values will cause the style image to be bigger.
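
For example, to extract style features from a style image at half the size of the generated image (paths are placeholders):

  th neural_style.lua -content_image content.jpg -style_image style.jpg \
    -style_scale 0.5 -output_image out.png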

However, this still causes the absolute scale of features to be the same. An even better workaround that I've played with a little bit for generating bigger images that look nice is to use a multistage pipeline. For example, say you want to generate a 1920x1080 image and have an 800x600 style image: you can first rescale the style image to 480x270 and generate a 480x270 image, then use this generated image as the content seed for another run of neural-style that produces your final 1920x1080 output. The second run should have a much higher content weight than the first run, since in the second run we really just want to upsample and fill in fine details.

I think that this strategy (possibly with more than two stages) could work well for eliminating repetitiveness in high-res outputs, but it would probably involve tuning quite a few parameters to get good results.
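
As a rough sketch, the two-stage version of that 1920x1080 example might look something like this (paths, sizes, and weights are illustrative and would need tuning, as noted above):

  # stage 1: small pass that sets the overall composition
  th neural_style.lua -content_image content.jpg -style_image style.jpg \
    -image_size 480 -output_image stage1.png

  # stage 2: reuse the stage-1 result as the content seed with a much higher
  # content weight, so this pass mostly upsamples and fills in fine detail
  th neural_style.lua -content_image stage1.png -style_image style.jpg \
    -image_size 1920 -content_weight 50 -init image \
    -backend cudnn -optimizer adam -output_image final.png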

hughperkins commented 9 years ago

@zenpoet Nice avatar :-)

@jcjohnson :

An even better workaround that I've played with a little bit for generating bigger images that look nice is to use a multistage pipeline

Interesting. This is similar to how Soumith et al.'s Eyescream works: http://soumith.ch/eyescream/

zenpoet commented 9 years ago

@jcjohnson. Thanks for the tips. Any chance you could post a successful high res image? Curious to see how far you've been able to stretch the limits of the program.

liftup commented 9 years ago

@zenpoet

Hello! I got good results upscaling the image with waifu2x: http://waifu2x.udp.jp/. Waifu2x is developed to scale anime/art images using deep convolutional neural networks. I am able to scale the default output size of 512 up to 2560 without losing quality.

[image]

@jcjohnson Awesome repo, I was able to create a unique birthday present for a friend. Thanks for making it open source!

zenpoet commented 9 years ago

Cool! I'll have to give it a whirl.

ProGamerGov commented 8 years ago

@jcjohnson A user has figured out how to tile an image for use in Neural-Style and then recombine the pieces without ugly borders between the images: https://www.reddit.com/r/imagemagick/comments/4r8h0x/how_to_crop_an_image_into_overlapping_tiles/

We currently need help figuring out how to set up a script with ImageMagick to implement the procedure. Maybe something like this could even make it into Neural-Style itself?
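
As a rough starting point, the overlapping-crop half might look something like this in plain bash + ImageMagick (grid size, overlap, and file names are placeholders):

  #!/bin/bash
  # cut input.png into a ROWSxCOLS grid of tiles that overlap their
  # neighbours by OVERLAP pixels on each interior edge
  IN=input.png
  COLS=3; ROWS=3; OVERLAP=128

  W=$(identify -format '%w' "$IN")
  H=$(identify -format '%h' "$IN")
  TW=$(( (W + (COLS - 1) * OVERLAP + COLS - 1) / COLS ))   # tile width, rounded up
  TH=$(( (H + (ROWS - 1) * OVERLAP + ROWS - 1) / ROWS ))   # tile height, rounded up

  for ((r = 0; r < ROWS; r++)); do
    for ((c = 0; c < COLS; c++)); do
      X=$(( c * (TW - OVERLAP) ))
      Y=$(( r * (TH - OVERLAP) ))
      convert "$IN" -crop "${TW}x${TH}+${X}+${Y}" +repage "tile_${r}_${c}.png"
    done
  done

Each tile would then go through Neural-Style as its own content image; the part that still needs scripting is the reverse step, recombining the styled tiles with some blending/feathering across the overlaps so no seams show.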

sheerun commented 8 years ago

I've got pretty good results upscaling my avatar with the method @jcjohnson describes:

  1. original image (I only had a low-res result from early experiments) and style image
  2. upscaled image: original image + style image, maximum resolution on a Titan X

Then I cut the upscaled image into n pieces (in this case 3x3) with 25% overlap, use each of the pieces as a content image, and leave the style image the same. Then I stitch them together, using a slightly higher style scale (e.g. 1.5-2.5, though 1.0 is fine if you want fine detail). The final result is the following:

[stitched image]

I've also tried 5x5 grid, but the result is not as nice as 3x3.

It's hard to avoid different styling of the individual tiles, but using a "guide" styled image makes the differences far less obvious. For example, notice the blue shadow on my hair and moustache that is not present in the original styled image. Maybe halving the iteration count on the enhanced image could help.

You can also upscale with waifu2x, but the apparent upscaling is not 2x, maybe 1.2x at most.

albarji commented 8 years ago

Amazing results @sheerun! May I ask what methods or tools you used to cut the image into overlapping tiles, and then to stitch them back together? Especially the stitching part seems tricky, as you would have to merge the overlapping pixels somehow...

jareers commented 6 years ago

@hughperkins @jcjohnson I know it has been years since anyone tried anything about this here, but I followed up on your idea. However, when I try to move activations back and forth, it doesn't release GPU memory despite calling collectgarbage(). Any idea why that is happening? Has anything changed in torch/lua on a fundamental level?

Anyone interested can find slightly more detail in the question I asked on Stack Exchange: Torch: Unable to free gpu memory after moving tensors to cpu