Any ways to increase resolution?

ghost commented 7 years ago

I am getting some fascinating results with this (quite amazing at times), but the resolution is somewhat limited. When I attempt higher resolutions (near the 700x700 limit) or ratios (over 1.0), I get what I assume are GPU out-of-memory errors like: `cudnn_conv_layer.cu:28] Check failed: status == CUDNN_STATUS_SUCCESS (8 vs. 0) CUDNN_STATUS_EXECUTION_FAILED`. These occur at the later layers of PatchMatch (usually layer:conv2_1).

Are there any possible avenues to increase the resolution of content and style and output? Where are the main memory bottlenecks? Is it within PatchMatch or elsewhere? It's a little confusing because my GPU memory consumption seems quite low (2-3GB) even right before the error occurs.

I am running on a 1080ti (11GB), and I would love to be able to generate higher resolution output if at all possible. Any further guidance or info would be much appreciated.

rozentill commented 7 years ago

In our project, we resize an image if it is too large, so if you input a 700x700 image, the program will resize it. Did you get this error with our resize code or without it? I ask because this error is not necessarily caused by running out of memory.

rozentill commented 7 years ago

The cause could be cuDNN; see the similar issue at https://github.com/BVLC/caffe/issues/2197 for details. Our project does not need cuDNN, so you can build without it.

ghost commented 7 years ago

This error does sometimes occur at higher ratios with your resizing code. I had just assumed that my particular case (about 10GB of video memory free) was not fully protected by the resize, which was perhaps written for the 12GB commonly available. If I manually resize the images to a lower resolution than your code does, I can usually get the ratio that failed to work (e.g. 1.0).

I will see if I can resolve that particular error by leaving out CUDNN (or perhaps using a different version, I am on 5.1).

But the larger issue for me is breaking through the limitation that requires the resizing in the first place. To put my question more simply, why do you have to resize the images down, and is there a way to increase that maximum resolution? Though the final results are upsized in your code as a last step, they only have the detail and information of at most a 700x500 image. I would like to run 1000x1000 and larger source and style images through the system at full resolution for even better results.

ghost commented 7 years ago

Update! I built Caffe without CUDNN and was able to get higher resolutions. Also, I get a more sensible error that I am out of memory (and can see the memory approach the limit using nvidia-smi). I still suspect the CUDNN error above was about memory, as it occurred only with increased image size and/or scale. It seems that CUDNN uses additional memory in this case.
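For anyone who wants to log these readings instead of watching nvidia-smi by hand, here is a minimal Python sketch. It uses nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options and assumes nvidia-smi is on the PATH; the function names are made up for illustration and are not part of this project.

```python
import subprocess

# Ask nvidia-smi for used and total memory (MiB), one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Turn lines such as '5804, 6142' into a list of (used, total) tuples."""
    readings = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def read_gpu_memory():
    """Run nvidia-smi once and return the parsed readings (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


if __name__ == "__main__":
    for index, (used, total) in enumerate(read_gpu_memory()):
        print(f"GPU {index}: {used} MiB used of {total} MiB")
```

Calling `read_gpu_memory()` in a loop while the demo runs should show whether usage really climbs toward the card's limit before a failure.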

By raising the resize limits in your code I was able to work with content and style images close to 1000x1000 up to a ratio of .7. With CUDNN I was only able to get up to ratio .3 with the same images. I'll try testing with different versions of CUDNN (I used 5.1) to see if any have better memory performance. The performance with CUDNN was faster, but not substantially so (29 seconds vs 34 without in one case) -- not enough to give up the higher resolution possibilities.

I'd still love to be able to go higher than this improved resolution. I wonder if changing out VGG-19 for a less resource intensive trained net would help (as it does with Neural Style and other style transfer systems). I tried replacing it with VGG-16 to test but it did not work. I guess there are more code changes required to utilize a different trained net.

rozentill commented 7 years ago

Yeah, you are right. cuDNN does use additional memory, so it can cause this error as the image size grows. I think there are two ways to increase the resolution:

1. Use a GPU with more memory, which can handle larger images.
2. Reduce the number of threads per block (currently 20x20), which could reduce the memory needed for computation.

ghost commented 7 years ago

I tried reducing the threads in a block, but got no noticeable reduction in memory usage. Perhaps I changed it incorrectly. I edited only DeepAnalogy.cu.

I changed the 6 lines like `dim3 threadsPerBlockAB(20, 20, 1);` to `dim3 threadsPerBlockAB(10, 10, 1);`

This did not get me any noticeable memory reduction.

I then also changed the surrounding lines like `dim3 blocksPerGridAB(data_A_size[curr_layer].width / 20 + 1, data_A_size[curr_layer].height / 20 + 1, 1);`

I replaced the 20 with a 10 in those lines, in case that was necessary.

Again, no noticeable memory reduction.
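One possible explanation, offered as a sketch rather than a reading of the actual kernels: the grid in the quoted lines is sized so that blocks times threads still covers every position of the feature map, so shrinking the block mostly just adds blocks. Device buffers are normally sized by the data (width x height x channels), not by the launch configuration. The Python below redoes that arithmetic; the 224x224 size and 128 channels are example values, and the function names are hypothetical.

```python
def launch_geometry(width, height, block_side):
    """Grid size and total thread count for a square block of block_side threads."""
    # Same integer arithmetic as the quoted lines: size / block_side + 1 per axis.
    blocks_x = width // block_side + 1
    blocks_y = height // block_side + 1
    total_threads = blocks_x * blocks_y * block_side * block_side
    return blocks_x, blocks_y, total_threads


def buffer_bytes(width, height, channels, bytes_per_value=4):
    """Size of one float32 buffer sized by the data, independent of the launch."""
    return width * height * channels * bytes_per_value


if __name__ == "__main__":
    width, height, channels = 224, 224, 128  # example values only
    for block_side in (20, 10):
        print(block_side,
              launch_geometry(width, height, block_side),
              buffer_bytes(width, height, channels))
```

Both block sizes launch at least 224 * 224 = 50176 threads, and the buffer size does not depend on `block_side` at all. If that assumption holds for this project, smaller blocks would affect occupancy and speed but not the peak memory that nvidia-smi reports.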

As for the better GPU, I can get occasional access to a P5000 with 16GB of memory, which will certainly help.

Thanks for your help -- I am getting some stunning results.

tisawe commented 7 years ago

I don't believe that you are running out of memory. I have a GT 740 with 4GB of memory and compiled the code with <UseCuDNN>true</UseCuDNN>, and the generated binary runs without error. I've used input images up to 30 megapixels with a ratio of 1, which I assume is no problem for the program because the input images are resized after the parameters are set. It seems that the minimum area of the input image is 40000 pixels, with the image at least 200 pixels in both height and width, and the maximum area is probably 350464 pixels, with a maximum height or width of 700 pixels. Maybe the aspect ratio of the images needs to be in a range where the height and width can both be between 200 and 700 pixels while the area stays between 40000 and 350464 pixels.

The source code for resizing the input images is in lines 104 through 229 in DeepAnalogy.cu.
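The limits described in this comment can be written down as a small check. This is a sketch based only on the numbers quoted here (200 and 700 pixel side limits, 40000 and 350464 pixel area limits); the function names and the uniform-scaling strategy are hypothetical and are not taken from DeepAnalogy.cu.

```python
import math

MIN_SIDE = 200
MAX_SIDE = 700
MIN_AREA = 40000    # 200 * 200
MAX_AREA = 350464   # 592 * 592


def fits_limits(width, height):
    """True if the image already satisfies the side and area limits."""
    area = width * height
    sides_ok = MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE
    return sides_ok and MIN_AREA <= area <= MAX_AREA


def downscale_factor(width, height):
    """Largest uniform scale (at most 1) that meets the max side and max area."""
    side_scale = MAX_SIDE / max(width, height)
    area_scale = math.sqrt(MAX_AREA / (width * height))
    return min(1.0, side_scale, area_scale)


if __name__ == "__main__":
    for size in [(700, 500), (700, 700), (1000, 1000)]:
        print(size, fits_limits(*size), round(downscale_factor(*size), 3))
```

Under these assumed limits, a 700x500 image passes unchanged, while a 700x700 image is over the area limit even though each side is allowed. That would be consistent with the 700x500 ceiling mentioned earlier in the thread.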

andyhx commented 7 years ago

Same problem:

`F0603 15:33:11.045892 38 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory *** Check failure stack trace: *** @ 0x7f18bb45ddaa (unknown) @ 0x7f18bb45dce4 (unknown) @ 0x7f18bb45d6e6 (unknown) @ 0x7f18bb460687 (unknown) @ 0x7f18bb8cdb61 caffe::SyncedMemory::to_gpu() @ 0x7f18bb8ccef9 caffe::SyncedMemory::mutable_gpu_data() @ 0x7f18bb8ced43 caffe::Blob<>::mutable_gpu_diff() @ 0x7f18bb989943 caffe::PoolingLayer<>::Backward_gpu() @ 0x7f18bb94b5f7 caffe::Net<>::BackwardFromTo() @ 0x40af70 my_cost_function::f_gradf() @ 0x42d045 lbfgs::gpu_lbfgs() @ 0x42ccef lbfgs::minimize() @ 0x40b539 deconv()h @ 0x42543c DeepAnalogy::ComputeAnn() @ 0x42108f main @ 0x7f18ba25ff45 (unknown) @ 0x4075c9 (unknown) @ (nil) (unknown)`

I was using parameters like this:

`./demo deep_image_analogy/models/ deep_image_analogy/demo/img_analogy/content/246_2.png deep_image_analogy/demo/img_analogy/align/5467_00000.png deep_image_analogy/demo/output/ 0 0.9 2 1`

A ratio of .5 works fine, though.

The memory usage looks like this:

+------------------------------------------------------+                       
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  Off  | 0000:03:00.0     Off |                  N/A |
| 45%   84C    P2   182W / 250W |   5804MiB /  6142MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     10464    C   python                                        1009MiB |
|    0     10646    C   ./demo                                        3768MiB |

Why does the model take so much memory?

rozentill commented 7 years ago

The feature maps and the VGG model itself take a lot of memory.
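To give a rough sense of scale, here is a back-of-the-envelope estimate of the forward activation memory for the 16 convolution layers of VGG-19. It assumes float32 values, the standard VGG-19 channel widths (64, 128, 256, 512, 512), and 2x2 pooling that halves each side (rounded up) between stages. It is not a measurement of this project: the network weights are a fixed cost on top of this, and the stack trace earlier in the thread shows backward passes allocating diff buffers, which would add more.

```python
# (channels, number of conv layers) for each VGG-19 stage.
VGG19_STAGES = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]


def vgg19_activation_bytes(width, height, bytes_per_value=4):
    """Estimate bytes needed to hold every conv output for one input image."""
    total = 0
    w, h = width, height
    for channels, num_convs in VGG19_STAGES:
        total += num_convs * channels * w * h * bytes_per_value
        # 2x2 pooling with stride 2 between stages, sides rounded up.
        w, h = (w + 1) // 2, (h + 1) // 2
    return total


if __name__ == "__main__":
    for side in (224, 448, 700, 1000):
        mib = vgg19_activation_bytes(side, side) / 2**20
        print(f"{side}x{side}: about {mib:.0f} MiB of conv activations")
```

The estimate grows with the pixel count, so doubling both sides roughly quadruples it. The program processes two images and keeps further per-layer buffers, so actual usage will be a multiple of this figure.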

scholtes commented 7 years ago

@jpcreamer what changes did you have to make in VS to get the deep_image_analogy project to build without CUDA? I was able to build Caffe as CPU-only via CommonSettings.props (and also via the copy in deep_image_analogy), but the build trace shows that deep_image_analogy is still trying to use nvcc to perform the build.

vinyvince commented 6 years ago

Hi everyone

I'm fascinated by this technology. After 22 years in computer graphics for film and TV, I feel as motivated as I did when I started :)

I don't have enough knowledge and experience (and maybe not the right side of the brain either :) ) to be comfortable building a new version of the exe that would let me raise the resolution, ideally by choosing it through a parameter. For now, I'm on a fast, modern 8-core CPU with 64 GB of RAM (which I may upgrade) and an 11 GB Nvidia GTX 1080 Ti. I was wondering if a gentleman or a smart lady would be gracious enough to help me by compiling and building a new exe that could bypass this very low resolution limit.

I would be immensely thankful if someone could do this for me and, I imagine, for the joy of the whole GitHub community.

Many thanks

vincent bout.de.lune@gmail.com http://fr.linkedin.com/in/vincentthomas

msracver / Deep-Image-Analogy

Any ways to increase resolution? #6