jcjohnson / neural-style

Torch implementation of neural style algorithm

Pascal GPU 8GB limit #221

Open 3DTOPO opened 8 years ago

3DTOPO commented 8 years ago

Well, it's looking more and more likely that the new consumer Pascal GPUs will max out at 8GB. That is really disappointing - I was hoping for more VRAM, if anything, not less (I have 12GB in my Titan, of course).

;(

Is there any chance of combining cards with the new NVLink for neural-style? Or any ideas on how to tile better? I know there is an issue with overlapping tiled renders as is. I guess it would be really nice to only load the model once and render as many tiles as needed.

Thoughts?

jcjohnson commented 8 years ago

I think NV-link would be nontrivial to support.

However, don't forget that Pascal will also support 16-bit floating point, so with 8GB of memory you will actually be able to run bigger images than with 12GB of 32-bit floating point. I'll absolutely add support for this.
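(Rough arithmetic behind that: at 2 bytes per value, 8GB holds about 4.3 billion FP16 values, versus about 3.2 billion FP32 values in 12GB at 4 bytes each, so roughly a third more activations fit, ignoring anything that still has to stay in FP32.)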

I'm also holding out hope for a 16GB Pascal Titan card sometime next year; so far all the rumors have focused on the 1080 and 1070.

3DTOPO commented 8 years ago

Awesome, I did not realize that, thank you!

Yeah - looks like 5 models have surfaced now: http://vrworld.com/2016/05/03/nvidia-pascal-secret-hidden-directory/

3DTOPO commented 8 years ago

In case others haven't seen today's announcement: http://www.geforce.com/hardware/10series/geforce-gtx-1080

ghost commented 8 years ago

@jcjohnson For NVLink support, we should at least wait for CUDA 8 to be released this summer.

austingg commented 8 years ago

@jcjohnson Does this implementation support cuDNN v5? The Winograd algorithm will benefit VGG-like models a lot.

3DTOPO commented 8 years ago

NVIDIA just released the CUDA 8 release candidate. It states: "Simplify programming using Unified memory on Pascal including support for large datasets, concurrent data access and atomics". Not sure exactly what "simplify" means, but it sounds nifty!

And the new GTX 1080 cards go on sale in less than an hour now! :)

jcjohnson commented 8 years ago

@austingg cudnn v5 should "just work" with neural-style as long as you've installed the R5 version of the cuDNN Torch bindings.
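For anyone wiring this up, a quick sanity check along these lines should confirm the bindings see cuDNN v5. This is only a sketch: it assumes the cuDNN 5 libraries are on the library path and that the R5 branch of cudnn.torch has been installed via luarocks.

```lua
-- Quick check (sketch) that the Torch cuDNN bindings are picking up cuDNN v5.
-- Assumes cutorch and the R5 branch of cudnn.torch are installed.
require 'cutorch'
require 'cudnn'
print(cudnn.version)   -- expect a value >= 5000 for cuDNN v5
-- neural-style then uses it when invoked with:
--   th neural_style.lua -backend cudnn -cudnn_autotune ...
```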

@3DTOPO Good luck getting one of the 1080s - they are selling like crazy and pretty much impossible to find today. I didn't get one =(

3DTOPO commented 8 years ago

My bad - they aren't officially available until 6PM pacific time today (I was thinking eastern). Try at that time: http://www.geforce.com/hardware/10series/geforce-gtx-1080

jcjohnson commented 8 years ago

They started going on sale at 6am pacific time - people on /r/nvidia were reporting that the nvidia website was sold out by 6:05am.

3DTOPO commented 8 years ago

DOH!

3DTOPO commented 8 years ago

They are available now on Amazon - but at a very greedy markup! ~$1029-$1200

jcjohnson commented 8 years ago

A few weeks from now they should be easy to find. Although honestly I'm not sure that I want a 1080 for my personal machine - I have a Titan X and a Titan Z already; I don't want to give up the Titan X since it has more memory than the 1080, and the Z is two slow cards on one PCIe slot, which is sometimes more useful than a single fast card. First world problems, but I might wait for the 1080 Ti or a next-gen Titan for my personal machine.

3DTOPO commented 8 years ago

Interesting. The memory issue shouldn't affect neural-style though, since, as you stated, it will effectively have more memory with 16-bit floats, correct?

jcjohnson commented 8 years ago

True, but I use my GPUs for a lot more than neural-style; for a lot of stuff I'll still want 32-bit floats.

3DTOPO commented 8 years ago

Yeah - that's what I figured. Well, I only use my Titan X for neural-style. Want to buy it? ;)

3DTOPO commented 8 years ago

@jcjohnson I just ordered one from Amazon. They state it will be in stock and shipping on the 31st (4 days): http://www.amazon.com/PNY-GeForce-GTX-1080-VCGGTX10808PB-CG/dp/B01G39W58G?ie=UTF8&psc=1&redirect=true&ref_=ox_sc_act_title_1&smid=ATVPDKIKX0DER

jcjohnson commented 8 years ago

Seems like the 1080 does not have fast 16-bit floating point, and this is a premium feature that is only available on the P100 for now:

https://www.reddit.com/r/MachineLearning/comments/4lhrfj/fp16_performance_on_gtx_1080_is_artificially/

That's pretty disappointing =(

3DTOPO commented 8 years ago

Uh-oh. So does that mean it has 16-bit float support but at a performance penalty? Or does that mean we'll have less RAM (only 8GB) for neural-style on the 1080? If it's the latter, I think I'd better cancel my purchase!

jcjohnson commented 8 years ago

The 1080 has 16-bit support but there is a huge performance penalty. On the Tesla P100 found in the DGX-1, 16-bit floating point is 2x the speed of 32-bit floating point; on the GTX 1080, 16-bit floating point is 1/64 the speed of 32-bit floating point. That is so slow that it is basically not usable. The 1080 will be quite a bit faster than the Titan X at 32-bit floating point, but if you were mainly interested in bigger images by using 16-bit floating point then you are out of luck =(

The GTX 1080 and GTX 1070 are based on the GP104 GPU, which I guess doesn't have the same FP16 support as the GP100 GPU that the Tesla P100 uses. It's possible that there will eventually be consumer cards (1080 Ti or a new Titan) that use the GP100 GPU, and we can hope that they will also have fast FP16 support.

3DTOPO commented 8 years ago

Ack! Thanks for the info. Well, I could live with the same amount of RAM as the Titan - but not 50% less!

Super sad news indeed! Though it's better that I found out before it's too late. Guess I will cancel the order.

This puts a huge unknown in my plans. I held the launch of an app for these new cards; now it looks like I will have to delay it indefinitely. :'(

ghost commented 8 years ago

@jcjohnson According to Chinese leaks, the next Titan is definitely based on the GP100 with ... (buckle up) 24GB!

3DTOPO commented 8 years ago

@djebm2 Nothing substantiated yet - and even if true, the rumor has them shipping at the end of the year at the soonest.

3DTOPO commented 8 years ago

Alright! The new Titan X has been announced with 12GB on board!

http://www.geforce.com/hardware/10series/titan-x-pascal

ghost commented 8 years ago

@3DTOPO The Titan has a new type of deep learning instruction based on the int8 data type. It should perform at 44 TOPS according to NVIDIA. So in principle it's like training with 4 times as much memory. Don't know about the precision-related stuff, though.

3DTOPO commented 8 years ago

> However, don't forget that Pascal will also support 16-bit floating point, so with 8GB of memory you will actually be able to run bigger images than with 12GB of 32-bit floating point. I'll absolutely add support for this.

I just ordered a new Titan X. Will this be possible with it? If so, any idea when you might implement it?

ghost commented 8 years ago

@jcjohnson Can you also use the brand-new unified memory feature that Pascal offers? With it, you can cudaMallocManaged up to the overall memory available in the system... YES, the RAM plus the GDDR5X.

michaelhuang74 commented 6 years ago

@jcjohnson Any suggestions on how to modify the code to support 16-bit or 8-bit floating point operations? Thanks.

htoyryla commented 6 years ago

Are there any reasonably priced GPU cards available that support FP16 without the performance limit?

As far as I know, Torch supports HalfTensors, so one could try changing the data types in neural_style.lua. But if FP16 runs at 1/128th (or 1/64th) of the normal FP32 speed, there is not much point in doing so.
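To make that concrete, here is a rough, untested sketch of the kind of change being described: convert the network and the image tensors to cutorch's CudaHalfTensor. The small stand-in layers below replace the VGG model that neural_style.lua actually loads, and the snippet assumes cutorch and cunn were built with FP16 support.

```lua
-- Rough sketch only: switch the model and image tensors to half precision.
-- Assumes cutorch/cunn were built with FP16 (CudaHalfTensor) support.
require 'nn'
require 'cutorch'
require 'cunn'

local dtype = 'torch.CudaHalfTensor'

-- stand-in for the loaded VGG model and the preprocessed image
local net = nn.Sequential()
  :add(nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1))
  :add(nn.ReLU(true))
local img = torch.randn(1, 3, 256, 256)

-- the core of the change: the same calls would apply to the real cnn,
-- content/style images, Gram matrices and the tensor being optimized
net = net:type(dtype)
img = img:type(dtype)

print(net:forward(img):type())  -- expect torch.CudaHalfTensor
```

Losses may still need to be accumulated in FP32 to avoid overflow/underflow, and any cuDNN-backed layers would need the same conversion.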

michaelhuang74 commented 6 years ago

@htoyryla The Titan Xp seems to have very poor FP16 support. I am targeting the Tesla P100, whose FP16 performance is 2 times its FP32 performance.

htoyryla commented 6 years ago

As Torch now supports FP16 as CudaHalfTensors, if I had a GPU with reasonable FP16 speed, I would just start changing the relevant variables to CudaHalfTensor, paying attention to potential problems along the way. Justin does not appear to be active with this code any longer; there are some, like me, who do their own modifications now and then, but using consumer-level GPUs.