lisa-lab / pylearn2

Warning: This project does not have any current developers. See below.
BSD 3-Clause "New" or "Revised" License

Update/wrap to cuda-convnet2 #1044

Open nouiz opened 10 years ago

nouiz commented 10 years ago

Time to upgrade pylearn2's wrap of cuda-convnet: https://code.google.com/p/cuda-convnet2/ https://plus.google.com/u/0/+AlexKrizhevsky/posts/GeGh4j7kDcR

We need to check the license; it is Apache. We will also probably need to select the old or new version depending on the user's GPU, as the new one doesn't handle this itself. Or at least, test that the new one works and isn't slower on an older GTX 580.

nouiz commented 10 years ago

Other info by @memimo on pylearn-dev:

I'm not sure what you mean by requesting temp memory. But yes, it still uses the B01C order.

I think this is still the best option for pylearn2, for the following reasons:

- Its interface hasn't changed much, so we can update our wrap with the least amount of effort.
- It's not just the conv2D that we care about; cuda-convnet has code for optimized pooling too.
- It's now optimized for the Titan Black and K20 that we use at LISA.
- It supports multi-GPU architectures.
- The only other library that meets our needs would be Caffe, and apparently its license is not that flexible.
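Converting between that layout and Theano's default bc01 order is a single dimshuffle. A minimal sketch, assuming the wrapper expects the c01b order used by pylearn2's existing cuda-convnet wrapper (variable names here are illustrative):

```python
import theano.tensor as T

# A 4D input in Theano's usual bc01 layout: (batch, channels, rows, cols).
input_bc01 = T.tensor4('input_bc01')

# cuda-convnet's c01b layout: (channels, rows, cols, batch).
input_c01b = input_bc01.dimshuffle(1, 2, 3, 0)

# And back again after the convolution.
output_bc01 = input_c01b.dimshuffle(3, 0, 1, 2)
```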

benanne commented 10 years ago

This is probably worth keeping an eye on: https://github.com/soumith/convnet-benchmarks Not too many results there yet, but it should be cool to have some raw performance numbers. I hope raw performance will also factor into the decision on what implementation(s) to wrap :)

benanne commented 10 years ago

Soumith just posted some results for a single convolutional layer (see README in his repo). Looks like this is definitely going to be worth the effort :)

dwf commented 10 years ago

If it's Apache-licensed then we cannot include it in pylearn2 directly. The Apache license is incompatible with BSD, and we need to stay BSD for a variety of reasons. We'll have to refactor to allow different convolution "plugins".

I think the Theano ops are the right layer at which to do this.
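As a rough sketch of what such a plugin layer could look like (the names below are hypothetical, not an existing pylearn2 or Theano API), the backend choice could be a single dispatch point that the rest of the code calls into:

```python
import theano.tensor as T

# Hypothetical registry of convolution implementations; each entry is a
# callable taking (images, filters) and returning the convolved output.
_CONV_BACKENDS = {
    'theano': lambda images, filters: T.nnet.conv2d(images, filters),
    # 'cuda-convnet2': would live in a separately-licensed package and be
    # registered here only if that package is installed.
}

def conv2d(images, filters, backend='theano'):
    """Dispatch to whichever convolution plugin the user selected."""
    try:
        impl = _CONV_BACKENDS[backend]
    except KeyError:
        raise ValueError("unknown convolution backend: %r" % backend)
    return impl(images, filters)
```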

madisonmay commented 10 years ago

This Stack Exchange post seems to suggest that you should be fine including Apache-licensed code in a BSD-licensed project, provided that you also include the Apache license file with the module that was released with cuda-convnet2: http://programmers.stackexchange.com/questions/40561/is-bsd-license-compatible-with-apache. Here's the relevant bit of the Apache license: http://www.apache.org/licenses/LICENSE-2.0.html#redistribution. I don't know if the reasons that pylearn2 must stay BSD prohibit that arrangement, though.

bhack commented 10 years ago

Take a look here: http://www.apache.org/legal/resolved.html#category-a

nouiz commented 10 years ago

Hi,

The Apache license places more restrictions on users than the BSD license does. For example, if you use the Apache-licensed code, you agree not to sue the author of that code for copyright. Mixing parts of pylearn2 or Theano code with code under such restrictions puts users in a strange area: if they have that code installed but have manually disabled its usage, what happens? I'm not a lawyer, but I have seen many companies with questions about what restrictions the BSD license places on them. If we start to mix licenses, it will be even harder for them to know what they are agreeing to use and to avoid problems.

I think we could look into making a separate repo, but add something to setup.py so that if it isn't installed, the user is prompted to decide whether to install it. That way they make a clear decision about what they want to do.
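A minimal sketch of how such an optional, separately-installed package could be detected at import time (the package name `cuda_convnet2_wrapper` is made up for illustration):

```python
import warnings

# Hypothetical import-time check for a separately-distributed,
# Apache-licensed wrapper package.
try:
    import cuda_convnet2_wrapper  # illustrative name, not a real package
    HAVE_CUDA_CONVNET2 = True
except ImportError:
    HAVE_CUDA_CONVNET2 = False
    warnings.warn(
        "cuda-convnet2 wrappers not found; falling back to the default "
        "Theano convolution. Install the separate (Apache-licensed) "
        "package to enable them."
    )
```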

Anyway, in any case, we first need someone to write the wrapper. We can always change later where it is included or move it to a new repo. But I don't think we will include it directly in pylearn2; at least, not before much more verification, which I won't be doing any time soon.

Fred

madisonmay commented 10 years ago

Is there anyone actively working on this port? I'd be very interested in moving forward on this issue technically, even if there are licensing constraints that we'd have to consider later on when integrating with pylearn2. The support offered for multiple GPUs would be an excellent value add for pylearn2.

This year's ILSVRC competition featured VGG's convnet, trained for ~4-6 weeks on 4 GPUs. On a single GPU that kind of computation would be infeasible, and it would be great to have pylearn2 help facilitate research at that scale.

I understand that @goodfeli and @dwf were responsible for the original wrapper around cuda-convnet for pylearn2, and I would be curious to hear what your estimates would be for a port of Krizhevsky's cuda-convnet2 library. A cursory comparison of cuda-convnet2 makes it seem like the high-level interface to the library has stayed very similar, so I would anticipate a port being pretty feasible with a few weekends' worth of dedicated work. I'd also appreciate a quick assessment of whether or not Krizhevsky's hybrid data/model parallelism method (http://arxiv.org/pdf/1404.5997v2.pdf) would play well with Theano -- if not, pure data parallelism might provide most of the benefit with a smaller amount of effort.

Even if multi-GPU support requires a longer-term porting effort, the improved training times on Kepler GPUs would still be a nice value add.

goodfeli commented 10 years ago

As long as the interface is indeed similar, you're right that it should only be a few weekends of work to copy-paste our wrapper and make it work with the new library. theano-dev would be a better place to ask about this stuff, especially multi-GPU support. The cuda-convnet wrappers really should not be in pylearn2, I just put them there because theano_linear was there. In my opinion all of pylearn2.linear should be in theano, but I think that's still controversial. I think it's less controversial that the ops underlying it should be in theano. Side comment: there's no 'c' in Krizhevsky.
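For context, the existing wrapper is invoked roughly as follows; a minimal sketch using pylearn2.sandbox.cuda_convnet's FilterActs op, assuming symbolic inputs and filters in c01b order:

```python
import theano.tensor as T
from theano.sandbox.cuda.basic_ops import gpu_contiguous
from pylearn2.sandbox.cuda_convnet.filter_acts import FilterActs

# Symbolic inputs in c01b order: (channels, rows, cols, batch).
input_c01b = T.tensor4('input_c01b')
filters_c01b = T.tensor4('filters_c01b')

# FilterActs requires GPU-contiguous inputs.
conv_op = FilterActs(stride=1, partial_sum=1, pad=0)
output_c01b = conv_op(gpu_contiguous(input_c01b),
                      gpu_contiguous(filters_c01b))
```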

nouiz commented 10 years ago

cuda-convnet2 can't be put in Theano or pylearn2 due to the license. It should be in a separate repo.

Moving the cuda-convnet wrappers to Theano makes sense, but they will probably become useless (NVIDIA just released its own convolution library), so I don't think it is worth the time to move them. Just leave them there for history. Given the NVIDIA release, whose convolution code we will probably finish wrapping in Theano this week, I don't think it would be wise to spend time on cuda-convnet2 unless we see a clear reason. For now, I don't see one. I'm pretty sure multi-GPU can be done with the NVIDIA lib. Maybe we need to do it manually, but I think it can be done. @abergeron do you have the same impression?

As for multi-GPU, we should talk about that on theano-dev. We have a short-term plan to finish that in Theano, but the "short" term always seems to take longer. A "very short" term would mean making just a convolution op multi-GPU, not everything. If you are interested in helping or continuing this discussion, start a new thread on theano-dev.

Fred

benanne commented 10 years ago

Has CuDNN been compared against cuda-convnet2? I found it odd that the blog post about CuDNN made no mention of it. Soumith's benchmarks seem to indicate that cuda-convnet2 beats the Caffe gemm approach for a few configurations ( https://github.com/soumith/convnet-benchmarks ). Since CuDNN is supposedly only 1.2x - 1.3x faster than Caffe, it might still be beneficial to use cuda-convnet2 for certain configurations.

It might not be worth the effort though... perhaps it would be a good idea to hold off on that decision until cuDNN support is implemented, so it can be included in the benchmarks. If cuda-convnet2 still turns out to have an edge for some input configurations, a more informed decision can be made.

nouiz commented 10 years ago

My guess is that cuDNN will keep getting updated until it always beats cuda-convnet2. But the future is not certain! So if someone wants to work on that, go ahead; I won't discourage people from doing so. I just don't want people to have the wrong expectations.

I agree, it would be good to have it in the benchmark to know the current speed.

madisonmay commented 10 years ago

@goodfeli, thanks for the analysis. It seems like the general consensus is that any sort of integration should be addressed at the Theano level rather than the pylearn2 level, so I will gladly move that discussion to the theano-dev mailing list. And thanks for the correction w/ regards to Krizhevsky.

@nouiz, it looks like Caffe's integration of cuDNN (https://github.com/BVLC/caffe/pull/1046/files) required many thousands of lines of code, so I'm not sure how short-term that project will be. I'd like to stay up to date on that progress, though. I was unable to find an open issue / PR about multi-GPU support on the Theano GitHub page -- if one does exist, do you think you could drop in a link to it?

I'm of the opinion that it would still be worth pursuing the cuda-convnet2 integration in parallel, since as @benanne mentions it's unlikely that the difference in performance between the two will be too substantial.

nouiz commented 10 years ago

There is no ticket about cuDNN. I just created one:

https://github.com/Theano/Theano/issues/2094

Last Friday, @abergeron finished the first version of our wrapper for their convolution code. This is what could give the biggest speedup. I think we can have that merged into Theano this week.
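Once merged, usage from Theano should look roughly like the following sketch (the exact module path and signature may differ from what finally lands):

```python
import theano.tensor as T
from theano.sandbox.cuda import dnn

# Inputs in Theano's usual bc01 order: (batch, channels, rows, cols).
images = T.tensor4('images')
filters = T.tensor4('filters')

# cuDNN-backed convolution, assuming a cuDNN-capable GPU and device=gpu.
output = dnn.dnn_conv(images, filters, border_mode='valid', subsample=(1, 1))
```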

madisonmay commented 10 years ago

Yeah, the estimate in Krizhevsky's paper was that ~90% of the speedup from multi-GPU support could be achieved by supporting data parallelism in the conv layers. Thanks for creating that ticket.
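For reference, the data-parallel scheme referred to here amounts to splitting each minibatch across GPUs, running the same model on every shard, and averaging the gradients before a single update. A framework-agnostic sketch of the idea (not Theano multi-GPU code, which does not exist yet), with `compute_gradients` as an assumed user-supplied function:

```python
import numpy as np

def data_parallel_step(params, minibatch, compute_gradients,
                       learning_rate=0.01, n_workers=4):
    """One update with data parallelism: split the minibatch into equal
    shards, compute gradients for each shard (conceptually on its own GPU),
    then average the gradients before applying a single update.

    `compute_gradients(params, shard)` is assumed to return a list of
    gradient arrays, one per parameter array in `params`.
    """
    shards = np.array_split(minibatch, n_workers)
    per_worker_grads = [compute_gradients(params, shard) for shard in shards]
    # Average each parameter's gradient across workers (the all-reduce step).
    avg_grads = [np.mean(worker_grads, axis=0)
                 for worker_grads in zip(*per_worker_grads)]
    return [p - learning_rate * g for p, g in zip(params, avg_grads)]
```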

benanne commented 9 years ago

Bump :) Is this still being considered? Soumith's latest benchmarks ( https://github.com/soumith/convnet-benchmarks ) show that cuda-convnet2 is pretty competitive for some configurations, even compared to cudnn R2.

I am still using the cuda-convnet wrappers a lot, because even on the GTX 980, I can still get substantial speedups from them compared to all the other convolution implementations that are now available in Theano. So I imagine cuda-convnet2 would probably be even faster for my use cases.

I'm willing to help with this if I can be of any use, but someone else would need to take the lead as I'm not comfortable at all with C/C++.