Closed majidaldo closed 9 years ago
I'm not the maintainer, but in my opinion this is hard to predict and depends on the hardware in use. Furthermore, converting between CPU and CUDA Theano objects is non-trivial. To do this right you'd need to know the memory available on the GPU, the size of the network, your batch_size, and the size of the input data. I think this is a manual optimization best left to the programmer. Alternatively, providing a function to help profile and search for a good configuration might be the best option.
I agree, this would be quite difficult in practice. For the CPU/GPU abstraction theanets relies on features provided by Theano. (One of the big benefits of Theano is that it provides just this abstraction, so that theanets mostly doesn't have to be aware of it.)
You can always profile your theanets program by running it in Theano's profiling mode:

THEANO_FLAGS=profile=True python my_script.py
This will print a large amount of profiling information that you can use to determine which parts of the computation graph take the most time.
Mmm, yeah. I'm thinking that if you had to do this, you could run two theanets instances as services, one CPU and one GPU, that would get training samples from a dispatcher that knows how best to distribute the work.
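The dispatcher idea above could be sketched roughly like this. This is only an illustration of the routing structure, not theanets code: the "workers" are placeholder threads, and the size threshold is an arbitrary stand-in for whatever heuristic the dispatcher would actually use.

```python
"""Sketch of the two-service idea: a dispatcher routes training batches
to a 'cpu' or 'gpu' worker queue based on batch size. The worker bodies
and the threshold are hypothetical placeholders, not theanets API."""
import queue
import threading

def make_worker(name, in_q, results):
    def run():
        while True:
            batch = in_q.get()
            if batch is None:                    # poison pill: shut down
                break
            results.append((name, len(batch)))   # pretend to train here
    return threading.Thread(target=run)

def dispatch(batches, threshold=64):
    """Send small batches to the CPU queue, large ones to the GPU queue."""
    cpu_q, gpu_q = queue.Queue(), queue.Queue()
    results = []
    workers = [make_worker("cpu", cpu_q, results),
               make_worker("gpu", gpu_q, results)]
    for w in workers:
        w.start()
    for batch in batches:
        (cpu_q if len(batch) < threshold else gpu_q).put(batch)
    for q in (cpu_q, gpu_q):
        q.put(None)
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    batches = [[0] * 8, [0] * 128, [0] * 32, [0] * 256]
    print(sorted(dispatch(batches)))
```

In a real setup each worker would hold its own compiled Theano function (one targeting the CPU, one the GPU), since, as noted above, converting objects between the two backends is non-trivial.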
Also, too bad NVIDIA stopped supporting the ability to run CUDA code on CPUs.
I'm going to go ahead and close this; it doesn't seem feasible to do.
Small problem sizes don't necessarily benefit from GPU computation. Would it be easy to add a few lines that check performance on each processor and then switch to using the faster one?
It could be more granular too, since sample sizes can vary within a training session.
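The benchmark-then-switch idea above could be sketched as follows: time a few trial batches on each backend and commit to whichever is faster. The two "backends" here are placeholder callables standing in for CPU and GPU training steps, not real theanets trainers.

```python
"""Sketch of benchmark-then-switch: run a few timed trials of each
candidate backend on a sample batch, then pick the one with the lowest
median time. The backend callables are hypothetical placeholders."""
import time

def pick_backend(backends, trial_batch, trials=3):
    """Return the name of the backend with the lowest median trial time."""
    timings = {}
    for name, fn in backends.items():
        samples = []
        for _ in range(trials):
            start = time.perf_counter()
            fn(trial_batch)
            samples.append(time.perf_counter() - start)
        samples.sort()
        timings[name] = samples[len(samples) // 2]   # median of the trials
    return min(timings, key=timings.get)

if __name__ == "__main__":
    # Placeholder workloads standing in for per-batch training steps.
    cheap = lambda batch: sum(batch)
    costly = lambda batch: [x * x for x in batch for _ in range(200)]
    best = pick_backend({"cpu": costly, "gpu": cheap}, list(range(2000)))
    print("chose:", best)
```

As noted in the comment above, sample sizes can vary during training, so a more granular version could re-run this check per batch-size bucket rather than once globally.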