hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

DeepCL and optical flow computation #52

Closed Pax1601 closed 8 years ago

Pax1601 commented 8 years ago

Hi there,

I'm sorry if this is not the best way to contact you. I'm not really reporting an issue; rather, I'd like to ask a question. Do you believe that DeepCL could be suitable for estimating optical flow from video frames, as is done, for example, at http://damienteney.info/cnnFlow.htm? That project is very good, but it is based on MatConvNet, which only supports CUDA computations, while I need an OpenCL-based solution.

Thank you for your time.

Best regards,

Davide

hughperkins commented 8 years ago

You would need 3-d convolutions I think? Current convolutions are 2-d, over images. But I think you'd be convolving over time too, is that right?

I think this will be quite hard to shoe-horn into DeepCL, which was originally intended to have a very strictly limited scope of handling Go boards, and as such handles only square images. It could be upgraded to handle non-square videos, but that would be a fair amount of work.

If I were in your position, I might plausibly look at porting the similar layers in cuda torch across into cl torch. That should be fairly straightforward to do. I can probably actually handle that if you are interested? cl torch is at https://github.com/hughperkins/clnn. Let me know if this could be interesting to you.

Pax1601 commented 8 years ago

Indeed, it requires 3-d convolutions, but as you can read on the webpage I linked above, it also provides a Caffe model where 3 consecutive grayscale frames are merged into a single RGB frame. Do you think such a model may work?

My problem is portability. The final program must run on a drone, and right now I'm not sure about the architecture of the GPU. The only information I have is that it should run OpenCL. I'd like to be more precise, but I'm asking you this question because I will require optical flow computation for a university project and I still don't exactly know what machine the code will run on.

hughperkins commented 8 years ago

2d convolution takes a stack of 2d images, and convolves them together, using an arbitrary number of filters, to give a number of output 2d images equal to the number of filters. Each filter is 3d: taking a stack of 2d images.
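To make the shapes concrete, here is a minimal pure-Python sketch of the operation described above. It is not DeepCL code: the function name `conv2d` and the nested-list layout are assumptions chosen for illustration, and it computes a "valid" cross-correlation with no padding, stride, or bias.

```python
def conv2d(planes, filters):
    """Naive 'valid' 2d convolution over a stack of input planes.

    planes:  [in_planes][height][width]
    filters: [num_filters][in_planes][k][k]  (each filter is 3d)
    returns: [num_filters][height - k + 1][width - k + 1]
    """
    in_planes = len(planes)
    height, width = len(planes[0]), len(planes[0][0])
    outputs = []
    for filt in filters:
        if len(filt) != in_planes:
            raise ValueError("each filter needs one 2d slice per input plane")
        k = len(filt[0])
        out = [[0.0] * (width - k + 1) for _ in range(height - k + 1)]
        for y in range(height - k + 1):
            for x in range(width - k + 1):
                total = 0.0
                # Sum over every input plane: this is what makes the filter 3d.
                for p in range(in_planes):
                    for dy in range(k):
                        for dx in range(k):
                            total += planes[p][y + dy][x + dx] * filt[p][dy][dx]
                out[y][x] = total
        outputs.append(out)
    return outputs


# Three 4x4 input planes (plane p is filled with p + 1), two 3x3 filters of ones.
planes = [[[float(p + 1)] * 4 for _ in range(4)] for p in range(3)]
filters = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]
result = conv2d(planes, filters)
print(len(result), len(result[0]), len(result[0][0]))  # one 2x2 output plane per filter
```

The number of output planes depends only on the number of filters, while each filter's depth must match the number of input planes.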

It sounds like your model will have 3 incoming image planes, is that right? In which case, it's just a standard convolution. When I say '2d', each image is 2d, but the convolution filters are 3d: taking a stack of input images. It's plausible that for video, one would need stacks of 4d filters, not 3d as I implied earlier.
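As a rough illustration of the "3 incoming image planes" reading, the sketch below stacks consecutive grayscale frames into multi-plane inputs that a standard 2d convolution layer could consume. The helper names `stack_frames` and `sliding_inputs` are hypothetical, and this is only an assumption about how such a model might be fed, not the preprocessing used by the linked Caffe model.

```python
def stack_frames(frames):
    """Stack equally sized grayscale frames into one [planes][height][width] input."""
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same size")
    return [[row[:] for row in frame] for frame in frames]


def sliding_inputs(video, planes=3):
    """Build one stacked input per window of `planes` consecutive frames."""
    return [stack_frames(video[t:t + planes])
            for t in range(len(video) - planes + 1)]


# A toy 'video' of five 2x2 grayscale frames; frame t is filled with the value t.
video = [[[float(t)] * 2 for _ in range(2)] for t in range(5)]
inputs = sliding_inputs(video)
print(len(inputs), len(inputs[0]))  # number of windows, planes per window
```

With this layout, time is folded into the plane dimension, so each filter stays 3d; convolving over time as a separate axis is what would call for 4d filters.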

DeepCL has the following requirements to run: