hughperkins / EasyCL

Easy to run kernels using OpenCL

adding fpga support #21

Open tonyzhenyuxu opened 7 years ago

tonyzhenyuxu commented 7 years ago

Could you comment on supporting OpenCL running on an FPGA device, such as an Altera Arria 10, instead of a GPU? Thanks.

hughperkins commented 7 years ago

FPGAs typically work somewhat differently from discrete GPUs, in that programming them takes a very long time: hours.

For a discrete GPU, such as one from AMD or NVIDIA, the workflow for OpenCL looks something like:

  1. program starts
  2. program initializes GPU device
  3. program loads the OpenCL source code, which looks basically like C code
  4. program gives the OpenCL code to the GPU driver, which compiles it to GPU object code, passes it to the GPU, and passes a handle back to the program
  5. program gives the program handle to the GPU driver, along with some data, and the GPU starts processing the data, following the logic in the program

For a discrete GPU, steps 1 to 3 take seconds. Step 4 takes seconds or less. Step 5 takes as long as it takes: minutes, hours, days, or weeks, depending on what you're doing or training.

For an FPGA, step 4 takes significantly longer: hours instead of seconds. So the workflow would be quite different. The OpenCL compilation essentially has to happen offline, rather than at runtime.

It's probably not a massively blocking change, but it would require somewhat rethinking how the program runs. For example, EasyCL currently assumes that the OpenCL will be compiled at runtime. You'd need to partition EasyCL into two parts: one that compiles the kernels offline, and one that loads the precompiled kernels and runs them at runtime.
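On an FPGA, the runtime half of such a split no longer compiles anything: it reads the offline-compiled binary from disk and hands the bytes to the driver. In the OpenCL C API that hand-off is `clCreateProgramWithBinary` (one length and one byte pointer per device), followed by `clBuildProgram`, instead of `clCreateProgramWithSource`. Below is a minimal sketch of only the file-loading step. The function name `loadKernelBinary` and any file path you pass it are hypothetical; this is not part of EasyCL and has no OpenCL dependency.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical runtime half of a two-stage flow: the kernel was compiled
// offline (hours, on the FPGA toolchain), so at runtime we only read the
// resulting binary as raw bytes. Nothing here calls OpenCL.
std::vector<unsigned char> loadKernelBinary(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open precompiled kernel: " + path);
    }
    // Read the whole file, byte for byte, into the buffer.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

The returned vector's `data()` and `size()` are what you would pass as the binary pointer and length for a single device. Keeping the loader separate from any OpenCL calls makes it easy to swap between the source path (GPU) and the binary path (FPGA).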

hughperkins commented 7 years ago

(But note that I have zero experience with FPGAs, so I don't really know. You should check for yourself how compilation on an FPGA works.)

tonyzhenyuxu commented 7 years ago

You are absolutely right. I have to compile the kernel code (OpenCL) offline; it usually takes 10 to 15 hours. I have been reading the source code and comparing it with the Altera OpenCL examples. As you said, I don't think a massive code change is needed, but I do need to partition the code.

Thanks.

(In reply to Hugh Perkins, Sat, Dec 3, 2016 at 2:08 AM.)

hughperkins commented 7 years ago

Cool :-)

tonyzhenyuxu commented 7 years ago

I am digging further into the source code and trying to build AlexNet, but I realize there is no stride (or the stride is always 1) in your implementation. Is that true? For example, for the first AlexNet layer, I have filter size = 11 x 11 and 96 feature maps (I guess here we call that numFilters), with stride = 4, so the output is 55 x 55 x 96. How can we do that in DeepCL?
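For reference, the usual output-size relation for a convolution layer, with input width $W$, filter width $F$, zero padding $P$ on each side and stride $S$, is shown below together with the AlexNet first-layer numbers. The input width of 227 is an assumption (the thread does not state it), chosen because it divides exactly with no padding; the output depth is simply the number of filters, 96.

$$
O = \left\lfloor \frac{W + 2P - F}{S} \right\rfloor + 1
$$

$$
O = \frac{227 + 2 \cdot 0 - 11}{4} + 1 = \frac{216}{4} + 1 = 55
$$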

Thanks.

(In reply to Hugh Perkins, Mon, Dec 5, 2016 at 1:54 PM.)

tonyzhenyuxu commented 7 years ago

Can you help me understand the following code, in the deriveOthers function in LayerDimensions.cpp?

There is the following line, which puzzles me:

this->outputSize = padZeros? (filterSize % 2 == 0? inputSize/(skip+1) + 1 : inputSize/(skip +1)) : (inputSize - filterSize)/ (skip+1) + 1;

I am wondering whether the padZeros part, (filterSize % 2 == 0? inputSize/(skip+1) + 1 : inputSize/(skip +1)), could instead be: (filterSize % 2 == 0? (inputSize-filterSize)/(skip+1) + 1 : (inputSize-filterSize)/(skip +1))

It seems to me that skip + 1 = stride; is that right? There is not much explanation of the dimensions in the code. Would you mind sparing a few minutes on this?

Appreciated.

-T

(Follow-up to tzxu's email of Thu, Dec 8, 2016 at 4:16 PM.)

hughperkins commented 7 years ago

Yes, skip + 1 is stride. 'Skip' is from a relatively old paper. 'Stride' is the common notation nowadays.
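To experiment with the expression Tony quoted, here is a minimal sketch that reproduces it as a free function, with `stride = skip + 1` made explicit. The function name `outputSize` as a standalone function is hypothetical (in DeepCL the value is computed inside `LayerDimensions::deriveOthers`); the arithmetic is copied from the quoted line, plus precondition checks.

```cpp
#include <cassert>

// Hypothetical standalone version of the quoted outputSize expression.
// skip = 0 means stride 1; in general stride = skip + 1.
int outputSize(int inputSize, int filterSize, int skip, bool padZeros) {
    assert(inputSize > 0 && filterSize > 0 && skip >= 0);
    const int stride = skip + 1;
    if (padZeros) {
        // Zero-padded branch: only the parity of filterSize matters here.
        return filterSize % 2 == 0 ? inputSize / stride + 1
                                   : inputSize / stride;
    }
    // Unpadded branch: the filter must fit entirely inside the input.
    assert(inputSize >= filterSize);
    return (inputSize - filterSize) / stride + 1;
}
```

At stride 1 (skip = 0) with padZeros, this returns inputSize for odd filters and inputSize + 1 for even filters, which is why filterSize itself does not appear in that branch. Subtracting filterSize there, as proposed above, would shrink the padded output as if there were no padding. The unpadded branch is the familiar (inputSize - filterSize) / stride + 1.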

(In reply to tonyzhenyuxu, 10 December 2016 01:28:39 CET.)

lihao2333 commented 6 years ago

Hi, I also want to improve this code to support FPGAs.
I use a Terasic C5P, whose chip is an Altera Cyclone V.
I can run some simple demos, such as hello world and vector_add, but when I call clinfo, it reports:

root@up2:~# clinfo
I: ICD loader reports no usable platforms

Is there any hope? Thanks very much.

hughperkins commented 6 years ago

FPGAs need their kernels compiled offline and burned onto the FPGA. This can take several hours. Then, once the kernels are burned in, you can run them.

You would need to modify your code to support two stages like this, and EasyCL too.

(In reply to 李昊, Fri, Jun 15, 2018 at 09:48.)

hughperkins commented 6 years ago

@lihao2333

Oh. Re-reading this now that I have access to a web browser, rather than just replying to an email: OK, right, you would need to find an OpenCL driver, and an ICD registration, for your FPGA. You could, for example, ask your FPGA vendor's customer support, or perhaps search their forums.
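On Linux, the common ICD loaders (the Khronos loader and ocl-icd) discover vendor drivers through `*.icd` files in `/etc/OpenCL/vendors`; each file holds the name or path of one vendor's OpenCL library. If no such file is present, or the library it names cannot be loaded, `clinfo` has no platforms to report. The sketch below lists what is registered. The function name `listIcdRegistrations` is hypothetical, and it assumes C++17 for `std::filesystem`.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns (icd file name, library named inside it)
// for every *.icd file in the vendors directory. Returns an empty list if
// the directory does not exist.
std::vector<std::pair<std::string, std::string>>
listIcdRegistrations(const fs::path &dir = "/etc/OpenCL/vendors") {
    std::vector<std::pair<std::string, std::string>> found;
    std::error_code ec;
    if (!fs::is_directory(dir, ec)) {
        return found;
    }
    for (const auto &entry : fs::directory_iterator(dir, ec)) {
        if (entry.path().extension() != ".icd") {
            continue;  // the loaders only consider files ending in .icd
        }
        std::ifstream in(entry.path());
        std::string library;
        std::getline(in, library);  // first line: library name or path
        found.emplace_back(entry.path().filename().string(), library);
    }
    return found;
}
```

An empty result on the board would match the "no usable platforms" message; a non-empty result shifts the question to whether the named library (installed by the FPGA vendor's runtime) loads correctly.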