yaohuaxin / aparapi

Automatically exported from code.google.com/p/aparapi

Enable Large Memory/Long-Running Execution Support with OpenCL #98

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
First, one of the most irritating things about OpenCL is that the default 
maximum memory allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE, the "Maximum Memory 
Allocation" reported by clinfo) is significantly less than the total amount of 
available GPU RAM. In the case of our AMD hardware, it is usually 1/3 of the 
available global GPU RAM.

So, I'd like to start a ticket/thread about fixing that issue or at least 
investigating how to fix it.

Here is what we've discovered so far using a number of AMD Radeon and FirePro 
GPUs:

1) AMD OpenCL drivers default to 32-bit addressing even on 64-bit operating 
systems and hardware (tested on Windows 7 64-bit). In order to overcome that 
limitation, you need to set an undocumented environment variable:
  - GPU_FORCE_64BIT_PTR to a value of 1

2) Attempting to expose the entire amount of GPU memory has been an 
unsuccessful struggle. We've attempted to set the following:
  - GPU_MAX_HEAP_SIZE to a value of 100 (%) and 6144 (MB) with no change
  - GPU_MAX_ALLOC_SIZE to a value of 100 with no change
  - GPU_MAX_ALLOC_PERCENT to a value of 100 with no change

Both clinfo and OpenCL via Aparapi still report the default 33% memory allocation limit (see the sketch below).
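
For reference, here is a minimal Java sketch of the check we use to see what the driver is actually reporting to Aparapi. It assumes the OpenCLDevice getters getGlobalMemSize()/getMaxMemAllocSize() (the same values echoed in the log output further down this thread); adjust to whatever your Aparapi build exposes:

import com.amd.aparapi.device.Device;
import com.amd.aparapi.device.OpenCLDevice;

public class GpuLimitCheck {
   public static void main(String[] args) {
      // The GPU_* variables are read by the AMD driver at process start,
      // so they must already be set in the JVM's environment.
      for (String name : new String[] { "GPU_FORCE_64BIT_PTR", "GPU_MAX_HEAP_SIZE",
            "GPU_MAX_ALLOC_SIZE", "GPU_MAX_ALLOC_PERCENT" }) {
         System.out.printf("%s=%s%n", name, System.getenv(name));
      }

      Device device = Device.best();
      if (device instanceof OpenCLDevice) {
         OpenCLDevice ocl = (OpenCLDevice) device;
         // Values reported by the driver; compare with the clinfo output.
         System.out.printf("globalMemSize   : %.1f GB%n", ocl.getGlobalMemSize() / 1e9);
         System.out.printf("maxMemAllocSize : %.1f GB%n", ocl.getMaxMemAllocSize() / 1e9);
      } else {
         System.out.println("No OpenCL GPU device selected: " + device);
      }
   }
}

Note that the environment variables have to be set before the JVM is launched; setting them from inside the running program has no effect on the driver.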

See:

(Search for GPU_MAX_HEAP_SIZE)
http://developer.amd.com/resources/documentation-articles/knowledge-base/?ID=123

http://devgurus.amd.com/thread/145052
http://www.geeks3d.com/forums/index.php?topic=1528.0

AMD Large Buffer Support work-around (interesting):
http://devgurus.amd.com/message/1282921#1282921

Second, if you happen to have an OpenCL/GPU kernel that executes for longer 
than 2 seconds, Windows 7's Timeout Detection and Recovery (TDR) will assume 
the GPU has hung, reset the driver, and kill your computation. So you need to 
tell Windows to leave your OpenCL/GPU execution alone. Here is how to do that:

http://www.cmsoft.com.br/index.php?option=com_content&view=category&layout=blog&id=58&Itemid=95

Quote:

1 - Click Start and execute regedit.exe (type "regedit" as a command to open 
the Registry Editor);

2 - Go to HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\CONTROL\GraphicsDrivers;

3 - Create two REG_DWORD values: TdrDelay and TdrDdiDelay;

4 - Set TdrDelay to the maximum time in seconds you will allow the GPU to run 
code (I suggest 128) and TdrDdiDelay to the time Windows allows before 
rebooting your GPUs if they don't respond (I suggest 256);

5 - Go to entry HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\CONTROL\SESSION 
MANAGER\ENVIRONMENT and create a variable REG_SZ called GPU_MAX_HEAP_SIZE. Set 
its value to something greater than or equal to your GPU memory (I suggest 
1024);

6 - Alternatively, you can create the environment variable by going to Control 
Panel -> System - > Advanced system settings -> Tab Advanced -> Environment 
Variables and creating GPU_MAX_HEAP_SIZE there.
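
To confirm the registry change actually took effect before kicking off a long Aparapi run, a quick sketch like the following can be used on Windows; it simply shells out to the stock reg query command for the two values from steps 3-4 and echoes the GPU_MAX_HEAP_SIZE variable from steps 5-6 (nothing here is Aparapi-specific):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class TdrCheck {
   public static void main(String[] args) throws Exception {
      // Query the TDR values created in steps 3-4 (REG_DWORDs, shown in hex by reg.exe).
      for (String value : new String[] { "TdrDelay", "TdrDdiDelay" }) {
         Process p = new ProcessBuilder("reg", "query",
               "HKLM\\SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers", "/v", value)
               .redirectErrorStream(true).start();
         try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
               System.out.println(line.trim());
            }
         }
         p.waitFor();
      }
      // GPU_MAX_HEAP_SIZE from steps 5/6 is a plain environment variable.
      System.out.println("GPU_MAX_HEAP_SIZE=" + System.getenv("GPU_MAX_HEAP_SIZE"));
   }
}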

Original issue reported on code.google.com by ryan.lam...@gmail.com on 12 Mar 2013 at 4:53

GoogleCodeExporter commented 8 years ago
Let me update this post: setting GPU_MAX_ALLOC_PERCENT = 100 did in fact change 
the clinfo output to report a maximum allocation of almost exactly 100% of the 
global GPU RAM. I will test this in Aparapi now...

Original comment by ryan.lam...@gmail.com on 12 Mar 2013 at 4:55

GoogleCodeExporter commented 8 years ago
Here is the behavior that I am seeing on my AMD W9000 (log output from my app) 
when comment #1 is applied:

OpenCL globalMemSize: 6.4 GB
OpenCL maxMemAllocSize: 6.2 GB
SubTotalMemSize: 4.8 GB (the size we're trying to allocate)
Range: 2D(global:4096x4096 local:(derived)16x16)

...

Mar 12, 2013 12:39:59 PM com.amd.aparapi.internal.kernel.KernelRunner 
executeOpenCL
WARNING: ### CL exec seems to have failed. Trying to revert to Java ###

Original comment by ryan.lam...@gmail.com on 12 Mar 2013 at 7:46
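
A pre-flight check along these lines is handy for logging the numbers above before calling Kernel.execute(); the buffer sizes passed in are a hypothetical stand-in for whatever arrays the kernel actually binds, and the same OpenCLDevice getters as in the earlier sketch are assumed:

import com.amd.aparapi.device.Device;
import com.amd.aparapi.device.OpenCLDevice;

public class AllocationPreflight {
   /**
    * Logs the requested allocation against the device limits before a kernel run.
    * bufferSizesInBytes is a hypothetical stand-in for the sizes of the arrays the
    * kernel will bind (e.g. 4096 * 4096 * 4L for a float[4096 * 4096]).
    */
   public static boolean fits(OpenCLDevice device, long... bufferSizesInBytes) {
      long total = 0;
      long largest = 0;
      for (long size : bufferSizesInBytes) {
         total += size;
         largest = Math.max(largest, size);
      }
      System.out.printf("OpenCL globalMemSize   : %.1f GB%n", device.getGlobalMemSize() / 1e9);
      System.out.printf("OpenCL maxMemAllocSize : %.1f GB%n", device.getMaxMemAllocSize() / 1e9);
      System.out.printf("SubTotalMemSize        : %.1f GB%n", total / 1e9);
      // A single buffer must stay under maxMemAllocSize; the sum must fit in global memory.
      return largest <= device.getMaxMemAllocSize() && total <= device.getGlobalMemSize();
   }

   public static void main(String[] args) {
      Device device = Device.best();
      if (device instanceof OpenCLDevice) {
         // Three hypothetical 1.6 GB buffers, roughly the 4.8 GB total from the log above.
         long oneBuffer = 1_600_000_000L;
         System.out.println("fits = " + fits((OpenCLDevice) device, oneBuffer, oneBuffer, oneBuffer));
      }
   }
}

Going by the reported maxMemAllocSize of 6.2 GB, the 4.8 GB request in the log above should have fit, which makes the silent fallback to Java all the more frustrating.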

GoogleCodeExporter commented 8 years ago
Ryan, my understanding (and I can't recall where I heard this) is that you 
can't get access to more than 50% of the 'available' memory. Does this gel with 
your experiments?  

Original comment by frost.g...@gmail.com on 12 Mar 2013 at 7:48

GoogleCodeExporter commented 8 years ago
Yes, it seems to be a rather arbitrary, vendor-dependent ceiling. What we 
struggle with is understanding why the limitation exists at all, and whether 
there is some way to lift it, either programmatically or otherwise (e.g. 
environment variables).

It is practically inexcusable to suggest customers buy very expensive hardware, 
such as a 6 GB GDDR5 AMD W9000, and then see that only 2 GB of that RAM can be 
used at any one time.

It seems OpenCL imposes an arbitrary limit on individual memory allocations, 
and there appears to be a lot of confusion in various places as to whether it 
is an actual hard limit and how to work around or otherwise deal with it.

Original comment by ryan.lam...@gmail.com on 12 Mar 2013 at 7:56

GoogleCodeExporter commented 8 years ago

Original comment by ryan.lam...@gmail.com on 12 Mar 2013 at 7:57

GoogleCodeExporter commented 8 years ago
This has also been the major impetus to continue looking for "streaming" or 
"overlapping compute" solutions with Aparapi (a chunked work-around is sketched below).

Original comment by ryan.lam...@gmail.com on 12 Mar 2013 at 8:00
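
Until proper streaming/overlapping-compute support exists, the usual work-around is to chunk the data so that no single buffer exceeds the per-allocation ceiling and to run the kernel once per chunk. Here is a minimal sketch of that idea using the standard Aparapi Kernel/Range API; the chunk size and the trivial squaring kernel are purely illustrative:

import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;

public class ChunkedSquare {
   public static void main(String[] args) {
      // Illustrative sizes: a "large" logical data set processed in chunks that each
      // stay comfortably under the device's per-buffer allocation limit.
      final int totalSize = 1 << 26;   // 64M floats = 256 MB of logical data
      final int chunkSize = 1 << 22;   // 4M floats = 16 MB per transfer

      final float[] chunk = new float[chunkSize];

      Kernel kernel = new Kernel() {
         @Override public void run() {
            int i = getGlobalId();
            chunk[i] = chunk[i] * chunk[i];   // trivial per-element work
         }
      };

      for (int offset = 0; offset < totalSize; offset += chunkSize) {
         // In a real application the chunk would be filled from the large host-side
         // data set here (file, mapped buffer, etc.) and the results copied back out.
         for (int i = 0; i < chunkSize; i++) {
            chunk[i] = offset + i;
         }
         kernel.execute(Range.create(chunkSize));
      }
      kernel.dispose();
   }
}

In practice you would probably combine this with kernel.setExplicit(true) and put()/get() to control exactly when buffers are transferred, which is also the natural hook for overlapping transfers with compute.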