uygn / aparapi

Automatically exported from code.google.com/p/aparapi

RFE: explicit get/put range #116

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I am often working with a very large array that slowly grows. That is, 
typically only a small portion of it is changing. By transferring only the 
changing region of the array, I saw a significant speed-up.

Ideally, there would be a set of APIs on the kernel to get/put a range of an 
array, something like putRegion(int[] array, int startIndex, int length). The 
corresponding calls to clEnqueueWriteBuffer/clEnqueueReadBuffer would then use a 
byte offset, byte count and host_ptr adjusted to match the requested region.
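
To make the offset arithmetic concrete, here is a minimal sketch of how an element range on an int[] could be translated into the byte offset and byte count that clEnqueueWriteBuffer/clEnqueueReadBuffer expect. IntRegion and its methods are hypothetical names used for illustration only; they are not part of Aparapi, and the real change would live in the JNI layer.

```java
/**
 * Hypothetical helper (not Aparapi API): describes one contiguous slice of an
 * int[] and converts it from element units to the byte units used by OpenCL.
 */
final class IntRegion {
    final int startIndex; // first element to transfer
    final int length;     // number of elements to transfer

    IntRegion(int[] array, int startIndex, int length) {
        // Written so the bounds check cannot overflow for large indices.
        if (startIndex < 0 || length < 0 || startIndex > array.length - length) {
            throw new IndexOutOfBoundsException(
                "region [" + startIndex + ", +" + length + ") outside array of length " + array.length);
        }
        this.startIndex = startIndex;
        this.length = length;
    }

    /** Byte offset into the device buffer; the host pointer would be advanced by the same amount. */
    long byteOffset() {
        return (long) startIndex * Integer.BYTES;
    }

    /** Number of bytes to copy. */
    long byteCount() {
        return (long) length * Integer.BYTES;
    }
}
```

For example, new IntRegion(data, 250, 100) on a 1000-element array gives a byte offset of 1000 and a byte count of 400, so only 400 bytes cross the bus instead of 4000.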

In the code I've currently implemented, I extended the "puts" Set in 
KernelRunner to support this, though I don't think this is ideal because it 
only allows for one region per array. From an API standpoint, there shouldn't 
be a limitation with writing multiple regions, but the percolation from 
KernelRunner to KernelArg (java) to KernelArg (cpp) could get messy. I'd be 
happy to help with this.

Original issue reported on code.google.com by paul.mi...@gmail.com on 12 Jun 2013 at 2:00

GoogleCodeExporter commented 9 years ago
This is a great idea.

When we first started playing with put and get I actually coded a version of 
this, but wanted to allow a simple way of denoting multiple regions. I never 
found an API I liked.

So your proposal is:

kernel.putRegion(buf, 0, buf.length/2);

I think we could just overload put():

kernel.put(buf, 0, buf.length/2);

For multiple regions we could leverage the fluent style API:

kernel.put(buf, 0, buf.length/2).put(buf, buf.length-10, 10);

For 'puts' this would work well, because we just cache the intent to put (we 
defer the actual puts until kernel.execute() is called).
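
As a rough sketch of what "caching the intent" could look like with more than one region per array, the class below records every requested slice and hands them back when the kernel is about to run. PendingPuts, Region and regionsFor are hypothetical names for illustration; this is not how KernelRunner currently stores its "puts" Set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical store of deferred put requests, allowing several regions per array. */
final class PendingPuts {
    /** One requested slice, in element units. */
    static final class Region {
        final int start;
        final int length;

        Region(int start, int length) {
            this.start = start;
            this.length = length;
        }
    }

    // Keyed by array identity: two distinct arrays with equal contents stay separate.
    private final Map<int[], List<Region>> regions = new IdentityHashMap<>();

    /** Records the intent to transfer a slice; returns this so calls can be chained. */
    PendingPuts put(int[] array, int start, int length) {
        if (start < 0 || length < 0 || start > array.length - length) {
            throw new IndexOutOfBoundsException("region outside array bounds");
        }
        regions.computeIfAbsent(array, k -> new ArrayList<>()).add(new Region(start, length));
        return this;
    }

    /** Regions queued for the given array, in the order they were requested. */
    List<Region> regionsFor(int[] array) {
        return regions.getOrDefault(array, Collections.<Region>emptyList());
    }

    /** Would be called once the deferred transfers have been issued. */
    void clear() {
        regions.clear();
    }
}
```

With this shape, new PendingPuts().put(buf, 0, buf.length/2).put(buf, buf.length-10, 10) leaves two regions queued for buf. A real implementation would probably also want to merge overlapping regions before issuing the writes.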

For gets this would be less efficient as we currently fetch each explicitly. 

Unless we allow some form of transaction:

kernel.start().put(buf,0,n).put(buf,m,p).execute(range).get(buf,0,n).get(buf,x,y).end();

Which is verbose but efficient. 
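
One possible reading of the transaction idea, sketched under the assumption that every call between start() and end() is only queued and that end() replays the queue in order. DeferredTransaction is a hypothetical class, execute() takes a plain int instead of an Aparapi Range, and the steps only append to a log rather than touching a device.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical deferred transaction: calls are queued and only performed by end(). */
final class DeferredTransaction {
    private final List<Runnable> steps = new ArrayList<>();
    private final List<String> log = new ArrayList<>();
    private boolean ended;

    DeferredTransaction put(int[] buf, int start, int length) {
        return enqueue("put " + start + "+" + length);
    }

    DeferredTransaction execute(int globalSize) {
        return enqueue("execute " + globalSize);
    }

    DeferredTransaction get(int[] buf, int start, int length) {
        return enqueue("get " + start + "+" + length);
    }

    /** Runs every queued step in the order it was requested and returns the log. */
    List<String> end() {
        if (ended) {
            throw new IllegalStateException("transaction already ended");
        }
        ended = true;
        for (Runnable step : steps) {
            step.run();
        }
        return log;
    }

    /** What has actually been performed so far; stays empty until end() is called. */
    List<String> log() {
        return log;
    }

    private DeferredTransaction enqueue(String description) {
        if (ended) {
            throw new IllegalStateException("transaction already ended");
        }
        // A real step would issue the buffer write, kernel launch or buffer read here.
        steps.add(() -> log.add(description));
        return this;
    }
}
```

Because the whole sequence is known when end() runs, an implementation along these lines could batch the reads after the kernel launch instead of blocking on each get() separately.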

Gary

Original comment by frost.g...@gmail.com on 17 Jun 2013 at 10:14

GoogleCodeExporter commented 9 years ago
So in this case, start() changes the "mode", so to speak, of the kernel, so that 
nothing (not even the execute) is actually performed until end() is called? 
Sounds interesting.

Original comment by paul.mi...@gmail.com on 18 Jun 2013 at 4:48