gpu / JOCL

Java bindings for OpenCL
http://www.jocl.org
Other
187 stars 33 forks source link

Verify that CL_MEM_USE_HOST_PTR works for pointers to Java arrays #7

Open gpu opened 8 years ago

gpu commented 8 years ago

According to the OpenCL specification for the clEnqueueMapBuffer function:

If the buffer object is created with CL_MEM_USE_HOST_PTR set in mem_flags, the following will be true:

  • The host_ptr specified in clCreateBuffer is guaranteed to contain the latest bits in the region being mapped when the clEnqueueMapBuffer command has completed.
  • The pointer value returned by clEnqueueMapBuffer will be derived from the host_ptr specified when the buffer object is created.

Although JOCL supports pointers to Java arrays (as far as reasonably possible), it is not clear whether this particular application pattern works as expected.

The goal of this issue is to

blueberry commented 8 years ago

As far as I remember, pointers to Java arrays will work from native code, but it is not guaranteed. It all depends on the JVM implementation and the memory resources available at each particular moment. That also makes it difficult for testing, because it will work - most of the time. But, that can fail anytime, since JVM is free to rearrange Java arrays in memory as it pleases. There are methods to tell it not to do it, but that makes JVM less effective at its job.

All in all, the safest bet is to use DirectByteBuffer.

gpu commented 8 years ago

They will not work directly. You can obtain a pointer to a Java array in JNI, using GetPrimitiveArrayCritical. This is already done in JOCL, for blocking operations. The pseudocode is

void data* = GetPrimitiveArrayCritical(someJavaArray);
clEnqueueWriteBuffer(..., /*blocking=*/ true, ... data);
ReleasePrimitiveArrayCritical(data);

But as the name suggests, this imposes a "critical section". Particularly, it is not possible to do GetPrimitiveArrayCritical in one JNI function, and ReleasePrimitiveArrayCritical in another.

Now, when calling clCreateBuffer with CL_MEM_USE_HOST_PTR, and a pointer to a Java array, even if it fetches the pointer using GetPrimitiveArrayCritical, it will release it after the function call.

And the crucial point is: Nobody knows what exactly the OpenCL implementation is doing with this pointer when CL_MEM_USE_HOST_PTR is given. It's likely that it will store this pointer in some way, to later update it with the data from the cl_mem object. But at this point in time, this pointer is already invalid. This could cause nasty crashes. (In fact, this issue is based on a bug report that I received via mail, and where this might have happened).

Usually, I try to support Java arrays whereever possible, because always doing everything with direct buffers is cumbersome. But I'm afraid that CL_MEM_USE_HOST_PTR will likely be one of the few cases where I'll have to add something like

if (flagsContain(CL_MEM_USE_HOST_PTR) && isNoDirectBufferPointer(pointer)) {
    throw new NoThatIsNotPossibleException("Sorry...");
}

I'm not sure when I'll find the time to investigate this in more detail, but wanted to summarize it here, so that people know that there likely is an issue with this usage pattern, and ... that it does not get lost somewhere at the bottom of my TODO list....

TPolzer commented 8 years ago

Doesn't this also apply to other non blocking memory operations to arrays? So e.g.

clEnqueueWriteBuffer(q, input, false, 0, Sizeof.cl_double * DATA_SIZE, Pointer.to(inputdata), 0, null, null);

should be considered unsafe?

gpu commented 8 years ago

@TPolzer In this case, JOCL will internally keep a reference to the input data until the write operation finished. So the input data will not be garbage collected in the meantime. (I'm not sure whether a similar concept could be applied to the CL_MEM_USE_HOST_PTR, but that's one option that I considered).

But you indirectly raised a point that is related (or structurally similar) to this issue: Although the data may not be garbage collected, it might still be relocated by the JVM when the "critical section" that is opened with GetPrimitiveArrayCritical was closed, and for non-blocking write operations, this will be done before the write operation finished. Although the exact behavior will depend on the GC and the JVM (and I never encountered any issues here), I'll try to investigate this here as well. I'm afraid that, in the worst case, and to be on the safe side, I might have to drop the support of non-blocking operations for primitive arrays altogether. There is no way of preventing a relocation of a Java array.

(So this issue just bubbled up to the top of my TODO list)

TPolzer commented 8 years ago

It is likely almost never a real concern, because humongous objects are only compacted in full gc cycles, which should rarely occur.

javagl commented 8 years ago

Again, I have never encountered any issues here. But this may have different reasons:

But, in doubt, even if I cannot provoke an error, and/or even if it only causes problems in full GCs and these happen "rarely", this case has to be covered. The worst thing that could happen is that it "rarely" causes a problem in a production system. Such a "problem" would probably be a JVM crash, and I consider this as totally unacceptable - even if it happens "rarely". It may never crash *.


* "never" here means: It may never crash when it is used according to the OpenCL spec. But this finds its limits in the underlying implementation. For example, a recent commit was remotely related to that: https://github.com/gpu/JOCL/commit/5c6e44f8dd6a84d539ec8cf2b489f707e12f3d07 - unfortunately, the underlying OpenCL implementations (by AMD and NVIDIA) also don't behave very graciously here: When calling clWaitForEvents with a cl_event that was already released, one might expect them to return CL_INVALID_EVENT - but instead, they crash the whole program, painfully. However, this is not something that I can sensibly prevent in the (thin) JOCL layer. The question of whether Java arrays may be used in async operations is in the responsibility of JOCL, and it either has to work (always), or has to be prevented.

gpu commented 8 years ago

^ This comment was by me. I was logged in with the wrong account.

gpu commented 8 years ago

I did some general tests, particularly regarding the secondary issue.

From the first test, it indeed seems that a relocation of arrays occurs only during a full GC, and the likeliness of a real relocation decreases with a larger array size. All this is to be taken with a grain of salt, as it is mainly based on some Unsafe/verbose:gc/System.gc() trickery.

But undoubtedly, an array may be relocated by the GC, at "arbitrary" points in time.

However, it's really hard to provoke the error that suspectedly may happen during a non-blocking clEnqueueWriteBuffer with a Java array. I played with different configurations, but I cannot imagine a way to deterministically cause a GC with a relocation exactly in the short time where it would have to happen to cause a problem. (This would be between the CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END time of the event that is associated with clEnqueueWriteBuffer - and this is only a few ms even for ridiculously large arrays).

Nevertheless, right now, I assume that I'll have to drop the support of non-blocking operations on Java arrays, because it is impossible to guarantee that this will work. This will likely be a change in JOCL 2.1 (which is pending anyhow). Unfortunately, this will affect backward compatibility, I'll still have to figure out how exactly this change will be implemented.

All this refers to the secondary issue. The primary one, namely the use of CL_MEM_USE_HOST_PTR will likely be a similar case, but still has to be investigated.

blueberry commented 8 years ago

@gpu In your place, I won't be that concerned with backward compatibility, since it seems that all users are early adopters and our primary goal is to get the best possible technical solution. Please just provide clear and precise documentation (and warnings :) about changes and best ways to use the library, and don't worry. Thank you for the great effort you put into this!

Also, if you could release the tools you use for creating this as open source, that would be great, since we could try to help you more :)

gpu commented 8 years ago

@blueberry Some off-topic (I'm not sure how much I should elaborate this here) : Eight years ago I started this "code generation" with some crude hacks, to generate the code of JCublas. Later, I generalized this into a "proper" code generator, but went waaay to far with that (it became an "abstraction hell"). Later, I concretized it a bit, and tried to bring it into a shape that could actually be published, but it's still far from perfect. It is mainly used for parts of the libraries like JCublas or JOCLBlast, but there are still manual steps involved to integrate the generated code (and it's hard to convey the necessary assembly steps for someone who is not already familiar with it). So I think that publishing it now would be of very limited use for externals. (I would, however, provide it for anyhone who is interested in it (via mail). One should not expect it to be "easy to use" or run "out of the box", but ... maybe some feedback would help to bring it into a really publishable shape).

But more importantly, and leading to this issue: The code of JOCL itself is not generated with this code generator. One of the reasons is exactly related to this issue: There are many subtleties that might get lost when "blindly" doing a header-to-JNI-code-generation.


Now, specifically regarding this issue:

I consider backward compatibility as an important point. If I change the behavior here, this would mean that someone who used

clEnqueueWriteBuffer(..., /*blocking=*/ false, Pointer.to(someJavaArray), ...);

will receive an exception in the next version -

new CLException("Non-blocking writes with Java arrays are not possible");

And this could annoy people. I'm aware of this, and if this is changed, I will prominently point it out in the release notes.

However, I think that this change is still reasonable, and the reasons are: