Open wardev opened 1 year ago
Within your callback's onMessage() and onReply() methods, you are storing the provided Message object within your own ArrayList container. Please copy the Message object before stowing it. From the onReply() JavaDoc:
DO NOT STORE the Message objects for use beyond the scope of the callback. Otherwise, make a copy of the Message object(s).
Try something like:
messages.add( new Message(reply) );
Yes, thanks, I figured that out. I still think it is an error that this code causes a seg fault. Throwing a GMSEC_Exception
would be fine because that can be handled via normal error handling methods. But with a seg fault there is no chance to handle the error and it immediately kills the process.
It looks like one of the preconditions for using reinterpret_cast
is that the memory is a object of the type being cast to. Based on the documentation you quote and the test case provided that is provably false for Message
. Shouldn't the code then use a checked cast, such as dynamic_cast
?
The GMSEC API attempted to honor your client application's request to retrieve the message subject, however it has no idea that the underlying pointer to a C++ Message object that is being referenced through the Java-based Message object (stored in your ArrayList) has already been freed/destroyed. The attempt to access a destroyed object is what leads to a seg-fault.
A dynamic_cast is used for down-casting an object, for example a Field object to a StringField object. Whereas MistMessage (sp?) is a subclass of Message, the former type of object is not provided in a callback. In other words, MistMessage ceases to exist once it is transmitted across the bus. The object provided in the callback is-a Message, nothing else.
Thanks for the explanation. That makes sense. I'm not used to dealing with the details of memory allocation. So how would you modify GMSEC API so that it can't seg fault? Would that be storing the message passed to the call back on the heap? Using automatic reference counting?
When dealing with programming languages such as C and C++, there's always the chance for a seg-fault. A developer has to be careful/cognizant of what they are doing. Whereas C++ offers containers that can maintain a reference count for a shared pointer, this is moot when the referenced object is shared across a language binding, such as JNI (to support Java). There is no way to know whether a Java app still has a reference to the object allocated in C++ land. For this reason, Message object(s) passed to a callback are limited to the scope of the callback itself, and hence why a copy should be made in situations where it is needed for a longer period.
A common reason to make a copy of a Message delivered to a callback is to defer processing until a later time, perhaps in another thread. The callback could be summoned hundreds (maybe thousands) of times per second, and in such cases it would be unwise to process the Message within the callback itself. Some apps stow a copy of the Message object within a synchronized queue that is shared with another thread.
Thanks, that is similar to the use case I'm implementing. It just seems quite dangerous that forgetting a simple copy kills the whole app in violent way, without closing streams, etc.
There is no way to know whether a Java app still has a reference to the object allocated in C++ land.
It is possible and looking at the NASA API code it is already implemented. For example, JNIMessage
has the following methods:
protected void finalize() throws Throwable
{
try {
delete();
}
finally {
super.finalize();
}
}
public synchronized void delete()
{
if (swigCPtr != 0 && swigCMemOwn)
{
gmsecJNI.delete_Message(swigCPtr, this);
swigCMemOwn = false;
}
swigCPtr = 0;
}
The finalize()
method is called by the JVM when the object is about to be garbage collected. It seems that JNIMessage
uses this opportunity to let the native code know that it is done with the native memory for the particular Message
.
This suggests one solution is to give the JVM "ownership" (set swigCMemOwn=true
) for every Message
passed to the JVM.
It seems that this could also be used to implement reference counting by decrementing a reference count when finalize()
is called instead of deleting the Message
.
That's a good idea, however it goes against the paradigm used in the core API (in C++), and in other language bindings, most of which are generated using SWIG. This is not to say the change could not be made.
Before ownership can be passed to the JVM, a copy of the Message object will still need to be made. This will impact performance, depending of course on the size of the Message. And there's always a possibility where a single Message object may be destined to be delivered to multiple callbacks in a single app, thus incurring multiple copies, and yet additional performance hits.
git diff java/src/gmsecJNI_Jenv.cpp
diff --git a/java/src/gmsecJNI_Jenv.cpp b/java/src/gmsecJNI_Jenv.cpp
index 7ad6956c..7f938f97 100644
--- a/java/src/gmsecJNI_Jenv.cpp
+++ b/java/src/gmsecJNI_Jenv.cpp
@@ -285,7 +285,7 @@ JavaVM* AutoJEnv::getVM()
jobject gmsec::api5::jni::createJavaMessage(JNIEnv* jenv, const gmsec::api5::Message& message)
{
- jobject jniMessage = jenv->NewObject(Cache::getCache().classJNIMessage, Cache::getCache().methodMessageInitJZ, JNI_POINTER_TO_JLONG((&message)), JNI_FALSE);
+ jobject jniMessage = jenv->NewObject(Cache::getCache().classJNIMessage, Cache::getCache().methodMessageInitJZ, JNI_POINTER_TO_JLONG((new Message(message))), JNI_TRUE);
jobject jMessage = jenv->NewObject(Cache::getCache().classMessage, Cache::getCache().methodMessageInit, jniMessage);
if (!gmsec::api5::jni::jvmOk(jenv, "createJavaMessage: new Message") || !jMessage)
Not every application will need to stow a Message object delivered to a callback; some just access particular fields to handle some simple task. I suppose this may be one reason the design paradigm was chosen -- honestly, I cannot recall. When I started development on the GMSEC API, callbacks in the Java binding were not even supported.
P.S. Note that GMSEC API 5.0 is available (on NASA GitHub). Support for API 4.x will continue to be provided, but do not expect a new release in that series.
That's great, thanks for the ideas. I understand there is a tradeoff between performance and safety. :)
I'll take a look at 5.0.
As a Java developer I expect NASA's code to never segfault and kill the JVM. NullPointerExceptions are fine, segfaults are not.
Using GMSEC API version 4.6 I can produce a segfault every time by calling
getSubject()
on a receivedREPLY
message. I can callgetSubject()
on receivedPUBLISH
orREQUEST
messages without any problems. Reproducer code and stack trace top are below.I assume the problem is not actually in
std::string::c_str()
. My theory is that the issue is due to usingreinterpret_cast<Message*>(jlong)
in the native code on some long that is not the address of a message. How NASA's Java interface got an invalid pointer is beyond me.