Open Affonso-Gui opened 3 years ago
Yes, the problem is reproduced on my machine as well. My environment is as follows:
$ apt show ros-melodic-roseus
Package: ros-melodic-roseus
Version: 1.7.4-1bionic.20210505.033302
...
In my environment, I called the (foo) function 10 times while restarting roseus, all of which printed 900 and caused a segmentation fault.
1.irteusgl$ (foo)
100
200
300
400
500
600
700
800
900
Call Stack (max depth: 20):
0: at (ros::spin-once)
1: at (ros::spin-once)
2: at (while (not (send sub :get)) (ros::spin-once))
3: at (let ((sub (instance my-sub :init))) (ros::subscribe topic msgtype #'send sub :update) (while (not (send sub :get)) (ros::spin-once)) (send sub :get))
4: at (my-one-shot "/msg" std_msgs::string)
5: at (while t (if (zerop (mod (incf i) 100)) (print i)) (my-one-shot "/msg" std_msgs::string))
6: at (let ((i 0)) (while t (if (zerop (mod (incf i) 100)) (print i)) (my-one-shot "/msg" std_msgs::string)))
7: at (foo)
8: at #<compiled-code #X560a24deee18>
;p=pointer?(0x560a24d2c078)
;; Segmentation Fault.
;; in /opt/ros/melodic/share/euslisp/jskeus/eus/Linux64/bin/irteusgl 0 error: cannot find method euserror
;; You are still in a signal handler.
;;Try reset or throw to upper level as soon as possible.
;; code=-1472911824 x=a8352500 addr=0
Fatal:
It is a workaround, I think. 'gensym' may occure GC. So, we should push args(and other objects) to stack. Then, pop it.
diff --git a/roseus/roseus.cpp b/roseus/roseus.cpp
index f0b4a19..b27ca20 100644
--- a/roseus/roseus.cpp
+++ b/roseus/roseus.cpp
@@ -369,8 +369,10 @@ public:
ROS_ERROR("subscription callback function install error");
}
// avoid gc
+ vpush(args);
pointer p=gensym(ctx);
setval(ctx,intern(ctx,(char*)(p->c.sym.pname->c.str.chars),strlen((char*)(p->c.sym.pname->c.str.chars)),lisppkg),cons(ctx,scb,args));
+ vpop(); // pop agrs
}
~EuslispSubscriptionCallbackHelper() {
ROS_ERROR("subscription gc");
@YoheiKakiuchi I have previously attempted a similar approach in https://github.com/jsk-ros-pkg/jsk_roseus/pull/636 and was kind of uncertain if it made any sense, but I guess that it actually is a valid solution for this error then! With these changes I was able to run 40,000+ iterations without any problem.
Here is the correct place to fix this problem.
@@ -870,11 +872,12 @@ pointer ROSEUS_SUBSCRIBE(register context *ctx,int n,pointer *argv)
fncallback = argv[2];
args=NIL;
for (int i=n-1;i>=3;i--) args=cons(ctx,argv[i],args);
-
+ vpush(args);
EuslispMessage msg(message);
boost::shared_ptr<SubscriptionCallbackHelper> *callback =
(new boost::shared_ptr<SubscriptionCallbackHelper>
(new EuslispSubscriptionCallbackHelper(fncallback, args, message)));
+ vpop();
SubscribeOptions so(topicname, queuesize, msg.__getMD5Sum(), msg.__getDataType());
so.helper = *callback;
Subscriber subscriber = lnode->subscribe(so);
'args' is created on source of C-lang, so 'args' is not protected anywhere. Then 'args' may be collected as a garbege. ( objects in argv, such as 'sub' in this case, are not collected ) fncallback may also be protected because it is arguments of function.
This fix also seems stable!
I see, so it is not exactly the object instance being collected but the list as a whole, right? I'll do some more testing tomorrow and (re)organize a PR if everything looks fine.
Also any insights if we should dump the lisp symbol on unsubscriptions? Registering on subscriptions and never unregistering them does seem like a memory leak...
Yes, list(cons) object is collected. objects in list are safe.
Can we delete symbol ? I think reuse of symbol is realistic. Repeating subscribe many times is non sense, so it's not really a problem.
I have been struggling with weird segfaults on my custom build for some time but I believe I have finally found a way to reproduce them in the release build:
rostopic pub /msg std_msgs/String "hello" -r 100
(defclass my-sub :slots (msg))
(defmethod my-sub (:init () self) (:update (_msg) (setq msg _msg)) (:get () msg))
(defun my-one-shot (topic msgtype) (let ((sub (instance my-sub :init))) (ros::subscribe topic msgtype #'send sub :update) (while (not (send sub :get)) (ros::spin-once)) (send sub :get)))
(defun foo () (let ((i 0)) (while t (if (zerop (mod (incf i) 100)) (print i)) (my-one-shot "/msg" std_msgs::String))))
(foo)