strawlab / flydra

live, low-latency markerless multi-camera 3D animal tracking system
Apache License 2.0
58 stars 18 forks source link

Crash in camnode.py while calibrating #30

Closed sdvillal closed 8 years ago

sdvillal commented 8 years ago

@etiennecampione writes: "Hereafter is the error message I almost always get (let's say 95% of the time) while calibrating a Flycube. Can you indicate me what is going on? It's a really annoying and time-consuming problem".

[strawlab@flycube8 flycube8](master=)$ roslaunch ./flycube_only_flydra_calib.launch --screen
... logging to /home/strawlab/.ros/log/7433d1b0-cf1a-11e5-b183-bcee7bdac428/roslaunch-flycube8-5300.log
[...]
Exception in thread LogMessageSender:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/dist-packages/flydra/camnode.py", line 1041, in run
    self._main_brain.log_message(*log_message)
  File "/usr/lib/python2.7/dist-packages/flydra/camnode.py", line 197, in log_message
    self._log_message(req)
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_service.py", line 430, in __call__
    return self.call(*args, **kwds)
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_service.py", line 506, in call
    transport.send_message(request, self.seq)
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_base.py", line 590, in send_message
    self.write_buff.truncate(0)
AttributeError: 'NoneType' object has no attribute 'truncate'
[INFO] [WallTime: 1455014842.446372] setting camera property expected_trigger_framerate=100.0
[FATAL] [WallTime: 1455014842.446997] ERROR: thread LogMessageSender died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.497085] ERROR: thread save_cam.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.537913] ERROR: thread save_small.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.578685] ERROR: thread cam_processor.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.619367] ERROR: thread save_cam.mainloop died unexpectedly. Quitting
[INFO] [WallTime: 1455014842.628120] triggerbox_client: waiting for clockmodel estimate
[FATAL] [WallTime: 1455014842.629951] ERROR: thread save_small.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630143] ERROR: thread cam_processor.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630313] ERROR: thread LogMessageSender died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630480] ERROR: thread save_cam.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630644] ERROR: thread save_small.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.630809] ERROR: thread cam_processor.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.631007] ERROR: thread LogMessageSender died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.631231] ERROR: thread save_cam.mainloop died unexpectedly. Quitting
[FATAL] [WallTime: 1455014842.631436] ERROR: thread save_small.mainloop died unexpectedly. Quitting
[INFO] [WallTime: 1455014843.128506] triggerbox_client: waiting for clockmodel estimate
[flydra_camera_node-4] process has died [pid 5341, exit code 1].
log files: /home/strawlab/.ros/log/7433d1b0-cf1a-11e5-b183-bcee7bdac428/flydra_camera_node-4*.log

How does the transport die? Is it possible that the logging service (and therefore the transport) is closed in one camera LogMessageThread thread while there are still messages to log in some other camera thread, and so we would need to add sync code? Or could it just be a network problem? Surely @astraw will know better about this.

I'm saving the full logs of that run here; they include a few other stacktraces.

  /mnt/strawscience/santi/strawlab/calibration-transport-not-open-logs
nzjrs commented 8 years ago

If you updated to 0.6.8 flydra then you will also need to update ros_flydra

astraw commented 8 years ago

@etiennecampione can you confirm this is new (did not happen before)?

Also: is this with the version of flydra which I just packaged and released a couple weeks ago? (What is the output of python -c "import flydra.version; print flydra.version.__version__"?) Is this on a specific flycube?

etiennecampione commented 8 years ago

The version 0.6.8 of flydra is installed (it is the output of "python -c "import flydra.version; print flydra.version.version"") and ros_flydra is updated ("git pull --ff-only") and re-built ("make clean", then "make"). I confirm that I did not have this problem before. I currently see it on both Flycubes 3 and 8 which would need a new calibration.

astraw commented 8 years ago

Yes, then update to git master of ros_flydra. Something like

roscd ros_flydra
git fetch
git merge origin/master

Please close this bug if that solves it for you.

etiennecampione commented 8 years ago

@astraw: I already was on the last commit of the ros_flydra repository, so these command lines did not change anything and alas could not solve the problem. The last calibration I made was on the 17th December 2015, and I did not notice this problem at this time. So I fear it came with the new version of flydra.

astraw commented 8 years ago

Oh, yeah, you also have to make the latest ros_flydra

rosmake -s ros_flydra

Did you do that, too?

nzjrs commented 8 years ago

Relevant logs on the master side:

[rosout][INFO] 2016-02-09 11:47:22,196: received log message from Basler_21418139: excellent, grab thread running in maximum prioity mode
[rosout][INFO] 2016-02-09 11:47:22,202: received log message from Basler_21425980: using cam_iface driver: mega (wrapper: ctypes)
[rosout][INFO] 2016-02-09 11:47:22,210: received log message from Basler_21425975: using cam_iface driver: mega (wrapper: ctypes)
[rosout][INFO] 2016-02-09 11:47:22,211: received log message from Basler_21425975: excellent, grab thread running in maximum prioity mode
[rosout][INFO] 2016-02-09 11:47:22,214: received log message from Basler_21418139: using cam_iface driver: mega (wrapper: ctypes)
[rosout][INFO] 2016-02-09 11:47:22,421: received log message from Basler_21418139:   Basler_21418139 frames apparently skipped: 56 (168 vs 111)
[rosout][INFO] 2016-02-09 11:47:22,421: received log message from Basler_21425990:   Basler_21425990 frames apparently skipped: 98 (169 vs 70)
[rosout][INFO] 2016-02-09 11:47:22,422: received log message from Basler_21418139:   Basler_21418139 frames apparently skipped: 50 (111 vs 60)
[rosout][INFO] 2016-02-09 11:47:22,429: received log message from Basler_21425975:   Basler_21425975 frames apparently skipped: 48 (110 vs 61)
[rosout][INFO] 2016-02-09 11:47:22,627: triggerbox_client: waiting for clockmodel estimate
[rosout][INFO] 2016-02-09 11:47:23,128: triggerbox_client: waiting for clockmodel estimate
[rosout][INFO] 2016-02-09 11:47:23,628: triggerbox_client: got clockmodel estimate
[rosout][INFO] 2016-02-09 11:47:23,629: Basler_21425990 (re)synchronized
[rosout][INFO] 2016-02-09 11:47:23,629: Basler_21425985 (re)synchronized
[rosout][INFO] 2016-02-09 11:47:23,629: Basler_21425975 (re)synchronized
[rosout][INFO] 2016-02-09 11:47:23,629: Basler_21418139 (re)synchronized
[rosout][INFO] 2016-02-09 11:47:23,630: Basler_21425980 (re)synchronized
[rosout][WARNING] 2016-02-09 11:48:23,213: unknown Exception receiving timestamp echo data: timed out
[rospy.core][INFO] 2016-02-09 11:48:23,653: signal_shutdown [coordinate processor thread died]
[rospy.topics][ERROR] 2016-02-09 11:48:23,664: Traceback (most recent call last):
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/topics.py", line 245, in close
    c.close()
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_base.py", line 761, in close
    self.socket.close()
AttributeError: 'NoneType' object has no attribute 'close'

[rospy.topics][ERROR] 2016-02-09 11:48:23,664: Traceback (most recent call last):
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/topics.py", line 245, in close
    c.close()
  File "/opt/ros/ros.electric.boost1.46/ros_comm/clients/rospy/src/rospy/impl/tcpros_base.py", line 761, in close
    self.socket.close()
AttributeError: 'NoneType' object has no attribute 'close'

[rospy.impl.masterslave][INFO] 2016-02-09 11:48:23,664: coordinate processor thread died
[rosout][INFO] 2016-02-09 11:48:24,279: mainbrain finished
[rosout][WARNING] 2016-02-09 11:48:26,779: cameras failed to quit cleanly: ['Basler_21425980', 'Basler_21425975', 'Basler_21425990', 'Basler_21418139', 'Basler_21425985']
[rosout][WARNING] 2016-02-09 11:48:29,280: cameras failed to quit cleanly: ['Basler_21425980', 'Basler_21425975', 'Basler_21425990', 'Basler_21418139', 'Basler_21425985']
[rospy.core][INFO] 2016-02-09 11:48:29,280: signal_shutdown [atexit]

It receives some log messages, and then the coordprocessor thread dies.

At first I thought the logging code I added was responsible, but now that I see it revieves some log messages I am not sure. Thoughts?

etiennecampione commented 8 years ago

@astraw: I ran rosmake -s ros_flydra, this alas didn't fix the problem.

astraw commented 8 years ago

@sdvillal can you chmod -R g+rx /mnt/strawscience/santi/strawlab/calibration-transport-not-open-logs

nzjrs commented 8 years ago

done, FWIW you can do all permission fixups via strawcore

root@strawcore:~# ./fix-isilon-perms.sh /mnt/strawscience/santi/strawlab/calibration-transport-not-open-logs
sdvillal commented 8 years ago

Yeah, since some time already permissions in strawscience are sometimes broken even when creating files with permissions properly configured.

nzjrs commented 8 years ago

I just pushed 478e6ab which stops the crashes.

etienne is still unable to detect points, but I think that is unrelated