Open prjemian opened 4 years ago
Response from current code:
(bsmotor) mintadmin@mint-vm:~/.../eclipse/bstesting$ ./test3.py 10
Loading metadata history from /home/mintadmin/.config/bluesky/bluesky_history.db
ping 1: 0.1 0.1
m1: 92%|██████████████████████████████████████████████████████████████████████████████████ | 0.1845/0.2 [00:00<00:00, 1.51s/degrees]
/home/mintadmin/Apps/anaconda/envs/bsmotor/lib/python3.7/site-packages/tqdm/std.py:484: TqdmWarning: clamping frac to range [0, 1]
pong 1: -0.1 -0.1
ping 2: 0.1 0.1
m1: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 0.2/0.2 [00:00<00:00, 3.95s/degrees]
failed in second cycle
start putting in logging statements and sort out the timing between the things happening on different threads and at the IOC level.
Since the IOC reports time based on timestamps from the record in the IOC, we need to correlate that with the python timestamps (the time at which the information was received from EPICS by python). New values in some fields of the motor record do not automatically update the record's time stamp, such the .DMOV
field. The correlation between EPICS and Python time is very important.
Using ophyd 1.3.3, the ping_pong()
plan also stalls. Seems to run a few more cycles but still stalls.
Another diagnostic to test if the problem is a TimeoutError with get() or a status object problem:
import itertools
for i in enumerate(itertools.count()):
print(i, m1.motor_egu.get())
This process about 1k cycles/s and does not fail after 250,000 iterations (and continuing).
Not the problem.
Worked through this on a conference call with others on the team. Problem was not reproducible by the others on the call. We tried ophyd 1.4.0rc3, 1.3.3, and a custom branch of 1.4.0rc3. With 1.3.3 problem happens after more cycles than 1.4.0rc3 on my VM but not on the others' workstations.
I will try to reproduce on other workstations, especially on a workstation others might be able to access.
Ken Lauer showed an additional low-level diagnostic flag available in the ophyd.EventDispatcher
: debug_monitor=False
(default). Setting to True when this is called will provide more details and that is the substance of the custom branch mentioned above.
Ken believes the problem is in how ophyd works with PyEpics to handle PV callbacks.
Dan allan suggest that the changes we want from ophyd 1.4 (involving ScalerCH) can be backported, with other changes, to a new release 1.3.4. The master ophyd branch has code that provokes this problem more and should not be used for production work. The master branch is not in a position that it can be released very soon.
@danielballan : I could work around my desire to use ophyd 1.4+ because of changes I made to the ScalerCH class. I could add that modified device to apstools in the interim until a new ophyd can be released. Would that help?
To run my IOC is a docker environment:
cd /tmp
wget https://raw.githubusercontent.com/prjemian/epics-docker/master/n3_synApps/start_xxx.sh
chmod +x start_xxx.sh
./start_xxx.sh sky
then test that IOC
caget sky:UPTIME sky:m1.DESC
Smallest example that reproduces this error: https://github.com/prjemian/bstesting/blob/master/test3.py
(bsmotor) mintadmin@mint-vm:~/.../eclipse/bstesting$ ./test3.py 10
ping 1: 0.1 0.1
m1: 93%|██████████████████████████████████████████████████████████████████████████████████▉ | 0.1865/0.2 [00:00<00:00, 1.62s/degrees]
/home/mintadmin/Apps/anaconda/envs/bsmotor/lib/python3.7/site-packages/tqdm/std.py:484: TqdmWarning: clamping frac to range [0, 1]
pong 1: -0.1 -0.1
ping 2: 0.1 0.1
m1: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 0.2/0.2 [00:00<00:00, 3.98s/degrees]
Did not reproduce on poof (linux mint system with lots of cores and RAM).
This change:
---: print(msg)
+++: print(msg, os.system("uptime"))
changes things enough that the problem (not solved) takes much longer to appear. Was stalling after 1 or 2 cycles, now stalling after dozens of cycles. Also, configured the development VM for 2 cores (was 1 core).
Even with https://github.com/bluesky/ophyd/pull/787#issuecomment-568203881, still fails quickly on my VM:
./test3.py 1000 2>&1 | tee /tmp/output.log
``` [D 13:08:34.239 ophydobj:187] This is the first instance of OphydObject. name={self.name}, id={id(self)} [D 13:08:34.240 ophydobj:187] This is the first instance of OphydObject. name={self.name}, id={id(self)} [D 13:08:34.241 ophydobj:187] This is the first instance of OphydObject. name={self.name}, id={id(self)} [D 13:08:34.293 epics_motor:237] [ts=2019-12-21 13:07:15.62474] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1) [D 13:08:34.363 epics_motor:237] [ts=2019-12-21 13:07:15.62474] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0) m1: 50%|███████████████▌ | 0.1/0.2 [00:00<00:00, 2.06s/degrees] [A m1: 85%|█████████████████████████▌ | 0.17/0.2 [00:00<00:00, 1.80s/degrees] [A m1: 100%|███████████████████████████████| 0.2/0.2 [00:00<00:00, 2.04s/degrees] [A[D 13:08:34.863 epics_motor:237] [ts=2019-12-21 13:08:34.86247] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1) m1 [Complete.] [A [A2019-12-21 13:08:34.865754 ping 1: 0.1 0.1 [D 13:08:34.869 epics_motor:237] [ts=2019-12-21 13:08:34.86247] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0) m1: 45%|█████████████▍ | 0.09/0.2 [00:00<00:00, 2.21s/degrees] [A m1: 85%|█████████████████████████▌ | 0.17/0.2 [00:00<00:00, 1.76s/degrees] [A m1: 100%|███████████████████████████████| 0.2/0.2 [00:00<00:00, 2.00s/degrees] [A[D 13:08:35.366 epics_motor:237] [ts=2019-12-21 13:08:35.36507] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1) m1 [Complete.] [A [A2019-12-21 13:08:35.371632 pong 1: -0.1 -0.1 [D 13:08:35.376 epics_motor:237] [ts=2019-12-21 13:08:35.36507] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0) m1: 45%|█████████████▍ | 0.09/0.2 [00:00<00:00, 2.15s/degrees] [A m1: 85%|█████████████████████████▌ | 0.17/0.2 [00:00<00:00, 1.74s/degrees] [A m1: 100%|███████████████████████████████| 0.2/0.2 [00:00<00:00, 1.98s/degrees] [A[D 13:08:35.869 epics_motor:237] [ts=2019-12-21 13:08:35.86850] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1) m1 [Complete.] [A [A2019-12-21 13:08:35.873292 ping 2: 0.1 0.1 [D 13:08:35.875 epics_motor:237] [ts=2019-12-21 13:08:35.86850] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0) m1: 45%|█████████████▍ | 0.09/0.2 [00:00<00:00, 2.24s/degrees] [A m1: 85%|█████████████████████████▌ | 0.17/0.2 [00:00<00:00, 1.76s/degrees] [A m1: 100%|███████████████████████████████| 0.2/0.2 [00:00<00:00, 2.01s/degrees] [A[D 13:08:36.374 epics_motor:237] [ts=2019-12-21 13:08:36.37297] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1) m1 [Complete.] [A [A2019-12-21 13:08:36.378595 pong 2: -0.1 -0.1 [D 13:08:36.381 epics_motor:237] [ts=2019-12-21 13:08:36.37297] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0) m1: 45%|█████████████▍ | 0.09/0.2 [00:00<00:00, 2.16s/degrees] [A m1: 85%|█████████████████████████▌ | 0.17/0.2 [00:00<00:00, 1.74s/degrees] [A m1: 100%|███████████████████████████████| 0.2/0.2 [00:00<00:00, 1.98s/degrees] [A[D 13:08:36.875 epics_motor:237] [ts=2019-12-21 13:08:36.87488] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1) ```
camonitor -tc sky:m1.{DMOV,VAL,RBV} 2>&1 | tee /tmp/camonitor.log
``` sky:m1.DMOV (2019-12-21 13:08:19.861562) 1 sky:m1.VAL (2019-12-21 13:08:19.861627) -0.1 sky:m1.RBV (2019-12-21 13:08:19.861637) -0.1 sky:m1.DMOV (2019-12-21 13:08:34.359826) 0 sky:m1.VAL (2019-12-21 13:08:34.359856) 0.1 sky:m1.RBV (2019-12-21 13:08:34.454223) -0.08 sky:m1.RBV (2019-12-21 13:08:34.560073) 0 sky:m1.RBV (2019-12-21 13:08:34.660193) 0.07 sky:m1.RBV (2019-12-21 13:08:34.762120) 0.1 sky:m1.DMOV (2019-12-21 13:08:34.862665) 1 sky:m1.DMOV (2019-12-21 13:08:34.868328) 0 sky:m1.VAL (2019-12-21 13:08:34.868360) -0.1 sky:m1.RBV (2019-12-21 13:08:34.963311) 0.08 sky:m1.RBV (2019-12-21 13:08:35.064886) 0.01 sky:m1.RBV (2019-12-21 13:08:35.164608) -0.07 sky:m1.RBV (2019-12-21 13:08:35.266161) -0.1 sky:m1.DMOV (2019-12-21 13:08:35.365282) 1 sky:m1.DMOV (2019-12-21 13:08:35.375885) 0 sky:m1.VAL (2019-12-21 13:08:35.375918) 0.1 sky:m1.RBV (2019-12-21 13:08:35.465394) -0.08 sky:m1.RBV (2019-12-21 13:08:35.565870) -0.01 sky:m1.RBV (2019-12-21 13:08:35.666916) 0.07 sky:m1.RBV (2019-12-21 13:08:35.768187) 0.1 sky:m1.DMOV (2019-12-21 13:08:35.868839) 1 sky:m1.DMOV (2019-12-21 13:08:35.875193) 0 sky:m1.VAL (2019-12-21 13:08:35.875250) -0.1 sky:m1.RBV (2019-12-21 13:08:35.971607) 0.08 sky:m1.RBV (2019-12-21 13:08:36.074940) 0.01 sky:m1.RBV (2019-12-21 13:08:36.172844) -0.07 sky:m1.RBV (2019-12-21 13:08:36.274668) -0.1 sky:m1.DMOV (2019-12-21 13:08:36.373254) 1 sky:m1.DMOV (2019-12-21 13:08:36.382170) 0 sky:m1.VAL (2019-12-21 13:08:36.382203) 0.1 sky:m1.RBV (2019-12-21 13:08:36.473451) -0.08 sky:m1.RBV (2019-12-21 13:08:36.574033) -0.01 sky:m1.RBV (2019-12-21 13:08:36.674654) 0.07 sky:m1.RBV (2019-12-21 13:08:36.774913) 0.1 sky:m1.DMOV (2019-12-21 13:08:36.875061) 1 ```
Ran with additional debug and note that stall happens when trying to read the low limit switch value:
[D 13:21:10.849 epics_motor:237] [ts=2019-12-21 13:21:10.84419] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0)
[D 13:21:11.397 epics_motor:237] [ts=2019-12-21 13:21:11.39658] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 13:21:11.397 signal:854] m1_direction_of_travel.get(): set timeout=1.000000
[D 13:21:11.398 signal:854] m1_low_limit_switch.get(): set timeout=1.000000
2019-12-21 13:21:11.399525 ping 14: 0.1 0.1
[D 13:21:11.400 signal:854] m1_motor_egu.get(): set timeout=1.000000
[D 13:21:11.401 epics_motor:237] [ts=2019-12-21 13:21:11.39658] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0)
[D 13:21:11.900 epics_motor:237] [ts=2019-12-21 13:21:11.89947] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 13:21:11.901 signal:854] m1_direction_of_travel.get(): set timeout=1.000000
[D 13:21:11.901 signal:854] m1_low_limit_switch.get(): set timeout=1.000000
2019-12-21 13:21:11.902433 pong 14: -0.1 -0.1
[D 13:21:11.902 signal:854] m1_motor_egu.get(): set timeout=1.000000
[D 13:21:11.904 epics_motor:237] [ts=2019-12-21 13:21:11.89947] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0)
[D 13:21:12.403 epics_motor:237] [ts=2019-12-21 13:21:12.40243] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 13:21:12.403 signal:854] m1_direction_of_travel.get(): set timeout=1.000000
[D 13:21:12.404 signal:854] m1_low_limit_switch.get(): set timeout=1.000000
Note this comes (13:21:12.404 ) after EPICS has reported the motor is done moving (13:21:12.402612):
sky:m1.VAL (2019-12-21 13:21:11.401031) -0.1
sky:m1.RBV (2019-12-21 13:21:11.497524) 0.08
sky:m1.RBV (2019-12-21 13:21:11.598516) 0.01
sky:m1.RBV (2019-12-21 13:21:11.698650) -0.07
sky:m1.RBV (2019-12-21 13:21:11.798784) -0.1
sky:m1.DMOV (2019-12-21 13:21:11.899793) 1
sky:m1.DMOV (2019-12-21 13:21:11.903649) 0
sky:m1.VAL (2019-12-21 13:21:11.903670) 0.1
sky:m1.RBV (2019-12-21 13:21:11.999932) -0.08
sky:m1.RBV (2019-12-21 13:21:12.100720) -0.01
sky:m1.RBV (2019-12-21 13:21:12.201509) 0.07
sky:m1.RBV (2019-12-21 13:21:12.302476) 0.1
sky:m1.DMOV (2019-12-21 13:21:12.402612) 1
Ran several (>5) times, always stalls with m1_low_limit_switch.get()
. Something to explore there why that specifically.
This code (https://github.com/bluesky/ophyd/blob/11c5c80e6ff2cd73fd1a9c5b34de46c3007f031f/ophyd/epics_motor.py#L245-L248):
# Check if we are moving towards the low limit switch
if self.direction_of_travel.get() == 0:
if self.low_limit_switch.get() == 1:
success = False
added time.time()
to debug message:
[D 13:42:27.040 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1576957347.040645)
[D 13:42:27.041 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1576957347.041287)
0.642 ms time difference between these two calls to get()
Always times out when reading low limit switch value. Might be time for a wireshark.
Always times out after reading limit switch, after call to self._done_moving()
. Foud this out by adding more diagnostics to _move_changed()
.
[D 15:39:53.669 epics_motor:237] [ts=2019-12-21 15:39:53.66901] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 15:39:53.670 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1576964393.670607)
[D 15:39:53.671 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1576964393.671289)
[D 15:39:53.672 epics_motor:255] m1._move_changed(): limit switch check success=True
[D 15:39:53.672 epics_motor:276] m1._move_changed(): before _done_moving() success=True value=1
2019-12-21 15:39:53.673493 ping 8: 0.1 0.1
[D 15:39:53.674 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1576964393.674592)
[D 15:39:53.677 epics_motor:283] m1._move_changed(): after _done_moving() success=True value=1
[D 15:39:53.678 epics_motor:237] [ts=2019-12-21 15:39:53.66901] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0)
[D 15:39:54.171 epics_motor:237] [ts=2019-12-21 15:39:54.17045] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 15:39:54.172 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1576964394.172128)
[D 15:39:54.173 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1576964394.173743)
[D 15:39:54.174 epics_motor:255] m1._move_changed(): limit switch check success=True
[D 15:39:54.175 epics_motor:276] m1._move_changed(): before _done_moving() success=True value=1
[D 15:39:54.175 epics_motor:283] m1._move_changed(): after _done_moving() success=True value=1
add more diagnostics to show terms used in call to _run_subs()
from _done_moving()
:
[D 16:26:53.533 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1576967213.533783)
[D 16:26:53.537 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1576967213.537418)
[D 16:26:53.542 epics_motor:273] m1._move_changed(): before _done_moving() success=True value=1
[D 16:26:53.543 positioner:203] subscription names: {'_req_done', 'acq_done', 'readback', 'start_moving', 'done_moving'}
[D 16:26:53.544 positioner:204] m1._args_cache[done_moving]=((), {'timestamp': 1576967213.019101, 'value': 1, 'sub_type': 'done_moving', 'obj': EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])})
[D 16:26:53.548 positioner:205] m1._callbacks[done_moving]={}
[D 16:26:53.549 positioner:206] m1._done_moving() sub_type=done_moving value=1
2019-12-21 16:26:53.550177 pong 17: -0.1 -0.1
[D 16:26:53.551 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1576967213.551482)
[D 16:26:53.556 epics_motor:280] m1._move_changed(): after _done_moving() success=True value=1
[D 16:26:53.556 epics_motor:237] [ts=2019-12-21 16:26:53.52697] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0)
[D 16:26:54.031 epics_motor:237] [ts=2019-12-21 16:26:54.03105] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 16:26:54.032 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1576967214.032564)
[D 16:26:54.033 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1576967214.033226)
[D 16:26:54.033 epics_motor:273] m1._move_changed(): before _done_moving() success=True value=1
[D 16:26:54.034 positioner:203] subscription names: {'_req_done', 'acq_done', 'readback', 'start_moving', 'done_moving'}
[D 16:26:54.034 positioner:204] m1._args_cache[done_moving]=((), {'timestamp': 1576967213.526967, 'value': 1, 'sub_type': 'done_moving', 'obj': EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])})
[D 16:26:54.035 positioner:205] m1._callbacks[done_moving]={}
[D 16:26:54.036 positioner:206] m1._done_moving() sub_type=done_moving value=1
[D 16:26:54.036 epics_motor:280] m1._move_changed(): after _done_moving() success=True value=1
There are no callback functions registered under sub_type="done_moving"
Interesting note (that is not the problem we are working on), the code block mentioned above:
# Check if we are moving towards the low limit switch
if self.direction_of_travel.get() == 0:
if self.low_limit_switch.get() == 1:
success = False
always checks the low_limit_switch
regardless of whether the move is positive or negative. Expecting self.direction_of_travel.get() == 1
half the time.
A new move starts before the end of _done_moving()
in epics_motor
. Is that a symptom?
```
(bsmotor) mintadmin@mint-vm:~/.../eclipse/bstesting$ ./test3.py 100 2>&1 | tee /tmp/ouput.log
[D 16:47:46.809 ophydobj:187] This is the first instance of OphydObject. name={self.name}, id={id(self)}
[D 16:47:46.809 ophydobj:187] This is the first instance of OphydObject. name={self.name}, id={id(self)}
[D 16:47:46.815 ophydobj:187] This is the first instance of OphydObject. name={self.name}, id={id(self)}
[D 16:47:46.905 epics_motor:238] [ts=2019-12-21 16:47:21.29064] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 16:47:46.989 epics_motor:147] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move()
[D 16:47:46.989 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1576968466.989409)
[D 16:47:47.010 epics_motor:238] [ts=2019-12-21 16:47:21.29064] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0)
[D 16:47:47.502 epics_motor:238] [ts=2019-12-21 16:47:47.50242] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 16:47:47.503 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1576968467.503721)
[D 16:47:47.504 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1576968467.504597)
[D 16:47:47.505 epics_motor:274] m1._move_changed(): before _done_moving() success=True value=1
[D 16:47:47.505 positioner:202] subscription names: {'acq_done', '_req_done', 'readback', 'done_moving', 'start_moving'}
[D 16:47:47.505 positioner:204] m1._callbacks[done_moving]={}
[D 16:47:47.505 positioner:205] m1._done_moving() sub_type=done_moving value=1
[D 16:47:47.506 ophydobj:396] sub_type=_req_done callback=
Results of three runs of test5.py
using only PyEpics (no ophyd or bluesky)
```
2019-12-22 13:12:40.338661: pong 16312: -0.1 0.1
2019-12-22 13:12:40.338813: ping 16313: 0.1 0.1
CA.Client.Exception...............................................
Warning: "Virtual circuit unresponsive"
Context: "ioc8idi1.xray.aps.anl.gov:5064"
Source File: ../tcpiiu.cpp line 920
Current Time: Sun Dec 22 2019 13:12:40.338869244
..................................................................
Traceback (most recent call last):
File "./test5.py", line 56, in
```
2019-12-22 13:14:14.894821: pong 13950: -0.1 0.1
2019-12-22 13:14:14.895069: ping 13951: 0.1 0.1
2019-12-22 13:14:14.895316: pong 13951: -0.1 0.1
CA.Client.Exception...............................................
Warning: "Virtual circuit unresponsive"
Context: "ioc8idi1.xray.aps.anl.gov:5064"
Source File: ../tcpiiu.cpp line 920
Current Time: Sun Dec 22 2019 13:14:14.895332228
..................................................................
date
Traceback (most recent call last):
File "./test5.py", line 56, in
```
2019-12-22 13:15:41.740184: pong 17766: -0.1 0.1
2019-12-22 13:15:41.740339: ping 17767: 0.1 0.1
2019-12-22 13:15:41.740493: pong 17767: -0.1 0.1
2019-12-22 13:15:41.740647: ping 17768: 0.1 0.1
CA.Client.Exception...............................................
Warning: "Virtual circuit unresponsive"
Context: "ioc8idi1.xray.aps.anl.gov:5064"
Source File: ../tcpiiu.cpp line 920
Current Time: Sun Dec 22 2019 13:15:42.498961135
Traceback (most recent call last):
..................................................................
File "./test5.py", line 56, in
Always, a timeout immediately after libca
message Virtual circuit unresponsive
:
CA.Client.Exception...............................................
Warning: "Virtual circuit unresponsive"
Context: "ioc8idi1.xray.aps.anl.gov:5064"
Source File: ../tcpiiu.cpp line 920
Current Time: Sun Dec 22 2019 13:15:42.498961135
Traceback (most recent call last):
..................................................................```
Ran test5.py
for 1,000,000 cycles on my VM with no failures. PV provided on softIoc
instance running on same VM.
(bsmotor) mintadmin@mint-vm:~/.../eclipse/bstesting$ ./test5.py 1000000
epics 3.4.0
2019-12-22 15:57:48.950378: ping 1: 0.1 -0.1
2019-12-22 15:57:48.950767: pong 1: -0.1 0.1
2019-12-22 15:57:48.951215: ping 2: 0.1 -0.1
2019-12-22 15:57:48.952207: pong 2: -0.1 -0.1
2019-12-22 15:57:48.952706: ping 3: 0.1 0.1
2019-12-22 15:57:48.953070: pong 3: -0.1 0.1
2019-12-22 15:57:48.953633: ping 4: 0.1 0.1
2019-12-22 15:57:48.954159: pong 4: -0.1 -0.1
2019-12-22 15:57:48.954668: ping 5: 0.1 0.1
2019-12-22 15:57:48.955031: pong 5: -0.1 0.1
2019-12-22 15:57:48.955573: ping 6: 0.1 0.1
2019-12-22 15:57:48.956034: pong 6: -0.1 -0.1
2019-12-22 15:57:48.956481: ping 7: 0.1 0.1
2019-12-22 15:57:48.956896: pong 7: -0.1 -0.1
2019-12-22 15:57:48.957282: ping 8: 0.1 0.1
2019-12-22 15:57:48.957620: pong 8: -0.1 0.1
2019-12-22 15:57:48.958126: ping 9: 0.1 0.1
2019-12-22 15:57:48.958453: pong 9: -0.1 0.1
2019-12-22 15:57:48.958905: ping 10: 0.1 0.1
2019-12-22 15:57:48.959230: pong 10: -0.1 0.1
2019-12-22 15:57:48.959669: ping 11: 0.1 0.1
2019-12-22 15:57:48.960122: pong 11: -0.1 -0.1
2019-12-22 15:57:48.960381: ping 12: 0.1 -0.1
2019-12-22 15:57:48.960693: pong 12: -0.1 -0.1
2019-12-22 15:57:48.960935: ping 13: 0.1 -0.1
2019-12-22 15:57:48.961178: pong 13: -0.1 -0.1
...
2019-12-22 16:23:18.318355: pong 999998: -0.1 0.1
2019-12-22 16:23:18.319552: ping 999999: 0.1 0.1
2019-12-22 16:23:18.320142: pong 999999: -0.1 0.1
2019-12-22 16:23:18.320930: ping 1000000: 0.1 -0.1
2019-12-22 16:23:18.322037: pong 1000000: -0.1 0.1
epics 3.4.0
But look at pong 10, it does not always get where it should. The put()
call needs a put_complete=True
kwarg.
test5
needs to wait for value to change before proceeding
repeat that test of 1,000,000 cycles
(bsmotor) mintadmin@mint-vm:~/.../eclipse/bstesting$ ./test5.py 1000000
epics 3.4.0
2019-12-22 16:56:05.937325: ping 1: 0.1 0.1
2019-12-22 16:56:05.942816: pong 1: -0.1 -0.1
2019-12-22 16:56:05.944701: ping 2: 0.1 0.1
2019-12-22 16:56:05.946704: pong 2: -0.1 -0.1
2019-12-22 16:56:05.955874: ping 3: 0.1 0.1
2019-12-22 16:56:05.960147: pong 3: -0.1 -0.1
2019-12-22 16:56:05.962536: ping 4: 0.1 0.1
2019-12-22 16:56:05.965209: pong 4: -0.1 -0.1
2019-12-22 16:56:05.967652: ping 5: 0.1 0.1
2019-12-22 16:56:05.971338: pong 5: -0.1 -0.1
2019-12-22 16:56:05.979995: ping 6: 0.1 0.1
2019-12-22 16:56:05.982191: pong 6: -0.1 -0.1
...
2019-12-22 17:56:31.868671: pong 467836: -0.1 -0.1
2019-12-22 17:56:31.874254: ping 467837: 0.1 0.1
2019-12-22 17:56:31.878496: pong 467837: -0.1 -0.1
2019-12-22 17:56:31.891101: ping 467838: 0.1 0.1
2019-12-22 17:56:31.898704: pong 467838: -0.1 -0.1
2019-12-22 17:56:31.900661: ping 467839: 0.1 0.1
2019-12-22 17:56:31.909032: pong 467839: -0.1 -0.1
...
2019-12-22 18:15:16.877233: pong 999997: -0.1 -0.1
2019-12-22 18:15:16.877650: ping 999998: 0.1 0.1
2019-12-22 18:15:16.878054: pong 999998: -0.1 -0.1
2019-12-22 18:15:16.878481: ping 999999: 0.1 0.1
2019-12-22 18:15:16.878882: pong 999999: -0.1 -0.1
2019-12-22 18:15:16.879302: ping 1000000: 0.1 0.1
2019-12-22 18:15:16.879706: pong 1000000: -0.1 -0.1
epics 3.4.0
~7.8 ms per cycle
Set up a test on a different system.
softIoc -d 8idi.db
on workstation opp.test4.py
(put a float number) on workstation poof.No timeout failures observed.
sky
on workstation opp (in docker container as iocsky)test3.py 1000
(move EPICS motor) on workstation poof.No motor stall or timeout failures seen in two runs of 1000 cycles each.
Note that after _done_moving()
message comes just before status message is finished which comes before the end of move print()
.
[D 22:09:51.001 epics_motor:147] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move()
[D 22:09:51.001 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1577074191.001521)
[D 22:09:51.003 epics_motor:238] [ts=2019-12-22 22:09:50.99722] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0)
[D 22:09:51.497 epics_motor:238] [ts=2019-12-22 22:09:51.49782] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 22:09:51.497 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1577074191.497805)
[D 22:09:51.498 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1577074191.498800)
[D 22:09:51.499 epics_motor:274] m1._move_changed(): before _done_moving() success=True value=1
[D 22:09:51.499 positioner:202] subscription names: {'acq_done', 'readback', '_req_done', 'done_moving', 'start_moving'}
[D 22:09:51.499 positioner:204] m1._callbacks[done_moving]={}
[D 22:09:51.500 positioner:205] m1._done_moving() sub_type=done_moving value=1
[D 22:09:51.500 ophydobj:396] sub_type=_req_done callback=<function StatusBase._finished at 0x7f283d019320>
[D 22:09:51.500 epics_motor:281] m1._move_changed(): after _done_moving() success=True value=1
2019-12-22 22:09:51.501274 ping 930: 0.1 0.1
[D 22:09:51.501 epics_motor:147] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move()
[D 22:09:51.502 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1577074191.502176)
[D 22:09:51.504 epics_motor:238] [ts=2019-12-22 22:09:51.49782] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: True (value=0)
[D 22:09:51.997 epics_motor:238] [ts=2019-12-22 22:09:51.99839] EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu']) moving: False (value=1)
[D 22:09:51.998 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1577074191.998382)
[D 22:09:51.999 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1577074191.999388)
[D 22:09:52.000 epics_motor:274] m1._move_changed(): before _done_moving() success=True value=1
[D 22:09:52.000 positioner:202] subscription names: {'acq_done', 'readback', '_req_done', 'done_moving', 'start_moving'}
[D 22:09:52.000 positioner:204] m1._callbacks[done_moving]={}
[D 22:09:52.000 positioner:205] m1._done_moving() sub_type=done_moving value=1
[D 22:09:52.000 ophydobj:396] sub_type=_req_done callback=<function StatusBase._finished at 0x7f283d007290>
[D 22:09:52.001 epics_motor:281] m1._move_changed(): after _done_moving() success=True value=1
2019-12-22 22:09:52.001889 pong 930: -0.1 -0.1
[D 22:09:52.002 epics_motor:147] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move()
Something is unusual on the VM when compared with other systems. This is the case to investigate with wireshark.
Definitely, this must be the symptom: the status message _finished()
method should be called before the move is declared done. In the VM, this happens after the next move has started.
Useful links
Wireshark v2.6 did not recognize the pre-compiled EPICS ca dissector from KEK. Hrmph. It was built for wireshark 1.10.2 a decade ago.
Try newer support: https://github.com/mdavidsaver/cashark
wget https://raw.githubusercontent.com/mdavidsaver/cashark/master/ca.lua
Only the file ca.lua is needed. Then start wireshark with
wireshark -X lua_script:/path/to/ca.lua
more success - the plugin is recognized
but the plugin has limited usefulness, it does not seem to interpret the various TCP messages, just the UDP search request. Using a filter for packets between CLIENT_IP and IOC_IP is effective:
(ip.src == CLIENT_IP && ip.dst == IOC_IP) || (ip.src == IOC_IP && ip.dst == CLIENT_IP)
Without richer analytical tools, the wireshark investigation is a rabbit hole. In both VM and non-VM cases, the motor .DMOV field goes to 1 before the status object is done. In the VM, the stall is always observed after the .DMOV field foes to 1, yet it is not clear the status object's _finished()
method has been called.
The TimeoutError is not being detected on the VM, when testing for the motor stall. Focus on the motor stall problem in the VM as it points to either a problem with the VM of some issue in ophyd status objects to be resolved. Curious this is only seen in the VM.
Better to investigate what is happening with the life-cycle of the status objects in the VM case (where the next move starts before the logging reports the status object is done) and the non-VM case (where the status object is done before the next move starts, as expected).
Continuing to add diagnostic logging statements to identify what happens at the time of stalling (on the VM system). Added UUID to each status object for diagnostic purposes.
```
[D 12:46:09.348 ophydobj:187] This is the first instance of OphydObject. name=m1, id=140235712277968
[D 12:46:09.349 ophydobj:187] This is the first instance of OphydObject. name=m1_user_readback, id=140235712325904
[D 12:46:09.350 ophydobj:187] This is the first instance of OphydObject. name=m1_user_setpoint, id=140235712327440
[D 12:46:09.390 epics_motor:239] [ts=2019-12-26 12:46:06.67488] m1 moving: False (value=1)
[D 12:46:09.406 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1577385969.406899)
[D 12:46:09.407 positioner:190] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move() uuid=111bcb23-ce0c-4953-ab65-6c92d2e06c5f
[D 12:46:09.407 epics_motor:150] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move() uuid=111bcb23-ce0c-4953-ab65-6c92d2e06c5f
[D 12:46:09.413 epics_motor:239] [ts=2019-12-26 12:46:06.67488] m1 moving: True (value=0)
m1: 45%|█████████████▍ | 0.09/0.2 [00:00<00:00, 1.73s/degrees]
[A
m1: 85%|█████████████████████████▌ | 0.17/0.2 [00:00<00:00, 1.51s/degrees]
[A
m1: 100%|███████████████████████████████| 0.2/0.2 [00:00<00:00, 1.79s/degrees]
[A[D 12:46:09.866 epics_motor:239] [ts=2019-12-26 12:46:09.86602] m1 moving: False (value=1)
[D 12:46:09.867 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1577385969.867009)
[D 12:46:09.867 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1577385969.867735)
[D 12:46:09.868 epics_motor:275] m1._move_changed(): before _done_moving() success=True value=1
[D 12:46:09.868 positioner:205] m1._callbacks[done_moving]={}
[D 12:46:09.868 positioner:206] m1._done_moving() sub_type=done_moving value=1
[D 12:46:09.868 ophydobj:396] sub_type=_req_done callback=
On the VM, when it moves properly (no stall), from just before to just after call to _done_moving()
:
[D 12:46:09.868 epics_motor:275] m1._move_changed(): before _done_moving() success=True value=1
[D 12:46:09.868 positioner:205] m1._callbacks[done_moving]={}
[D 12:46:09.868 positioner:206] m1._done_moving() sub_type=done_moving value=1
[D 12:46:09.868 ophydobj:396] sub_type=_req_done callback=<function StatusBase._finished at 0x7f8b2b3f6290>
[D 12:46:09.869 status:142] >>>>>>>>>>>>>>>>>>>>: status 111bcb23-ce0c-4953-ab65-6c92d2e06c5f._finished(): done=False success=True, kwargs={'timestamp': 1577385969.866021, 'sub_type': '_req_done', 'obj': EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])}
[D 12:46:09.870 status:111] >>>>>>>>>>>>>>>>>>>>: status 111bcb23-ce0c-4953-ab65-6c92d2e06c5f._settle_then_run_callbacks(): MoveStatus(done=False, pos=m1, elapsed=0.5, success=False, settle_time=0.0)
m1 [Complete.]
[A
[A2019-12-26 12:46:09.870931 ping 1: 0.1 0.1
[D 12:46:09.871 ophydobj:396] sub_type=_req_done callback=<function StatusBase._finished at 0x7f8b2b3f6290>
[D 12:46:09.871 status:142] >>>>>>>>>>>>>>>>>>>>: status 111bcb23-ce0c-4953-ab65-6c92d2e06c5f._finished(): done=True success=False, kwargs={'sub_type': '_req_done', 'obj': EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])}
[D 12:46:09.872 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1577385969.872612)
[D 12:46:09.873 status:123] --------------------: status 111bcb23-ce0c-4953-ab65-6c92d2e06c5f._settle_then_run_callbacks(): MoveStatus(done=True, pos=m1, elapsed=0.5, success=True, settle_time=0.0)
[D 12:46:09.873 status:155] --------------------: status 111bcb23-ce0c-4953-ab65-6c92d2e06c5f._finished(): MoveStatus(done=True, pos=m1, elapsed=0.5, success=True, settle_time=0.0)
[D 12:46:09.873 epics_motor:282] m1._move_changed(): after _done_moving() success=True value=1
When it stalls, from just before to just after call to _done_moving()
:
[D 12:46:14.430 epics_motor:275] m1._move_changed(): before _done_moving() success=True value=1
[D 12:46:14.430 positioner:205] m1._callbacks[done_moving]={}
[D 12:46:14.430 positioner:206] m1._done_moving() sub_type=done_moving value=1
[D 12:46:14.430 epics_motor:282] m1._move_changed(): after _done_moving() success=True value=1
When the motor moves smoothly, and the status object is marked done=True
so the plan continues to the next message, this chunk of code (in StatusBase._finished()
) is still running: https://github.com/bluesky/ophyd/blob/f7177f015066ed778f196af82ec700f0c59d50c2/ophyd/status.py#L139-L145
if success and self.settle_time > 0:
# delay gratification until the settle time is up
self._settle_thread = threading.Thread(
target=self._settle_then_run_callbacks, daemon=True,
kwargs=dict(success=success),
)
self._settle_thread.start()
Alternatively (to https://github.com/prjemian/bstesting/issues/1#issuecomment-569116845 above), when the motor moves on the non-VM workstation and does not stall:
[D 13:46:54.956 epics_motor:275] m1._move_changed(): before _done_moving() success=True value=1
[D 13:46:54.956 positioner:205] m1._callbacks[done_moving]={}
[D 13:46:54.956 positioner:206] m1._done_moving() sub_type=done_moving value=1
[D 13:46:54.956 ophydobj:396] sub_type=_req_done callback=<function StatusBase._finished at 0x7f7eccd305f0>
[D 13:46:54.956 status:142] >>>>>>>>>>>>>>>>>>>>: status ed9995c3-db5c-48ee-9c47-d38988088ab6._finished(): done=False success=True, kwargs={'timestamp': 1577389614.956807, 'sub_type': '_req_done', 'obj': EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])}
[D 13:46:54.957 status:111] >>>>>>>>>>>>>>>>>>>>: status ed9995c3-db5c-48ee-9c47-d38988088ab6._settle_then_run_callbacks(): MoveStatus(done=False, pos=m1, elapsed=0.5, success=False, settle_time=0.0)
m1 [Complete.]
[A[D 13:46:54.957 status:123] --------------------: status ed9995c3-db5c-48ee-9c47-d38988088ab6._settle_then_run_callbacks(): MoveStatus(done=True, pos=m1, elapsed=0.5, success=True, settle_time=0.0)
[D 13:46:54.957 status:155] --------------------: status ed9995c3-db5c-48ee-9c47-d38988088ab6._finished(): MoveStatus(done=True, pos=m1, elapsed=0.5, success=True, settle_time=0.0)
[D 13:46:54.957 epics_motor:282] m1._move_changed(): after _done_moving() success=True value=1
[A2019-12-26 13:46:54.958325 ping 3: 0.1 0.1
What code controls this ordering? Message handling the RunEngine?
Adding message handling diagnostic to RunEngine instance:
def msg_watcher(msg):
print("!"*10, f"Msg({msg})")
RE = bluesky.RunEngine({})
RE.msg_hook=msg_watcher
and this console output (when it stalls on the VM):
```
[D 14:14:08.612 ophydobj:187] This is the first instance of OphydObject. name=m1, id=140598013913872
[D 14:14:08.613 ophydobj:187] This is the first instance of OphydObject. name=m1_user_readback, id=140598013965136
[D 14:14:08.614 ophydobj:187] This is the first instance of OphydObject. name=m1_user_setpoint, id=140598013966672
[D 14:14:08.649 epics_motor:239] [ts=2019-12-26 14:14:05.61723] m1 moving: False (value=1)
!!!!!!!!!! Msg(checkpoint[None]: (None), (), {})
!!!!!!!!!! Msg(set[None]: (EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])), (0.1,), {'group': '1d12c297-59f5-4306-8601-1de8e1d25592'})
[D 14:14:08.674 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1577391248.674679)
[D 14:14:08.680 positioner:190] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move() uuid=9f80f6e8-6965-4b2b-81c9-4752203c640e
[D 14:14:08.680 epics_motor:150] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move() uuid=9f80f6e8-6965-4b2b-81c9-4752203c640e
!!!!!!!!!! Msg(wait[None]: (None), (), {'group': '1d12c297-59f5-4306-8601-1de8e1d25592'})
[D 14:14:08.682 epics_motor:239] [ts=2019-12-26 14:14:05.61723] m1 moving: True (value=0)
[D 14:14:08.683 epics_motor:239] [ts=2019-12-26 14:14:08.68067] m1 moving: False (value=1)
[D 14:14:08.683 signal:856] m1_direction_of_travel.get(): set timeout=1.000000 (1577391248.683437)
[D 14:14:08.684 signal:856] m1_low_limit_switch.get(): set timeout=1.000000 (1577391248.684723)
[D 14:14:08.686 epics_motor:275] m1._move_changed(): before _done_moving() success=True value=1
[D 14:14:08.686 positioner:205] m1._callbacks[done_moving]={}
[D 14:14:08.686 positioner:206] m1._done_moving() sub_type=done_moving value=1
[D 14:14:08.686 ophydobj:396] sub_type=_req_done callback=
It's waiting on the latest motor move to complete. Here is the Msg sequence:
!!!!!!!!!! Msg(checkpoint[None]: (None), (), {})
!!!!!!!!!! Msg(set[None]: (EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])), (0.1,), {'group': 'a42fff37-5403-4f1b-8c1f-bd4c608b8741'})
!!!!!!!!!! Msg(wait[None]: (None), (), {'group': 'a42fff37-5403-4f1b-8c1f-bd4c608b8741'})
!!!!!!!!!! Msg(sleep[None]: (None), (1e-06,), {})
!!!!!!!!!! Msg(checkpoint[None]: (None), (), {})
!!!!!!!!!! Msg(set[None]: (EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])), (-0.1,), {'group': 'ab88967f-2e35-4caa-a9b5-fdfe5f36b6b6'})
!!!!!!!!!! Msg(wait[None]: (None), (), {'group': 'ab88967f-2e35-4caa-a9b5-fdfe5f36b6b6'})
!!!!!!!!!! Msg(sleep[None]: (None), (1e-06,), {})
!!!!!!!!!! Msg(checkpoint[None]: (None), (), {})
!!!!!!!!!! Msg(set[None]: (EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])), (0.1,), {'group': '269972aa-6487-45a7-a06d-92b908372dda'})
!!!!!!!!!! Msg(wait[None]: (None), (), {'group': '269972aa-6487-45a7-a06d-92b908372dda'})
Alternatively, skip the msg_hook
and get the default logging by setting the level on the RE logger (cannot just getLogger("bluesky.run_engine")
since each RE
instance has its own logger):
RE = bluesky.RunEngine({})
RE.log.setLevel(logging.DEBUG)
So, RE is processing the wait
message, waiting on group=b059ffa8... to be done. That's not going to happen because just after the set
message was processed for group=b059ffa8..., the previous move (group=77708b8b...) status object (uuid=abfea6ab...) had its _finished()
method queued for execution:
[D 14:41:44.731 run_engine:1376] set[None]: (EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])), (0.1,), {'group': 'b059ffa8-06dd-4a49-937d-73456bd417f1'}
[D 14:41:44.733 ophydobj:396] sub_type=_req_done callback=<function StatusBase._finished at 0x7fa87e6f9ef0>
[D 14:41:44.733 status:142] >>>>>>>>>>>>>>>>>>>>: status abfea6ab-041a-4303-8656-22d1f5b1ddb4._finished(): done=True success=False, kwargs={'sub_type': '_req_done', 'obj': EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])}
[D 14:41:44.734 signal:856] m1_motor_egu.get(): set timeout=1.000000 (1577392904.734127)
[D 14:41:44.735 positioner:190] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move() uuid=19034f49-5a92-4922-9a88-13b4c5cdc78c
[D 14:41:44.735 epics_motor:150] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>: m1.move() uuid=19034f49-5a92-4922-9a88-13b4c5cdc78c
[D 14:41:44.735 status:123] --------------------: status abfea6ab-041a-4303-8656-22d1f5b1ddb4._settle_then_run_callbacks(): MoveStatus(done=True, pos=m1, elapsed=0.2, success=True, settle_time=0.0)
[D 14:41:44.742 run_engine:1376] wait[None]: (None), (), {'group': 'b059ffa8-06dd-4a49-937d-73456bd417f1'}
[D 14:41:44.736 status:155] --------------------: status abfea6ab-041a-4303-8656-22d1f5b1ddb4._finished(): MoveStatus(done=True, pos=m1, elapsed=0.2, success=True, settle_time=0.0)
@klauer, @tacaswell: Why does RE finish the previous move (why does it exit from the wait
message of group=77708b8b...) before the callbacks from the status object are returned?
It's waiting at this line in bluesky.run_engine.py
: await asyncio.wait(futs, loop=self.loop, **msg.kwargs)
I'm sure wishing I had a timeout
available.
Experimenting further by increasing the sleep time in the ping_pong plan shows that 100 cycles can be executed repeatedly on the VM without motor stall. Test with logging both ON and OFF. Minimum (empirical) sleep time for no stalls is ca. 2 ms.
Testing with sleep shorter, such as 0.2 ms results in occasional stall.
When the motor moves smoothly, and the status object is marked
done=True
so the plan continues to the next message, this chunk of code (inStatusBase._finished()
) is still running: https://github.com/bluesky/ophyd/blob/f7177f015066ed778f196af82ec700f0c59d50c2/ophyd/status.py#L139-L145if success and self.settle_time > 0: # delay gratification until the settle time is up self._settle_thread = threading.Thread( target=self._settle_then_run_callbacks, daemon=True, kwargs=dict(success=success), ) self._settle_thread.start()
Note, not this code since in our case, settle_time =0
. _finished()
just calls self._settle_then_run_callbacks(success=success)
directly.
Did some status object not get included in the group?
Reverted all the diagnostics added to an ophyd PR to help track this down.
Diagnose why the
ophyd.EpicsMotor
support stalls, as reported