sardana-org / sardana

Moved to GitLab: https://gitlab.com/sardana-org/sardana

PseudoMotor or MotorGroup never ends a movement when one of its physical motors unexpectedly returns to MOVING state #691

Open reszelaz opened 6 years ago

reszelaz commented 6 years ago

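At ALBA we have discovered a problem in a pseudo motor scenario involving the energy pseudo motor, built on the physical motors bragg and perp, and the exit_offset pseudo motor, whose calc_pseudo also needs the bragg position.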

The problem is that, from time to time, a sequence of very fast movements (e.g. a loop of mv macros, or simply a scan) hangs.

After debugging this problem, we found the following sequence of events.

  1. When we move energy, all three elements are set to the MOVING state: bragg, energy and perp.
  2. The bragg motor finishes its movement first, so it is released from the operation context and its state is set to ON. The state change triggers a chain of callbacks, among others the energy pseudo motor callback, which consults the (cached) states of the physical motors and composes its own state: still MOVING, because perp has not finished yet. The bragg motor emits the ON Tango event, but energy stays MOVING.
  3. When the motion action handles the second motor (perp), it first reads its position, which triggers a chain of callbacks, among others the exit_offset pseudo motor callback, which calls the calc_pseudo method of its controller. This particular implementation needs to read the bragg position using a PyTango device proxy. Since bragg has already been released from the operation context, this readout reads the position directly from the hardware; furthermore, it also reads the motor state from the hardware in order to report the Tango attribute quality (CHANGING or VALID). In this particular scenario an abnormal situation happens from time to time: the controller reports that the motor is in the MOVING state again, even though nobody has sent it a move request. It is a TurboPma2 controller, and we suspect either a bug in the Sardana controller code or that we do not fully understand how the hardware behaves. But let's continue the story...
  4. Then perp finishes its movement second, so it is released from the operation context and its state is set to ON. The state change triggers a chain of callbacks, among others the energy pseudo motor callback, which consults the (cached) states of the physical motors and composes its own state. And now the curious thing happens: since the bragg read from step 3 changed the cached bragg state to MOVING, the energy pseudo motor state is again evaluated to MOVING, even though perp has already finished. The perp motor emits the ON Tango event, but energy stays MOVING. And it will stay in the MOVING state forever, because the motion action finishes here...
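The sequence above can be reduced to a toy model. The sketch below is not Sardana code: `PhysicalMotor`, `PseudoMotor`, `read_from_hardware` and `simulate` are hypothetical names, and the pseudo motor state is assumed to be MOVING whenever any cached physical state is MOVING. It replays steps 1 to 4 and shows that a single stray hardware read of bragg between steps 2 and 4 is enough to leave energy in MOVING.

```python
from enum import Enum


class State(Enum):
    ON = "ON"
    MOVING = "MOVING"


class PhysicalMotor:
    """Hypothetical physical motor with a cached state and state listeners."""

    def __init__(self, name):
        self.name = name
        self.cached_state = State.ON
        # What the (possibly misbehaving) controller reports on a direct read.
        self.hw_state = State.ON
        self.listeners = []

    def set_state(self, state):
        # Every state change notifies listeners, like a callback chain.
        self.cached_state = state
        for listener in self.listeners:
            listener()

    def read_from_hardware(self):
        # A read outside the operation context refreshes the cached state
        # with whatever the controller reports at that moment.
        self.set_state(self.hw_state)


class PseudoMotor:
    """Hypothetical pseudo motor composing its state from its physicals."""

    def __init__(self, physicals):
        self.physicals = physicals
        self.state = State.ON
        for motor in physicals:
            motor.listeners.append(self.update_state)

    def update_state(self):
        # Assumed rule: MOVING if any physical motor is MOVING, else ON.
        states = [motor.cached_state for motor in self.physicals]
        self.state = State.MOVING if State.MOVING in states else State.ON


def simulate(glitch):
    bragg, perp = PhysicalMotor("bragg"), PhysicalMotor("perp")
    energy = PseudoMotor([bragg, perp])
    # Step 1: the motion starts, all elements go to MOVING.
    bragg.set_state(State.MOVING)
    perp.set_state(State.MOVING)
    # Step 2: bragg finishes first and is released (energy stays MOVING).
    bragg.set_state(State.ON)
    # Step 3: a client reads bragg directly; the controller may report MOVING.
    if glitch:
        bragg.hw_state = State.MOVING
    bragg.read_from_hardware()
    # Step 4: perp finishes; this is the last state change of the motion.
    perp.set_state(State.ON)
    return energy.state
```

With `simulate(glitch=False)` energy ends in ON; with `simulate(glitch=True)` it ends in MOVING, because the pseudo motor only re-evaluates its state when a physical motor notifies a change, and nothing notifies one after step 4.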

I believe the same bug could be reproduced even if the exit_offset calc_pseudo did not read the position using the proxy: it is enough that another client, e.g. a PMTV widget, reads the bragg position between steps 2 and 4.

This problem was reported at the last Follow-up meeting, and now that we know its origin, I am reporting this issue to share with you that this kind of deadlock may happen. We have applied a local workaround: the perp pseudomotor reads the bragg position from the Taurus cache, after first adding a dummy listener to the position attribute. This forces Taurus to listen to the Tango events, so the value is served from the cache instead of triggering a new read on the bragg device.
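The idea behind the workaround can be modelled in the same spirit. `Device` and `CachedAttribute` below are hypothetical stand-ins, not the Taurus or PyTango API: once a listener is attached, the attribute is assumed to be fed by change events, so `read()` returns the last event value instead of reaching the hardware, and the bragg state is therefore not refreshed as a side effect.

```python
class Device:
    """Hypothetical bragg device: every direct read reaches the hardware."""

    def __init__(self):
        self.position = 10.0
        self.hardware_reads = 0

    def read_position(self):
        # Stands in for a device proxy read that goes to the hardware and,
        # in the real system, also refreshes the cached motor state.
        self.hardware_reads += 1
        return self.position


class CachedAttribute:
    """Hypothetical client-side attribute with an event-fed value cache."""

    def __init__(self, device):
        self.device = device
        self.listeners = []
        self.last_value = None

    def add_listener(self, callback):
        # The first listener triggers the event subscription; in this model
        # the subscription fetches the current value once.
        if not self.listeners:
            self.last_value = self.device.read_position()
        self.listeners.append(callback)

    def push_event(self, value):
        # Called when the server emits a change event.
        self.last_value = value
        for callback in self.listeners:
            callback(value)

    def read(self):
        # Subscribed: serve the cached value without touching the hardware.
        if self.listeners:
            return self.last_value
        # Not subscribed: every read goes to the device.
        return self.device.read_position()
```

Usage: create `attr = CachedAttribute(Device())`, attach a dummy listener with `attr.add_listener(lambda value: None)`, and from then on repeated `attr.read()` calls no longer increase `hardware_reads`.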

It is not clear whether we can do anything in Sardana, at least within the current design, to avoid this kind of problem. The tricky part is that the controller (plugin) itself reports that the motor is MOVING. I propose to leave this issue open for a while so we get familiar with it, but if there are no ideas on how to improve it, we could close it as won't fix. Meanwhile, @rhomspuron will investigate whether there is a bug in the controller and hopefully report any news here :) Thanks!

reszelaz commented 5 years ago

This taurustrend of the measurement group state and the physical motor state was run in parallel to a step scan. It polls the state attributes at a very high frequency (every 10 ms):

[Figure: mg_missing_event]

The blue areas are the measurement group acquisitions. We can already see that the second acquisition on the trend happens while the physical motor changes its state; this should not happen. Fortunately, the scan was not affected. However, the move after the fifth acquisition does not finish, and the scan hangs. We can see that after this move, for a while, the motor state fluctuates between ON and MOVING.
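To quantify such a fluctuation in a polled trend, one could count ON to MOVING transitions inside a time window in which no move was requested. The function below is a hypothetical helper working on plain `(timestamp, state)` samples, such as those a 10 ms poll would produce; it is not part of taurustrend, and the window boundaries are assumed to be known from the scan.

```python
def spurious_moving_transitions(samples, window_start, window_end):
    """Return the timestamps where the state flips from ON to MOVING
    inside [window_start, window_end], i.e. while no move was requested.

    samples: list of (timestamp_s, state) tuples ordered by time,
             with state given as a string such as "ON" or "MOVING".
    """
    flips = []
    # Compare each sample with the next one to detect transitions.
    for (_, previous), (t, current) in zip(samples, samples[1:]):
        if window_start <= t <= window_end and previous == "ON" and current == "MOVING":
            flips.append(t)
    return flips
```

An empty result means the motor stayed quiet after the move; a non-empty one matches the ON/MOVING fluctuation seen on the trend.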