paulscherrerinstitute / StreamDevice

EPICS Driver for message based I/O
GNU General Public License v3.0

Protocol hangs after trying to read on a disconnected port #22

Closed chrschroeder closed 5 years ago

chrschroeder commented 5 years ago

A protocol seems to hang if a StreamDevice 'in' command is initiated on a disconnected port. This can happen, for example, when the target server or device is offline. Here is a simplified example:

sendTest {
    out "%#s";
}

rbkTest {
    in "%#s";
}

record(stringout, "DIODETEST2:sendTest") {
    field(DTYP, "stream")
    field(OUT,  "@test.proto sendTest() DIODETEST2 -1")
    field(FLNK, "DIODETEST2:rbkTest")
}

record(stringin, "DIODETEST2:rbkTest") {
    field(DTYP, "stream")
    field(INP,  "@test.proto rbkTest() DIODETEST2 -1")
}

To reproduce this, I simulated a disconnected device by deactivating autoConnect and disconnecting the asyn port:

2018/11/26 16:08:17.709256 CAS-client DIODETEST2:sendTest lockRequest: pasynManager->queueRequest: port DIODETEST2 not connected
2018/11/26 16:08:17.709281 CAS-client DIODETEST2:rbkTest readRequest: pasynManager->queueRequest: port DIODETEST2 not connected

This is expected, but after I reconnected the port and sent another command, I only got a recovery message for the output record:

2018/11/26 16:08:24.893359 CAS-client DIODETEST2:sendTest lockRequest: pasynManager->queueRequest: status returned to normal

I could see the answer being read by asyn in the asyn debug output, but it did not appear in the input record. The record could be recovered with streamReload:

epics> streamReload('DIODETEST2:rbkTest')
2018/11/26 16:09:39.328291 main DIODETEST2:rbkTest: Protocol aborted

This suggests that the protocol was hanging the whole time without ever running into a timeout. I initially discovered the problem while using waveform records and switched to stringin and stringout to simplify the example. I couldn't reproduce the behavior with a pair of ao and ai records, so it might be related only to strings / arrays.

The test was done with versions 2.7.7 and 2.8.7.

dirk-zimoch commented 5 years ago

Thanks for the bug report. I will have a look as soon as possible. Please also tell me the asyn and EPICS base versions you are using.

dirk-zimoch commented 5 years ago

I assume your asyn port is a TCP port?

chrschroeder commented 5 years ago

In one setup it is a TCP port configured with drvAsynIPPortConfigure, and in another one I used drvAsynSerialPortConfigure with a local serial device.

Edit: Sorry, I missed the other question. In case it still matters: I discovered the issue with EPICS base 7 / asyn R4-30 and reproduced it with EPICS base 3.14.12.7 / asyn R4-32.

dirk-zimoch commented 5 years ago

Confirmed. The input record prints an error (with var streamError 1) but does not go into an error state, and its PACT field stays 1. Thus it never finishes processing.

dirk-zimoch commented 5 years ago

Should be fixed now in commit acf7efcf. It happens when the protocol starts with an 'in' command and the record is not scanned "I/O Intr". This is not a very common use case. Thus the bug went undetected for a long time.
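
For readers comparing the two cases: a minimal sketch of the same input record scanned "I/O Intr", which is the configuration that was not affected by the bug. The record and protocol names are taken from the example in the report; dropping the FLNK from DIODETEST2:sendTest is an assumption about how one would rewire that example, since an "I/O Intr" record is processed by matching input rather than by a forward link.

```
# Sketch only: the input record from the report, switched to "I/O Intr".
# It processes when input matching rbkTest arrives on the port,
# so DIODETEST2:sendTest would no longer need FLNK to this record.
record(stringin, "DIODETEST2:rbkTest") {
    field(DTYP, "stream")
    field(INP,  "@test.proto rbkTest() DIODETEST2 -1")
    field(SCAN, "I/O Intr")
}
```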

chrschroeder commented 5 years ago

That fixed the problem. I had also thought about using I/O Intr in this case and will probably use it here, but sometimes it is preferable to process explicitly, for example if you have 2 records that receive input of the same pattern. In that case I prefer to process with FLNK at the time I expect the appropriate answer.

Thanks for the fast reaction!

dirk-zimoch commented 5 years ago

In such cases I usually use redirect or read the reply back into the same record. Using 2 protocols breaks the relation between question and answer: the bus is no longer locked between them, so other records may process in between, and you may get a reply that does not belong to the request you think it does. So don't do that. Always keep question and answer together in the same protocol.

As this fixes the bug, I will tag the commit and close the issue.
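
As an illustration of the "read the reply back into the same record" variant, here is a minimal sketch based on the example from the report. The protocol name sendAndReadTest is hypothetical; the format strings, port name and link parameters are copied from the original protocols and records. The assumption is that overwriting the output record's value with the reply is acceptable for the application.

```
# test.proto (sketch): question and answer in one protocol,
# so the port stays locked between 'out' and 'in'.
# The name sendAndReadTest is hypothetical.
sendAndReadTest {
    out "%#s";
    in  "%#s";
}
```

```
# The reply is read back into the same stringout record;
# no separate input record and no FLNK are needed.
record(stringout, "DIODETEST2:sendTest") {
    field(DTYP, "stream")
    field(OUT,  "@test.proto sendAndReadTest() DIODETEST2 -1")
}
```

If the reply has to end up in a different record, the redirect feature mentioned above would be the alternative; it is not shown here.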