Race condition in SshRequestHandler when connection is closed on the server side

s3lph commented 3 months ago

Description

We're currently working on integrating our SSH-based door lock with the Trigger app. However, we observed that most of the time, after updating the status or opening/closing the lock, the status remained at Unknown, with the toast Remote end closed connection presented.

When we adapt our server-side script to first flush stdout and then sleep a few hundred milliseconds, this issue does not occur.

We tracked the issue down to the following piece of code:

https://github.com/mwarning/trigger/blob/134a3391c1f2d69ce515ff8c16e0209783b0a6e4/app/src/main/kotlin/app/trigger/ssh/SshRequestHandler.kt#L65-L69

Expected Behavior

EOF in and by itself should not be an error condition. When both ChannelCondition.STDOUT_DATA and ChannelCondition.EOF are set, all data from stdout should be processed gracefully. This is supported by the documentation of ChannelCondition.EOF (emphasis mine):

EOF on has been reached, no more new stdout or stderr data will arrive from the remote server. However, there may be unread stdout or stderr data, i.e, STDOUT_DATA or/and STDERR_DATA may be set at the same time.

Actual Behavior

The presence of EOF is treated as an error. When stdout data is sent right before the connection is closed, so that both ChannelCondition.STDOUT_DATA and ChannelCondition.EOF are set in the same call to read, the final data from stdout is not processed.

mwarning commented 3 months ago

Thank you for the analysis. Should the exception simple been removed then? I can make a new release once an issue is fixed.

s3lph commented 3 months ago

Removing the exception indeed solves this issue. However, while digging into the code a bit further, I noticed an additional, related issue: session.execCommand is non-blocking. So when read is called after execCommand, there is no guarantee that the entire stdout has been received (or even produced) yet. This can be reproduced by using this simple script on the server side:

#!/usr/bin/env python3

import sys
import time

print('Hello')
sys.stdout.flush()
time.sleep(3)
print('World')

With this server-side script, Trigger sometimes reads the entire string, but sometimes only reads Hello and then terminates.

To capture the entire stdout, waitForCondition and read must be looped until EOF or CLOSED are encountered.

mwarning / trigger