semuconsulting / pyubx2

Python library for parsing and generating UBX GPS/GNSS protocol messages.
BSD 3-Clause "New" or "Revised" License
175 stars 68 forks source link

UBXReader(stream) fails silently if stream.read(n) returns < n bytes #149

Closed carstenandrich closed 6 months ago

carstenandrich commented 6 months ago

Describe the bug

Using pyubx2.UBXReader(stream) with a stream.read(n) that returns less than n bytes will silently terminate parsing, even if subsequent stream.read() calls will eventually cough up the bytes.

>>> pyubx2.version
'1.2.40'

To Reproduce

Run this minimal working example:

import io
import pyubx2

class FastReader(io.BytesIO):
    def read(self, size: int) -> bytes:
        b = super().read(size)
        print(f"read({size}) -> {b}")
        return b

class SlowReader(io.BytesIO):
    def read(self, size: int) -> bytes:
        b = super().read(min(size, 16))
        print(f"read({size}) -> {b}")
        return b

ubx = b'\xb5b\x06\x8b4\x00\x01\x07\x00\x00\x05\x00R\x10\x01\x06\x00R\x10\x00\x07\x00R\x10\x00\x02\x00R \x01\x03\x00R \x00\x04\x00R \x00\x08\x00R \x00\t\x00R \x01\x01\x00R@\x00\x96\x00\x00\x85\x01'

print("# parsing works, because io.BytesIO returns as many bytes as requested")
for raw_data, parsed_data in pyubx2.UBXReader(FastReader(ubx)):
    print(parsed_data)
print()

print("# parsing silently fails, because SlowReader returns less bytes than requested")
for raw_data, parsed_data in pyubx2.UBXReader(SlowReader(ubx)):
    print(parsed_data)

Reading from the FastReader works as expected:

# parsing works, because io.BytesIO returns as many bytes as requested
read(1) -> b'\xb5'
read(1) -> b'b'
read(4) -> b'\x06\x8b4\x00'
read(54) -> b'\x01\x07\x00\x00\x05\x00R\x10\x01\x06\x00R\x10\x00\x07\x00R\x10\x00\x02\x00R \x01\x03\x00R \x00\x04\x00R \x00\x08\x00R \x00\t\x00R \x01\x01\x00R@\x00\x96\x00\x00\x85\x01'
<UBX(CFG-VALGET, version=1, layer=7, position=0, CFG_UART1_ENABLED=1, CFG_UART1_REMAP=0, CFG_0x10520007=b'\x00', CFG_UART1_STOPBITS=1, CFG_UART1_DATABITS=0, CFG_UART1_PARITY=0, CFG_0x20520008=b'\x00', CFG_0x20520009=b'\x01', CFG_UART1_BAUDRATE=38400)>
read(1) -> b''

But reading from the SlowReader fails:

# parsing silently fails, because SlowReader returns less bytes than requested
read(1) -> b'\xb5'
read(1) -> b'b'
read(4) -> b'\x06\x8b4\x00'
read(54) -> b'\x01\x07\x00\x00\x05\x00R\x10\x01\x06\x00R\x10\x00\x07\x00'

Expected Behaviour

As per the README.md

The stream object can be any data stream which supports a read(n) -> bytes method (e.g. File or Serial, with or without a buffer wrapper).

there are no requirements for the stream.read() method. Therefore UBXReader should not expect read(n) to return exactly n bytes. Returning less bytes than requested does not indicate EOF. Only read(n) -> 0 with n>0 indicates EOF.

If this behavior is expected and required by UBXReader, then the documentation should be amended accordingly and an exception should be raised if the stream class misbehaves anyway.

Additional context

Took me a while to get to the root of this (missing some messages in my parsing results), because I was looking at UBXReader.__next__()'s raw_data return value instead of the actual raw data I was feeding into the UBXReader. Raising an exception would really have helped me to figure it out immediately.

semuadmin commented 6 months ago

Hi @carstenandrich,

So the purpose of the UBXReader.read() method is to read entire UBX, NMEA or RTCM3 messages from any binary data stream supporting a read(n) method. If it is unable to read such a message, it will return a value of (None, None). The circumstances in which it does this very much depend on the behaviour of the stream class itself and whether it provides native buffering or timeout functionality.

Your SlowReader issue can be simply resolved with the addition of an io.BufferedReader wrapper around the datastream:

print("# parsing silently fails, because SlowReader returns less bytes than requested")
for raw_data, parsed_data in pyubx2.UBXReader(SlowReader(ubx)):
    print(parsed_data)
print()

print("# ... but parsing works if the slow data stream is buffered...")
for raw_data, parsed_data in pyubx2.UBXReader(io.BufferedReader(SlowReader(ubx))):
    print(parsed_data)
# parsing silently fails, because SlowReader returns less bytes than requested
read(1) -> b'\xb5'
read(1) -> b'b'
read(4) -> b'\x06\x8b4\x00'
read(54) -> b'\x01\x07\x00\x00\x05\x00R\x10\x01\x06\x00R\x10\x00\x07\x00'

# ... but parsing works if the slow data stream is buffered...
<UBX(CFG-VALGET, version=1, layer=7, position=0, CFG_UART1_ENABLED=1, CFG_UART1_REMAP=0, CFG_0x10520007=b'\x00', CFG_UART1_STOPBITS=1, CFG_UART1_DATABITS=0, CFG_UART1_PARITY=0, CFG_0x20520008=b'\x00', CFG_0x20520009=b'\x01', CFG_UART1_BAUDRATE=38400)>

You would, of course, see the same behaviour using a standard Serial stream with a GNSS device outputting messages at the default rate of around 1Hz, e.g.

from serial import Serial
from pyubx2 import UBXReader

with Serial("/dev/ttyACM1", 38400, timeout=3 as stream:
    ubr = UBXReader(stream)
    i = 0
    for raw_data, parsed_data in ubr:
        print(i, parsed_data)
        i += 1

Setting timeout=3 will produce a continuous data stream; setting timeout=0.1 will terminate silently after the initial input buffer is cleared. The resolution is this case is simply to increase the timeout value.

More than happy to update the documentation to clarify this, but not convinced an automatic Exception is called for in these circumstances.

By way of illustration of an alternative approach (where I'm dealing with multiple data stream classes), have a look at the stream_handler._readloop() function in PyGPSClient:

https://github.com/semuconsulting/PyGPSClient/blob/master/src/pygpsclient/stream_handler.py#L219

semuadmin commented 6 months ago

Have a look at https://github.com/semuconsulting/pyubx2/commit/99e9013004c3ba127bfc1c55b78c10ced5f5c772 and let me know if it addresses your concerns. You can use the quitonerror flag to either ignore the error (0), log the error and continue (1) or terminate with an exception (2).

Traceback (most recent call last):
  File "/Users/steve/Library/Python/3.12/lib/python/site-packages/pyubx2/ubxreader.py", line 157, in read
    (raw_data, parsed_data) = self._parse_ubx(bytehdr)
                              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/steve/Library/Python/3.12/lib/python/site-packages/pyubx2/ubxreader.py", line 228, in _parse_ubx
    byten = self._read_bytes(leni + 2)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/steve/Library/Python/3.12/lib/python/site-packages/pyubx2/ubxreader.py", line 312, in _read_bytes
    raise UBXStreamError(
pyubx2.exceptions.UBXStreamError: Serial stream terminated unexpectedly. 54 bytes requested, 16 bytes returned.
carstenandrich commented 6 months ago

Thanks! Just tested with the RC-1.2.41 branch and tt does address my concerns. When you're new to the pyubx2 API this really is a footgun when not using pyserial or parsing from a file. I personally strongly disagree with pyserial's choice of read() only returning when the buffer is full or the timeout expires, which enforces inefficient patterns like reading a single byte per syscall to parse variable length data like NMEA sentences.

The following is a minimal working example for interfacing a ZED-F9P on Linux:

import io
import os
import pyubx2
import termios

def open_tty(path: str) -> io.FileIO:
    fd = os.open(path, os.O_RDWR | os.O_CLOEXEC | os.O_NOCTTY)
    attr = termios.tcgetattr(fd)
    #[iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attr[0] = 0
    attr[1] = 0
    attr[2] = termios.CS8 | termios.CLOCAL | termios.CREAD
    attr[3] = 0
    attr[4] = termios.B115200
    attr[5] = termios.B115200
    # configure blocking read
    attr[6] = [0] * len(attr[6])
    attr[6][termios.VMIN] = 1
    attr[6][termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSAFLUSH, attr)

    return open(fd, "r+b", buffering=0)

with open_tty("/dev/ttyACM0") as rtk:
    for offset in range(0, 256, 64):
        rtk.write(pyubx2.UBXMessage.config_poll(7, offset, [0x0fff0000]).serialize())

    while buf := rtk.read(4096):
        # do additional stuff with buf
        pass

        # parse UBX messages
        for raw, msg in pyubx2.UBXReader(io.BytesIO(buf), protfilter=7):
            print(msg)

Without your fix it yielded me this, causing significant head scratching until I figured out the root cause:

<UBX(ACK-ACK, clsID=CFG, msgID=CFG-VALGET)>
<UNKNOWN PROTOCOL(header=b'$t')>
<UBX(ACK-ACK, clsID=CFG, msgID=CFG-VALGET)>
<UNKNOWN PROTOCOL(header=b'$\x00')>
<UBX(ACK-ACK, clsID=CFG, msgID=CFG-VALGET)>
<UBX(ACK-ACK, clsID=CFG, msgID=CFG-VALGET)>

With the optionally raised Exception I can now simply read from the serial port again to append to the buffer and subsequently try UBXReader() again.

The main reason I cannot simply use

for raw, msg in pyubx2.UBXReader(io.BufferedReader(rtk), protfilter=7):
    print(msg)

is that I have to embed UBXReader in an asyncio coroutine.

semuadmin commented 6 months ago

Hi @carstenandrich,

Ok glad it was a result. You'll appreciate that pyubx2 and its companion libraries are solely concerned with platform-agnostic protocol parsing, so I deliberately leave the intricacies of serial and socket I/O to external libraries, of which pySerial (for all its foibles) is overwhelmingly the most popular. So it's probably fair to say my streaming test cases are somewhat blinkered in this direction.

Quick observation on your code snippet above, which I appreciate may only be illustrative. Sending CFG POLL requests to a UBX receiver and then reading the input stream in a linear sequence is not going to be bomb-proof in all circumstances, though you may 'get away with it' at 115200 baud and a nominal navigation interval. A more robust model would be to set up a concurrent read process (using whichever concurrent/parallel model you want to work with) first and then send the polls. Likewise dealing with CFG SET commands and their acknowledgements.

carstenandrich commented 6 months ago

You'll appreciate that pyubx2 and its companion libraries are solely concerned with platform-agnostic protocol parsing, so I deliberately leave the intricacies of serial and socket I/O to external libraries [...]

Absolutely agree with that. Just wanted to get this fixed upstream for the sake of potential other non-pyserial-users, so they won't stumble over the same pitfall.

Quick observation on your code snippet above, which I appreciate may only be illustrative.

Thanks. Of course this is just an MWE to illustrate the issue at hand.

carstenandrich commented 6 months ago

Reopening, because I just stumbled over a regression. UBXReader(stream, quitonerror=0|1) now returns a str on error:

with open_tty("/dev/ttyACM0") as rtk:
    for offset in range(0, 256, 64):
        rtk.write(pyubx2.UBXMessage.config_poll(7, offset, [0x0fff0000]).serialize())

    for raw, msg in pyubx2.UBXReader(rtk, quitonerror=0, protfilter=7):
        print(type(msg), msg)
        print(msg.identity)
<class 'pyubx2.ubxmessage.UBXMessage'> <UBX(ACK-ACK, clsID=CFG, msgID=CFG-VALGET)>
ACK-ACK
<class 'str'> Serial stream terminated unexpectedly. 326 bytes requested, 122 bytes returned.
Traceback (most recent call last):
  File ".../pyubx2_issue.py", line 33, in <module>
    print(msg.identity)
          ^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'identity'

As per README.md, it should only ever return a pyubx2.*Message:

Individual input UBX, NMEA or RTCM3 messages can then be read using the UBXReader.read() function, which returns both the raw binary data (as bytes) and the parsed data (as a UBXMessage, NMEAMessage or RTCMMessage object, via the parse() method).

semuadmin commented 6 months ago

Reopening, because I just stumbled over a regression. UBXReader(stream, quitonerror=0|1) now returns a str on error:

This is actually expected (though not documented!) behaviour.

Easily fixed, but I'll need to do a quick impact assessment on this as I suspect there may be some lazy programming in PyGPSClient which utilises the str in the parsed field so it can display 'ignored' errors on the console. In the short term, check raw is not None before any further processing as a quick workaround.

carstenandrich commented 6 months ago

Alright. I'll add a bunch of type checks. Just wanted to let you know that #150 possibly breaks API compatibility, again for the sake of other users out there. I'd suggest to update the README and docstring, because neither mention that as of right now.

semuadmin commented 6 months ago

Alright. I'll add a bunch of type checks. Just wanted to let you know that #150 possibly breaks API compatibility, again for the sake of other users out there. I'd suggest to update the README and docstring, because neither mention that as of right now.

No, it needs fixing as it was a lazy solution (and this was what the ErrorHandler callback was for). There are no other downstream impacts. Assume behaviour as per docs once RC 1.2.41 is merged.

EDIT - should be sorted in https://github.com/semuconsulting/pyubx2/pull/150/commits/8fa0bcfd27c7e3f4edb7d6805520bc9482ae6fac

carstenandrich commented 6 months ago

Works for me, but as commented in 8fa0bcf, I'd strongly recommend to use Python's built-in logging module instead of printing to stdout.