semuconsulting / pyubx2

Python library for parsing and generating UBX GPS/GNSS protocol messages.
BSD 3-Clause "New" or "Revised" License
175 stars 68 forks source link

my get stuck in SocketStream.read() #126

Closed asherikov closed 1 year ago

asherikov commented 1 year ago

Describe the bug

May get stuck in SocketStream.read() and consume 100% CPU when used from gnssntripclient:

...
[...]te-packages/pyubx2/socket_stream.py:40    call                     => _recv(self=<pyubx2.socket_stream.SocketStream object at 0x7f3985473af0>)
[...]te-packages/pyubx2/socket_stream.py:48    line                        try:
[...]te-packages/pyubx2/socket_stream.py:49    line                        data = self._socket.recv(self._bufsize)
[...]te-packages/pyubx2/socket_stream.py:50    line                        self._buffer += data
[...]te-packages/pyubx2/socket_stream.py:53    line                        return True
[...]te-packages/pyubx2/socket_stream.py:53    return                   <= _recv: True
[...]te-packages/pyubx2/socket_stream.py:77    line                     while len(self._buffer) < num:
[...]te-packages/pyubx2/socket_stream.py:78    line                     if not self._recv():
[...]te-packages/pyubx2/socket_stream.py:40    call                     => _recv(self=<pyubx2.socket_stream.SocketStream object at 0x7f3985473af0>)
[...]te-packages/pyubx2/socket_stream.py:48    line                        try:
[...]te-packages/pyubx2/socket_stream.py:49    line                        data = self._socket.recv(self._bufsize)
[...]te-packages/pyubx2/socket_stream.py:50    line                        self._buffer += data
[...]te-packages/pyubx2/socket_stream.py:53    line                        return True
[...]te-packages/pyubx2/socket_stream.py:53    return                   <= _recv: True
[...]te-packages/pyubx2/socket_stream.py:77    line                     while len(self._buffer) < num:
[...]te-packages/pyubx2/socket_stream.py:78    line                     if not self._recv():
[...]te-packages/pyubx2/socket_stream.py:40    call                     => _recv(self=<pyubx2.socket_stream.SocketStream object at 0x7f3985473af0>)
[...]te-packages/pyubx2/socket_stream.py:48    line                        try:
[...]te-packages/pyubx2/socket_stream.py:49    line                        data = self._socket.recv(self._bufsize)
[...]te-packages/pyubx2/socket_stream.py:50    line                        self._buffer += data
[...]te-packages/pyubx2/socket_stream.py:53    line                        return True
[...]te-packages/pyubx2/socket_stream.py:53    return                   <= _recv: True
[...]te-packages/pyubx2/socket_stream.py:77    line                     while len(self._buffer) < num:
[...]te-packages/pyubx2/socket_stream.py:78    line                     if not self._recv():
[...]te-packages/pyubx2/socket_stream.py:40    call                     => _recv(self=<pyubx2.socket_stream.SocketStream object at 0x7f3985473af0>)
[...]te-packages/pyubx2/socket_stream.py:48    line                        try:
[...]te-packages/pyubx2/socket_stream.py:49    line                        data = self._socket.recv(self._bufsize)
[...]te-packages/pyubx2/socket_stream.py:50    line                        self._buffer += data
[...]te-packages/pyubx2/socket_stream.py:53    line                        return True
[...]te-packages/pyubx2/socket_stream.py:53    return                   <= _recv: True
[...]te-packages/pyubx2/socket_stream.py:77    line                     while len(self._buffer) < num:
[...]te-packages/pyubx2/socket_stream.py:78    line                     if not self._recv():
[...]te-packages/pyubx2/socket_stream.py:40    call                     => _recv(self=<pyubx2.socket_stream.SocketStream object at 0x7f3985473af0>)
[...]te-packages/pyubx2/socket_stream.py:48    line                        try:
[...]te-packages/pyubx2/socket_stream.py:49    line                        data = self._socket.recv(self._bufsize)
[...]te-packages/pyubx2/socket_stream.py:50    line                        self._buffer += data
[...]te-packages/pyubx2/socket_stream.py:53    line                        return True
[...]te-packages/pyubx2/socket_stream.py:53    return                   <= _recv: True
[...]te-packages/pyubx2/socket_stream.py:77    line                     while len(self._buffer) < num:
[...]te-packages/pyubx2/socket_stream.py:78    line                     if not self._recv():
[...]te-packages/pyubx2/socket_stream.py:40    call                     => _recv(self=<pyubx2.socket_stream.SocketStream object at 0x7f3985473af0>)
....

Version: 1.2.29, Ubuntu 22.04

To Reproduce

I suspect this is a server issue, but some sort of failure detection would be nice.

semuadmin commented 1 year ago

@asherikov

Could you provide a copy of the complete script or command you executed that resulted in this issue please? Thanks

asherikov commented 1 year ago

Sorry, cannot give you the exact parameters, but it is a pretty standard NTRIP setup, which works for other servers. Reproducible both with pygpsclient, and a simple cli utility based on gnssntripclient. It looks like the server closes connection and the client doesnt handle this situation: note that recv() does not throw on closed connections, but simply returns an empty message. So it should be fixed in pyubx2.

semuadmin commented 1 year ago

Sorry, cannot give you the exact parameters, but it is a pretty standard NTRIP setup, which works for other servers. Reproducible both with pygpsclient, and a simple cli utility based on gnssntripclient. It looks like the server closes connection and the client doesnt handle this situation: note that recv() does not throw on closed connections, but simply returns an empty message. So it should be fixed in pyubx2.

OK at the moment I'm not able to reproduce with any of the 5 or 6 NTRIP casters I routinely test with, and surprised this hasn't come up before. Happy to beef up exception handling in the SocketStream class (which is in pyubx2, pynmeagps and pyrtcm) - not a biggie - but you may need to assist with the retesting if you're not comfortable sharing your configuration (you can send it to be directly by email if you don't want it in the public domain).

Does this look like a workable fix?:

    def _recv(self) -> bool:
        """
        Read bytes from socket into internal buffer.

        :return: return code (0 = failure, 1 = success)
        :rtype: bool
        """

        try:
            data = self._socket.recv(self._bufsize)
            if len(data) == 0:
                return False
            self._buffer += data
        except (OSError, TimeoutError):
            return False
        return True
semuadmin commented 1 year ago

Hi @asherikov

Could you try installing and testing the fix-socketstream-thrash branch and let me know if it resolves the problem - thanks.

pip3 install git+https://github.com/semuconsulting/pyubx2.git@fix-socketstream-thrash
asherikov commented 1 year ago

Hi, thanks, this breaks loop in pyubx2, but now it starts looping in gnssntripclient, since it simply retries on failure:

[...]te-packages/pyubx2/socket_stream.py:50    line                        if len(data) == 0:
[...]te-packages/pyubx2/socket_stream.py:51    line                        return False
[...]te-packages/pyubx2/socket_stream.py:51    return                   <= _recv: False
[...]te-packages/pyubx2/socket_stream.py:81    line                     return b""
[...]te-packages/pyubx2/socket_stream.py:81    return                <= read: b''
[...]0/site-packages/pyubx2/ubxreader.py:307   line                  if len(data) < size:  # EOF
[...]0/site-packages/pyubx2/ubxreader.py:308   line                  raise EOFError()
[...]0/site-packages/pyubx2/ubxreader.py:308   exception           ! _read_bytes: (<class 'EOFError'>, EOFError(), <traceback object at 0x7f1dcc9e1880>)
[...]0/site-packages/pyubx2/ubxreader.py:308   return             <= _read_bytes: None
[...]0/site-packages/pyubx2/ubxreader.py:145   exception        ! read: (<class 'EOFError'>, EOFError(), <traceback object at 0x7f1dcb2bd7c0>)
[...]0/site-packages/pyubx2/ubxreader.py:187   line               except EOFError:
[...]0/site-packages/pyubx2/ubxreader.py:188   line               return (None, None)
[...]0/site-packages/pyubx2/ubxreader.py:188   return          <= read: (None, None)
[...]ages/pygnssutils/gnssntripclient.py:557   line            if raw_data is not None:
[...]ages/pygnssutils/gnssntripclient.py:559   line            self._send_GGA(ggainterval, output)
[...]ages/pygnssutils/gnssntripclient.py:358   call            => _send_GGA(self=<btk_ntrip_client.btk_ntrip_client.SerialNTRIP object at 0x7f1dcb283d00>, ggainterval=2, output=None)
[...]ages/pygnssutils/gnssntripclient.py:367   line               if ggainterval != NOGGA:
[...]ages/pygnssutils/gnssntripclient.py:368   line               if datetime.now() > self._last_gga + timedelta(seconds=ggainterval):
[...]ages/pygnssutils/gnssntripclient.py:368   return          <= _send_GGA: None
[...]ages/pygnssutils/gnssntripclient.py:554   line            while not stopevent.is_set():
[...]ages/pygnssutils/gnssntripclient.py:555   line            try:
[...]ages/pygnssutils/gnssntripclient.py:556   line            raw_data, parsed_data = ubr.read()
[...]0/site-packages/pyubx2/ubxreader.py:126   call            => read(self=<pyubx2.ubxreader.UBXReader object at 0x7f1dcb283970>)
[...]0/site-packages/pyubx2/ubxreader.py:139   line               flag = True
[...]0/site-packages/pyubx2/ubxreader.py:141   line               try:
[...]0/site-packages/pyubx2/ubxreader.py:142   line               while flag:  # loop until end of valid message or EOF
[...]0/site-packages/pyubx2/ubxreader.py:143   line               raw_data = None
[...]0/site-packages/pyubx2/ubxreader.py:144   line               parsed_data = None
[...]0/site-packages/pyubx2/ubxreader.py:145   line               byte1 = self._read_bytes(1)  # read the first byte
[...]0/site-packages/pyubx2/ubxreader.py:296   call               => _read_bytes(self=<pyubx2.ubxreader.UBXReader object at 0x7f1dcb283970>, size=1)
[...]0/site-packages/pyubx2/ubxreader.py:306   line                  data = self._stream.read(size)
[...]te-packages/pyubx2/socket_stream.py:68    call                  => read(self=<pyubx2.socket_stream.SocketStream object at 0x7f1dcb283ee0>, num=1)
[...]te-packages/pyubx2/socket_stream.py:79    line                     while len(self._buffer) < num:
[...]te-packages/pyubx2/socket_stream.py:80    line                     if not self._recv():
[...]te-packages/pyubx2/socket_stream.py:40    call                     => _recv(self=<pyubx2.socket_stream.SocketStream object at 0x7f1dcb283ee0>)
[...]te-packages/pyubx2/socket_stream.py:48    line                        try:
[...]te-packages/pyubx2/socket_stream.py:49    line                        data = self._socket.recv(self._bufsize)
[...]te-packages/pyubx2/socket_stream.py:50    line                        if len(data) == 0:
[...]te-packages/pyubx2/socket_stream.py:51    line                        return False
semuadmin commented 1 year ago

Can I just clarify what, from your perspective, the expected behaviour is here?

gnssntripclient is designed to loop indefinitely until the user explicitly stops it (via Ctrl-C), even if the NTRIP caster is sending no data. Are you looking for some kind of arbitrary 'inactivity timeout' here which halts the program if no data is received after a specified period?

asherikov commented 1 year ago

Not quite:

semuadmin commented 1 year ago

Not quite:

* I can implement inactivity timeout by myself when implementing a client based on gnssntripclient library.

* My concern is that in the case of a closed connection:

  * CPU usage is going nuts until such timeout expires
  * detection of closed connections would enable me to reconnect without waiting for timeout to expire.

Problem is AFAIK there's no foolproof, cross-platform way of distinguishing between a closed socket connection and an inactive server - in both cases, as you say, socket.recv() simply returns zero data but no exception. One could, for example, attempt to write some arbitrary packets using socket.sendall() at set intervals and see if that fails, but that may cause issues at the server end. And an automated reconnect would obviously need a retry interval and max retry count to avoid getting into a different thrashing situation.

In your specific case, do you know what's causing the NTRIP server to drop the connection?

asherikov commented 1 year ago

Ok, thanks, unfortunately I dont have feedback from the other side. I guess the most reasonable thing for me would be to check socket status by myself.

semuadmin commented 1 year ago

Ok, thanks, unfortunately I dont have feedback from the other side. I guess the most reasonable thing for me would be to check socket status by myself.

ok up entirely to you. happy to pursue further if you can't find a satisfactory solution at your end.