It would be unwise to have a local server without any authentication other than sending a client_id. I'd recommend at least simple authentication using a username and password. I know it's not that safe, as the connection is not encrypted and the data could easily be spoofed on the local LAN, but at least it prevents the easiest ways of doing damage.
In one of my projects, I got around this limitation by placing a passphrase known to the server on the device. Any data received by the device had a sha1 checksum appended to it; the checksum was sha1(data + passphrase). If the checksum failed (i.e. an attacker tried to send something to the device), it was discarded. This way, no authentication details were passed over clear text. You would need physical access to the device to get the passphrase off it, but by then you could just re-flash it with whatever you wanted, so I don't see that as a limitation really.
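A minimal sketch of that scheme (function names and framing are illustrative, not the actual project code; MicroPython would use uhashlib instead of hashlib):

```python
import hashlib

PASSPHRASE = b'secret-passphrase'  # Provisioned onto the device alongside the firmware

def seal(data):
    # Append sha1(data + passphrase); sha1 digests are 20 bytes
    return data + hashlib.sha1(data + PASSPHRASE).digest()

def unseal(packet):
    # Return the payload if the checksum verifies, else None (discard)
    data, checksum = packet[:-20], packet[-20:]
    if hashlib.sha1(data + PASSPHRASE).digest() == checksum:
        return data
    return None
```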
In my use case -- sending short messages related to PIR or ultrasonic sensors -- packet loss is OK, but crashes / lockups are unacceptable.
One worry I have is whether all of the wifi disconnect/reconnects will cause wear on the external flash.
My two cents worth -- implement a tiny, yet robust example of one client and one server -- highlight how to solve the socket/wifi issues, without obscuring those critical details with a lot of extra features that could be implemented any number of ways.
@craftyguy that's an interesting idea but maybe a bit of overkill on an esp8266 with its limited resources. It could certainly be implemented as an advanced feature though.
@bvwelch disconnect/reconnects don't write to the flash, so I don't see how they would wear it. I agree, the most important goal is to have minimal resilient code. That way the code can easily be adapted if another platform needs a different implementation, and it can be extended in a subclass adding new features which are the same for every resilient device implementation.
@kevinkk525 My overriding aim was to minimise RAM use on the ESP8266 and to simplify the code as far as possible. I accept that it is not secure against attackers who have gained access to the LAN, but I think the architecture is much more secure against internet attacks compared to running web-facing applications on the ESP8266. I wanted it to work without use of cross-compilation or frozen bytecode. I have not achieved the former. Using daily/release builds, free RAM is on the order of 6KB/10KB - I feel this RAM should be for application code rather than for adding too many features. In particular I'm conscious of the RAM requirements of some sensor device drivers, and also of the fact that FBC may be a step too far for some users.
I did investigate some of the things you suggested.
I'm afraid I don't follow some of your suggestions.
I think I need to write some more demos. I think flow control and qos==2 are readily achievable at application level, but until I write some code it's just hand-waving.
I was acutely aware of the risk of re-inventing resilient MQTT. We have MQTT :) My aim was to design something which is as minimal as I could make it. I wish I could simplify it further.
My comparison with resilient MQTT is perhaps unwise. Resilient MQTT remains as a solution. The code in this repo is primarily intended for people who want to run other protocols and is therefore protocol-agnostic. But you could run Paho on the server app if you wanted to.
So the big question is, do we add features or try to minimise further? Your suggestion of adding features by subclassing seems to me to be the answer.
I also think I need to improve the docs and demos to clarify how I envisage this library being used.
Thanks for your answer. Not having to use FBC would be nice but on the esp8266 that is a really tough goal.
The task cancellation use-case is where an outage occurs when the client application is spending time doing nothing. The outage is detected by the lack of a keepalive. The other coros must be interrupted to enable the recovery process to begin. In the absence of cancellation the tasks did not terminate until the user code attempted to communicate.
Re message loss I think you haven't fully grasped how the code works. If a long outage occurs, messages aren't routinely lost. The write method (on both client and server side) pauses until the outage ends. A single message can be lost if it is sent in the interval between the outage occurring and being detected. If write is called with pause=False it might be possible to lose more than one in that interval. In typical running (even in the presence of long outages) messages are rarely lost.
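A hedged illustration of that behaviour (the exact client API is an assumption here, not quoted from the repo): with pause=True the write coroutine simply waits out an outage, so the application needs no retry loop of its own.

```python
import ujson

async def report(client, sensor):
    # client.write is assumed to be a coroutine taking one line of text;
    # with pause=True it blocks across an outage as described above.
    while True:
        reading = await sensor.read()  # Hypothetical async sensor
        await client.write(ujson.dumps(reading) + '\n', pause=True)
```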
If the client is offline for hours and the server application has accumulated many messages for it, that is really an application-level problem. It could send them, or combine them in some way.
Again, if the server-side application supports multiple internet protocols the application will have to arbitrate or combine them. This complexity has to be handled somewhere. My contention is that this should be done on the server in CPython. The application must do this and produce messages for each client. The messages are then sent to the appropriate Connection instance. My library simply provides a resilient full-duplex stream interface.
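Purely as an illustration of that division of labour (the Connection class and its write coroutine are assumptions standing in for whatever the server module provides), the CPython application might route messages like this:

```python
import json

connections = {}  # client_id -> Connection instance, maintained by the server code

async def broadcast(topic, value):
    # Application-level arbitration: build one message and write it to each
    # client's Connection; each write pauses if that particular client is down.
    line = json.dumps({'topic': topic, 'value': value}) + '\n'
    for client_id, conn in connections.items():
        await conn.write(line)
```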
You are correct in stating that newlines in the middle of messages are problematic. A message with such a newline will be received as two messages. The line-oriented protocol was chosen because MicroPython supports a socket readline method: using this saves code on the ESP8266. In all the use scenarios I've considered, a message would consist of a JSON encoded Python object. As I understand it, such messages will not contain newlines:
>>> a = ujson.dumps([1, 'rats\nrats'])
>>> '\n' in a
False
>>> a
'[1, "rats\\nrats"]'
>>>
I dug a little deeper into the cancellation as I really wanted it gone :D What is wrong with this code compared to using cancellables? In my short tests it works as well as the cancellables but also frees 3kB of RAM as FBC after removing everything cancellable-related from "primitives.py". (Code is from client.py, line 84; the "@asyn.cancellable" decorators are removed too.)
async def _run(self, loop):
    s = self._sta_if
    while True:
        while not s.isconnected():  # Try until stable for 2*.timeout
            await self._connect(s)
        self.verbose and print('WiFi OK')
        self.sock = socket.socket()
        self.evfail.clear()
        try:
            self.sock.connect(self.server)
            self.sock.setblocking(False)
            await self._send(MY_ID)  # Can throw OSError
        except OSError:
            pass
        else:
            # loop.create_task(asyn.Cancellable(self._reader)())
            # loop.create_task(asyn.Cancellable(self._writer)())
            # loop.create_task(asyn.Cancellable(self._keepalive)())
            _reader = self._reader()
            loop.create_task(_reader)
            _writer = self._writer()
            loop.create_task(_writer)
            _keepalive = self._keepalive()
            loop.create_task(_keepalive)
            await self.evfail  # Pause until something goes wrong
            self.ok = False
            # await asyn.Cancellable.cancel_all()
            asyncio.cancel(_reader)
            asyncio.cancel(_writer)
            asyncio.cancel(_keepalive)
            await asyncio.sleep(1)  # Wait 1 sec so that all coros are removed from loop.waitq
            self.close()  # Close sockets
            self.verbose and print('Fail detected.')
        s.disconnect()
        await asyncio.sleep(1)
        while s.isconnected():
            await asyncio.sleep(1)
You are right about message loss; my concern was about getting more messages after a reconnect than can be processed, and therefore losing some at that point. I understand that the accumulation of many messages is a high-level app problem, just like the support for multiple protocols. Guess I should start writing application code then.
This is a different (better) approach than the one I tried and rejected which involved testing flags and quitting.
I'll study it further tomorrow but at a first glance it looks good and a worthwhile improvement. I just need to convince myself that there is no circumstance where, at the end of the 1 second delay, a cancelled coro might still be pending. The guarantee that all have actually terminated is the thing that cancel_all brings to the party.
Re message handling: the assumption on the client is that there is a user coro which continuously reads; messages on the ESP8266 aren't buffered. On the server side they are (I need to document this). But typical applications on both sides will have a coro which spends most of its time waiting for messages.
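Roughly what such a client-side coro might look like (readline is an assumed coroutine on the client object and handle is an application placeholder):

```python
import ujson

async def reader(client):
    # Runs forever, so incoming lines are consumed as soon as they arrive
    # and never accumulate on the ESP8266 (which does no buffering).
    while True:
        line = await client.readline()   # Assumed API: one message per line
        handle(ujson.loads(line))        # Application-specific handler
```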
Re qos: I've written some code and done some testing of an application which implements qos==2 by the algorithm proposed in the README. It's hard to prove a negative but I experienced no missing messages through numerous outages. Duplicates do occur (as expected) but are discarded. The fact that this algorithm looks OK was an unplanned side effect of the way I defined the interface but it's very much simpler than using ACK packets. I'll document and post the code, but unless there is a subtle bug I'm convinced it should be done at application level.
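For concreteness, the general shape of such a scheme (my paraphrase, not necessarily the README's exact algorithm): the sender numbers each message and repeats the most recent one after a reconnect; the receiver discards any message id it has already processed, so duplicates are harmless.

```python
import ujson

# Sender side: tag each message with an incrementing id and remember the last one,
# since the detection window around an outage is the only place it can be lost.
mid = 0
last = None

async def send(client, obj):
    global mid, last
    mid += 1
    last = ujson.dumps({'mid': mid, 'data': obj}) + '\n'
    await client.write(last)

async def resend_after_reconnect(client):
    if last is not None:
        await client.write(last)  # May produce a duplicate; the receiver discards it

# Receiver side: drop anything whose id has already been seen
# (in practice keeping only the last id would suffice and save RAM).
seen = set()

def on_message(line):
    msg = ujson.loads(line)
    if msg['mid'] not in seen:
        seen.add(msg['mid'])
        process(msg['data'])  # Application handler
```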
Now pushed an update including the improvement from @kevinkk525, also a bugfix to server_cp.py. Thank you Kevin.
Thanks a lot for the update! By the way: you have a spelling error in README line 142. Will close this issue now and open new ones if needed.
Hello Peter, thanks a lot for your efforts! I am not completely done with the code yet but I'd like to share some thoughts with you:
1. QoS / guaranteed message delivery: The connection might be resilient but, as you state in the README, there is no guaranteed message delivery. This would have to be implemented in the app, which in my opinion is not a good idea: you would have to implement it manually in every possible app, or write a general app on top anyway. The common use case is unlikely to accept message loss, and handling loss in every app takes more effort and results in code bloat. Therefore I'd suggest implementing QoS directly in the driver.
2. Authentication: It would be unwise to have a local server without any authentication other than sending a client_id. I'd recommend at least simple authentication using a username and password. I know it's not that safe, as the connection is not encrypted and the data could easily be spoofed on the local LAN, but at least it prevents the easiest ways of doing damage.
3. Session: This seems to be handled in the app. In MQTT you have the option to connect with a clean session or to reuse the previous session from before the WiFi outage, possibly resulting in a load of new messages. If the app does not handle this, the server would flood the client with new messages and crash it. This makes it necessary either to implement some session management, or to have the server dump messages, or to let the client take its chances. Another option would be number 4, at least for short outages; if the outage is longer, a clean session is needed anyway, as after a few hours there could be hundreds of messages.
4. Flow control: As the main targets of this driver are microcontrollers with limited RAM, it could be interesting to implement flow control. This could be done in a very simple way, basically like having a socket lock with a timeout (a rough sketch follows after this list).
5. Generic app interface: The current implementation seems to need one client object per app. This is OK if you only use one app, but how do you differentiate between MQTT messages, a "get" request and more? Spawning several clients has a huge overhead and would consume a lot of RAM. The solution would be to make the driver more generic by sending an additional header including an app ID. Then server and client both know where a message belongs and can use a single socket and driver instance for communication. An app then simply registers its callback for new messages and gets a basic set of functions for writing to the socket. This saves RAM and makes it very modular, so you could easily spawn a temporary app for some "GET" requests; the server would recognize the app name and spawn a new server app for it with a unique app ID. The exchanged messages would then be similar to MQTT, having a bytes header first, something like: app_name (numeric), app_id, message_id, length, qos, duplicate (a packing sketch also follows after this list).
6. Cancellables: I understand the use of cancellables and I have not tested the difference in RAM usage between using them and an alternative, but looking at the code I assume they use a lot of RAM. Having an alternative should make the driver less RAM-intensive and the code simpler. Again, this is just a thought, as I have not actually tested it yet.
7. Comparison to your mqtt_driver: In the README you state that this driver uses less code than the MQTT implementation, which is correct, but in my opinion it is a weak comparison as the MQTT implementation has a lot more features and is therefore naturally more complex.
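Regarding point 4, roughly what I mean by a socket lock with a timeout (a sketch only, written with standard asyncio names; the driver's write coroutine and whatever primitives uasyncio offers are assumptions here):

```python
import asyncio

class FlowControl:
    # One in-flight message at a time: the next write waits until the peer
    # has signalled "ready" or the timeout expires.
    def __init__(self, timeout=5):
        self._ready = asyncio.Event()
        self._ready.set()
        self._timeout = timeout

    async def write(self, conn, line):
        try:
            await asyncio.wait_for(self._ready.wait(), self._timeout)
        except asyncio.TimeoutError:
            pass                # Peer silent: send anyway rather than block forever
        self._ready.clear()
        await conn.write(line)  # conn.write is the driver's (assumed) send coroutine

    def peer_ready(self):
        # Called when the peer signals it has processed the previous message
        self._ready.set()
```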
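And for point 5, a rough illustration of such a header; only the field names come from the list above, the sizes and format are guesses (MicroPython would use ustruct):

```python
import struct

# app_name, app_id, message_id: one byte each; length: two bytes; qos, duplicate: one byte each
HDR = '!BBBHBB'

def pack_header(app_name, app_id, message_id, length, qos, duplicate):
    return struct.pack(HDR, app_name, app_id, message_id, length, qos, duplicate)

def unpack_header(buf):
    return struct.unpack(HDR, buf[:struct.calcsize(HDR)])
```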
I'd be interested in your thoughts about this. I'm not sure my feedback fits the direction you were headed with the driver. I'd happily contribute to the driver and write a server-side app for mqtt once the driver protocol is decided.