Raise exception when redpitaya already connected to another client

SamuelDeleglise commented 5 years ago

I think the current behavior is very disturbing, especially for new users: The new client kills the server process on the Redpitaya. This destroys the connection to the old client, which in turns restarts the server process on the Redpitaya. The new client then restarts the server process on the Redpitaya.... And this is an endless loop outputing some infos in the console.

My understanding is that you want the high level control of the redpitaya to be robust to disconnects and hence, you want the client to relaunch automatically the server when the connection is lost. I suppose the other constraint is that you want to be able to kill the connection to a zombi client without needing to access the zombi computer. The problem is that at the moment, if the zombi client is still trying to reconnect whenever it is kicked out by the server, you won't be able to use the redpitaya normally, unless you kill the zombi client on the zombi computer.

Is all this really necessary ? When the connection is lost, do we really need to restart the server?
If yes, there are several options:
1. Warn the user that another client is already connected and refuse to connect.
2. Definitely kick out the zombi (for instance, a pyrpl instance would set a unique flag on the redpitaya file system, and your flag needs to be the last installed to be allowed to restart the server)

What do you think ?

lneuhaus commented 5 years ago

I agree all this has been a pain. Even more so when you have a bad connection (e.g. I have tried to use pyrpl over a 2.4 GHz wifi connection in a densely populated area with like 20% packet loss rate - no chance). It would be nice if we managed to get pyrpl running on that kind of network, as I believe that its a good testing ground for robustness. Since the current reconnection logic was implemented by trial-error-fix cycles, I propose we attack this problem by introducing all kinds of simulated connection problems into unit tests, with the aim to not break existing robustness, after which we are free to change the whole connection interface at will.

Typical test cases are e.g.:

deliberately close the ssh connection and try to read a register.
deliberately close the connection on default port 2222 and try to read a register.
both of the above together.
have a zombie process that does not access the redpitaya and try to read a register with another connection.
deliberatly break the connection on default port 2222 while communication is on-going (if possible)
write an incomplete request message to the server, break the connection, and try to read a register.

This is a bit of work though. If we find a quicker partial workaround (see below), that is fine with me, it just requires more human testing.

Is it necessary to restart the server when a connection is lost? Probably not. Pyrpl could even close the ssh connection after having installed the FPGA bitfile and started the server. Option i or ii? If we do not restart the server upon connection problems, I would favor (ii) and not bother about (i). the user should know what she is doing when starting pyrpl with a given redpitaya address. If you want to avoid that e.g. Sheon connects to your redpitaya by accident, the way to go is to change the root password (we could add a pyrpl function to facilitate this, though). The only problem that remains is how to tell the zombie client that it should not attempt to reconnect on port 2222, which is the very first thing it will (and should, I believe) attempt after a connection problem (which comes from the fact that the new client has started its own server after flashing the FPGA).

the cheap option (i.e. almost no work) is that a new client loads the FPGA and start the server on a random port. This will not work in about 1/1000 cases if we say we span a range of 1000 port numbers for this. It is also annoying w.r.t. firewalls etc.
the nicer option: you start the server process with a random authentification token that the client is free to choose. that way the old client process is simply unable to reconnect and authentify with its old token.
do you see another way around this? Your proposal of using the file system requires to talk to the redpitaya file system though the ssh connection before trying to reconnect on the port-2222-connection, which is also an option, but maybe too complicated/slow?

So in summary, I favor the attempt to

give up the attempts to restart the server after a connection problem. e.g. your option (ii),
and to ensure benign behaviour with my proposal (2).

Should we try this on a feature branch derived from develop-0.9.3?

lneuhaus commented 5 years ago

We can think about the unit tests after having a feeling of how the new proposal works

SamuelDeleglise commented 5 years ago

OK, I hadn't thought about the fact that checking things via ssh might be long... If I understand correctly what you suggest (in your option 2):

Whenever a client instantiate Pyrpl(hostname=IP), the server on the redpitaya at address IP is restarted with a token that is generated on the client computer. This means everytime a client tries to reconnect on port 2222 (because a connection was lost for instance), it has to immediately send the token, otherwise the server refuses the connection. That way, anyone having the ssh password for this redpitaya can simply restart the server with a new token, which kicks me out forever. However, reconnecting after an accidentaly lost connection is quite quick for my client (the time to establish the socket and send the token...).

That definitely seems like the right solution. I am 100% for it

lneuhaus commented 5 years ago

We can use this 32-character string as the token

from uuid import uuid1 as uid
token = uid().hex
print(token)

lneuhaus commented 5 years ago

There is a new branch with the skeleton of this architecture:

Commit: cbd52dae9cbb3605813b17de99d47ede2e7d202e
Branch: multiclient-server

@SamuelDeleglise If you are running your redpitayas on OS version 0.95 or higher, could you just run pyrpl once from this branch please? You should then find a new version of the file monitor_server. Could you commit that one? If not no problem, I just have to make an SD card with that OS in this case.

I have successfully unplugged the redpitaya ethernet cable with running scope, which make the scope stop. Upon reconnecting the cable and clicking again on run_continuous, the scope resumes.

I have also tested that when one scope is running, the start of a second one (including pyrpl instance) makes the old instance stop and only issue errors on attempts to re-start the scope. The new version does not get affected by the attempts of the old instance.

While this seems to be sufficient, we shoud take the occasion to clean up the logic, since a lot of unnecessary stuff has accumulated in redpitaya.py and redpitaya_client.py.

SamuelDeleglise commented 5 years ago

Cool. Sorry I only use OS v0.92 everywhere

lneuhaus commented 5 years ago

one problem with this approach is that it somehow breaks the older code. I will modify it to be backwards-compatible by choosing a new name instead of monitor_server for the executable. If you are unable to go back to the previous version after having tested this branch, you must run once (in the old branch, e.g. develop-0.9.3)

p = Pyrpl(config="yourconfig_or_whatever", reloadserver=True)

lneuhaus / pyrpl

Raise exception when redpitaya already connected to another client #362