lneuhaus / pyrpl

pyrpl turns your RedPitaya into a powerful DSP device, especially suitable as a lockbox in quantum optics experiments.
http://lneuhaus.github.io/pyrpl/
MIT License
139 stars 108 forks source link

Raise exception when redpitaya already connected to another client #362

Open SamuelDeleglise opened 5 years ago

SamuelDeleglise commented 5 years ago

I think the current behavior is very disturbing, especially for new users: The new client kills the server process on the Redpitaya. This destroys the connection to the old client, which in turns restarts the server process on the Redpitaya. The new client then restarts the server process on the Redpitaya.... And this is an endless loop outputing some infos in the console.

My understanding is that you want the high level control of the redpitaya to be robust to disconnects and hence, you want the client to relaunch automatically the server when the connection is lost. I suppose the other constraint is that you want to be able to kill the connection to a zombi client without needing to access the zombi computer. The problem is that at the moment, if the zombi client is still trying to reconnect whenever it is kicked out by the server, you won't be able to use the redpitaya normally, unless you kill the zombi client on the zombi computer.

What do you think ?

lneuhaus commented 5 years ago

I agree all this has been a pain. Even more so when you have a bad connection (e.g. I have tried to use pyrpl over a 2.4 GHz wifi connection in a densely populated area with like 20% packet loss rate - no chance). It would be nice if we managed to get pyrpl running on that kind of network, as I believe that its a good testing ground for robustness. Since the current reconnection logic was implemented by trial-error-fix cycles, I propose we attack this problem by introducing all kinds of simulated connection problems into unit tests, with the aim to not break existing robustness, after which we are free to change the whole connection interface at will.

Typical test cases are e.g.:

This is a bit of work though. If we find a quicker partial workaround (see below), that is fine with me, it just requires more human testing.

Is it necessary to restart the server when a connection is lost? Probably not. Pyrpl could even close the ssh connection after having installed the FPGA bitfile and started the server. Option i or ii? If we do not restart the server upon connection problems, I would favor (ii) and not bother about (i). the user should know what she is doing when starting pyrpl with a given redpitaya address. If you want to avoid that e.g. Sheon connects to your redpitaya by accident, the way to go is to change the root password (we could add a pyrpl function to facilitate this, though). The only problem that remains is how to tell the zombie client that it should not attempt to reconnect on port 2222, which is the very first thing it will (and should, I believe) attempt after a connection problem (which comes from the fact that the new client has started its own server after flashing the FPGA).

  1. the cheap option (i.e. almost no work) is that a new client loads the FPGA and start the server on a random port. This will not work in about 1/1000 cases if we say we span a range of 1000 port numbers for this. It is also annoying w.r.t. firewalls etc.
  2. the nicer option: you start the server process with a random authentification token that the client is free to choose. that way the old client process is simply unable to reconnect and authentify with its old token.
  3. do you see another way around this? Your proposal of using the file system requires to talk to the redpitaya file system though the ssh connection before trying to reconnect on the port-2222-connection, which is also an option, but maybe too complicated/slow?

So in summary, I favor the attempt to

Should we try this on a feature branch derived from develop-0.9.3?

lneuhaus commented 5 years ago

We can think about the unit tests after having a feeling of how the new proposal works

SamuelDeleglise commented 5 years ago

OK, I hadn't thought about the fact that checking things via ssh might be long... If I understand correctly what you suggest (in your option 2):

Whenever a client instantiate Pyrpl(hostname=IP), the server on the redpitaya at address IP is restarted with a token that is generated on the client computer. This means everytime a client tries to reconnect on port 2222 (because a connection was lost for instance), it has to immediately send the token, otherwise the server refuses the connection. That way, anyone having the ssh password for this redpitaya can simply restart the server with a new token, which kicks me out forever. However, reconnecting after an accidentaly lost connection is quite quick for my client (the time to establish the socket and send the token...).

That definitely seems like the right solution. I am 100% for it

lneuhaus commented 5 years ago

We can use this 32-character string as the token

from uuid import uuid1 as uid
token = uid().hex
print(token)
lneuhaus commented 5 years ago

There is a new branch with the skeleton of this architecture:

@SamuelDeleglise If you are running your redpitayas on OS version 0.95 or higher, could you just run pyrpl once from this branch please? You should then find a new version of the file monitor_server. Could you commit that one? If not no problem, I just have to make an SD card with that OS in this case.

I have successfully unplugged the redpitaya ethernet cable with running scope, which make the scope stop. Upon reconnecting the cable and clicking again on run_continuous, the scope resumes.

I have also tested that when one scope is running, the start of a second one (including pyrpl instance) makes the old instance stop and only issue errors on attempts to re-start the scope. The new version does not get affected by the attempts of the old instance.

While this seems to be sufficient, we shoud take the occasion to clean up the logic, since a lot of unnecessary stuff has accumulated in redpitaya.py and redpitaya_client.py.

SamuelDeleglise commented 5 years ago

Cool. Sorry I only use OS v0.92 everywhere

lneuhaus commented 5 years ago

one problem with this approach is that it somehow breaks the older code. I will modify it to be backwards-compatible by choosing a new name instead of monitor_server for the executable. If you are unable to go back to the previous version after having tested this branch, you must run once (in the old branch, e.g. develop-0.9.3)

p = Pyrpl(config="yourconfig_or_whatever", reloadserver=True)