Launch BLACS as a secondary control system

labscript-suite / blacs

𝗯𝗹𝗮𝗰𝘀 supervises the execution of experiments controlled by the 𝘭𝘢𝘣𝘴𝘤𝘳𝘪𝘱𝘵 𝘴𝘶𝘪𝘵𝘦. It manages experiment queuing and hardware-timed execution, and provides manual control over devices between shots.

http://labscriptsuite.org

Other

4 stars 47 forks source link

Launch BLACS as a secondary control system #13

Open philipstarkey opened 7 years ago

philipstarkey commented 7 years ago

Original report (archived issue) by Philip Starkey (Bitbucket: pstarkey, GitHub: philipstarkey).

We should be able to launch secondary instances of BLACS on other PCs. This will be particularly useful for a future port of BIAS to Python, but should be open to all devices.

Proposal

#!python

@labscript_device
class SecondaryControlSystem(Device):
    delimeter = '|'

    @set_passed_properties(
        property_names = {"device_properties": ["local_hostname"]}
    )
    def __init__(self, name, local_hostname, port):
        Device.__init__(self, name, None, None)
        self.BLACS_connection = str(port)

    def __call__(self, port):
        return '%s%s%s'%(self.name, self.delimeter, port)

The above code allows you to both instantiate a secondary control system and then use that to modify the BLACS_connection property of a device. Example usage would be:

#!python

SecondaryControlSystem('imaging_pc', 'bec-imaging.physics.monash.edu', 42518)
SecondaryControlSystem('control2', 'bec-control2.physics.monash.edu', 42518)
PulseBlaster('pulseblaster_0', board_number=imaging_pc(0))
NovaTechDDS9M('novatech1', my_clockline, com_port = control2('com14'))

BLACS would then be modified so that:

the primary instance instantiated a communications tab (similar to the BIAS camera tab we have now) for each secondary control system instantiated in the connection table (the above code already adds a line to the connection table for this)
The primary instance does not instantiate tabs for devices whose BLACS connection starts with the name of one of the secondary control systems
the QueueManager knows to communicate with secondary BLACS instances in order to transition_to_buffered, etc. for relevant devices
secondary BLACS instances know (based on hostname in connection table? or lab config?) to use a different QueueManager designed specifically to talk to a primary BLACS instance
secondary BLACS instances create device tabs for devices in the connection table that begin with the secondary control instances name.

This is related to, but not the same as, labscript #28 (labscript-suite/labscript#28). We could use a similar syntax to that issue instead (such as secondary_control_host='blah' as a kwarg for all devices) however that does not then provide information in the connection table for how to launch the secondary instances of BLACS, or which port they should communicate on. Similarly, we could consider that the syntax in that issue could be of a similar form to what I've proposed here, but I'm not sure that makes a lot of sense.

philipstarkey commented 7 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).

I like the syntax.

I was thinking of having some kind of "labscript suite daemon" that would be running all the time on all computers that need it, listening for instructions and capable of launching programs on demand. As a first port of call, it could be used to implement labscript issue #28 (labscript-suite/labscript#28), where it would just launch the worker subprocesses and forward the zmq port numbers back to BLACS (which would then communicate with the workers directly). If we want the actual GUIs of the tabs to be on another computer as well, which I think is basically the idea you're getting at with "secondary control systems", then we need to sharply define the interface between BLACS and its tabs, and make that go over a network as well. The secondary system could read the connection table yes, or we could make it that primary BLACS sends it all the information it needs. Either would be fine. I think it should probably be fairly tightly under BLACS's control - when BLACS restarts it will need to as well. So I think it's more like just another subprocess, but one running on a different computer.

However if both features are to be implemented - the GUIs on different computers or the worker processes on different comptuers, we need a syntax that lets you specify them both. Like, primary BLACS is on computer A, but we want the GUI for some devices to be on computer B, but then the actual hardware is on computer C.

philipstarkey commented 7 years ago

Original comment by Philip Starkey (Bitbucket: pstarkey, GitHub: philipstarkey).

Suggestion for integrating with remote worker processes. Since the proposed syntax allows the existing SecondaryControlSystem calls to be chained, we can just create another class

#!python

@labscript_device
class SecondaryControlSystem(Device):
    delimeter = '|'

    @set_passed_properties(
        property_names = {"connection_table_properties": ["internal_hostname"]}
    )
    def __init__(self, name, internal_hostname, external_address):
        Device.__init__(self, name, None, None)
        self.BLACS_connection = str(external_address)

    def __call__(self, port):
        return '%s%s%s'%(self.name, self.delimeter, port)

@labscript_device
class RemoteWorkerBroker(Device):
    delimeter = '|'

    @set_passed_properties(
        property_names = {"connection_table_properties": ["internal_hostname"]}
    )
    def __init__(self, name, internal_hostname, external_address):
        Device.__init__(self, name, None, None)
        self.BLACS_connection = str(external_address)

    def __call__(self, port):
        return '%s%s%s'%(self.name, self.delimeter, port)

Note: It should be pointed out (because I had to go back over my previous work on this to figure it out) that local_hostname (in the original post) should match what is returned by socket.gethostname() on the remote PC (whether that PC be running the secondary control system or the remote worker broker). I've renamed it internal_hostname for clarity. external_address is how the controlling device (whether that be a primary or secondary BLACS instance) communicates with the remote system (and should probably specify both a hostname and port). The port the secondary BLACS and remote workers launch on should probably be defined in the labconfig file, and don't have to match the port specified in internal_hostname in order to support communication through a NAT. I've also changed where this is stored (into the connection table properties) to avoid creating unnecessary device groups.

internal_hostname is used solely for the secondary BLACS or remote worker to determine the labscript device name that corresponds to it, so that it can identify which devices it is controlling via BLACS_connection column in the connection table.

Finally, I advocate for this syntax primarily because it means the BLACS_connection string contains everything you need to know about the chain of communication (and does not rely on looking at various arguments in the connection table properties). It also allows you to (theoretically) create chains longer than primary and secondary or primary and worker (you could have primary -> secondary -> secondary -> secondary -> worker) and provides a unified syntax for the features in this issue and labscript issue #28 (labscript-suite/labscript#28)

philipstarkey commented 7 years ago

Original comment by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd).

Ok so having a managing instance for connections seems fine to me.

I'd make them something different from devices but that just personal preference.

Is there a reason that RemoteWorkerBroker and SecondaryControlSystem don't subclass something like a "ConnectionManager"? They seem to be identical to me. Subclassing would allow for finding all of them in the connection table with isinstance(ConnectionManager). But that might just be overthinking it.

The rest of my questions I solved myself by thinking about them a bit longer.

philipstarkey commented 7 years ago

Original comment by Philip Starkey (Bitbucket: pstarkey, GitHub: philipstarkey).

The reason they are devices is because that's currently the only way to get them in the connection table. I guess that might change down the line (because you're right, these are not devices).

Yes, they should probably inherit from something as they're the same (they weren't initially but I guess I made them converge!). I'm not sure if they'll actually be instances though when you'd want to do that check, but we may as well simplify the code a bit.

I'm thinking of ditching the "internal address" kwarg as well. I was going to use that so that secondary control systems/remote brokers could dynmically determine which line in the connection table they were by comparing the local hostname to one in the connection table. However, that limits it to one per host, and gets messy if you are actually going through NAT and thus may have multiple PCs with the same hostname (and there are other network topologies that allows for duplicate hostnames I think). We should probably just require this to be explicitly configured by an option in the labconfig on the remote host (or maybe BLACS config/remote broker config) that says which device name they correspond to since those are unique.

So we have:

#!python

@labscript_device
class RemoteBLACSConnection(Device):
    delimeter = '|'

    @set_passed_properties(
        property_names = {}
    )
    def __init__(self, name, external_address):
        Device.__init__(self, name, None, None)
        # this is the address:port the parent BLACS will connect on
        self.BLACS_connection = str(external_address)

    def __call__(self, port):
        """ This modifies the connection string so that a parent BLACS knows not to instantiate it directly"""
        return '%s%s%s'%(self.name, self.delimeter, port)

@labscript_device
class RemoteWorkerBroker(RemoteBLACSConnection):
    pass

@labscript_device
class SecondaryControlSystem(RemoteBLACSConnection):
    pass

philipstarkey commented 7 years ago

Original comment by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd).

Yeah the instances thing was just a thought but most probably we will only check the name strings anyway.

Ditching internal address is fine with me I saw no real use in that anyway.

Where do we store the network address and port of the remote system, that devices are actually attached to? I'd opt for this being a connection table property of RemoteBLACSConnection. This is because I don't think RemoteWorkerBroker needs a tab in BLACS. We are currently using the remote worker feature in our lab already(loads of PCI cards and just 5 Slots in the main machine) and everything looks and works just like the devices are local. I think this is a good abstraction that I would like to keep.

philipstarkey commented 6 years ago

Original comment by Philip Starkey (Bitbucket: pstarkey, GitHub: philipstarkey).

The network address and port for the remote system is what external_address is and would be stored in the BLACS_connection column of the connection table row for the RemoteWorkerBroker or SecondaryControlSystem device (so, RemoteBLACSConnection should not be used directly and should never appear in the class column of the connection table).

I agree that RemoteWorkerBroker should not have a tab (as one already exists for the device) but I think that SecondaryControlSystem should have a tab so you can see which devices are running on the secondary control system (and possibly even restart them remotely)

philipstarkey commented 6 years ago

Original comment by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd).

Ok sure thanks for the clarification. I think we are all set then :)

philipstarkey commented 6 years ago

Original comment by Philip Starkey (Bitbucket: pstarkey, GitHub: philipstarkey).

TODO list for launching BLACS as a secondary control system

Add an additional property to labconfig allowing the secondary BLACS control system name to be specified (to match the device name for the SecondaryControlSystem as defined in the labscript connection table)
Modify connections.py to only return attached devices (via ConnectionTable.get_attached_devices()) based on the BLACS_connection property of the connection table matching the current instance of BLACS (aka primary or the name specified in the labconfig file). Ensure that entries with class SecondaryControlSystem are also returned (for primary BLACS, this is all secondary control system entries where there is no delimeter in the BLACS connection (which corresponds to external address of the secondary control system), and for secondary BLACS, this is any secondary control system entries where the -2 element (when splitting by delimeter) of the BLACS_connection is equal to the secondary control system name).
Create a tab and worker process for secondary control systems (should live in BLACS repository) and ensure that this class is used when instantiating tabs (it's a special case because it is not in labcript_devices). This worker also needs to help synchronise ZMQ events (e.g. those related to waits)
Wait for plugin enable/disable manager to be created/tested/merged
Expand plugin system so that plugins can report they are managing a particular aspect of BLACS (for example, the queue). Have BLACS specify a limit to the number of plugins that can handle that aspect of BLACS (for queue, this would be 1) and raise an error if more than this number of plugins are active
Move ZMQ server from __main__.py to queue.py
Merge queue.py and analysis_submission.py into a plugin
Make a new plugin for handling communication from primary BLACS, that can be enabled in secondary BLACS, and manages the programming of devices (as well as forwarding of ZMQ events...e.g. when waits are done).