Project structure for adding new devices

Hi labscripts folks,

First, thank you all for providing such a wonderful experiment framework! Here at JQI, we are actively working on adding new devices to labscripts (strong push from @ispielma ^^). I'm currently finalizing a blacs device for DMDs using ALP 4.1 API (hopefully we can propose a general interface design for DMDs) and also a blacs device for GeniCam.

One technical difficulty I noticed is: every time I add one new device to labscript-devices repo, I will introduce a new bunch of dependencies (and sometimes, unfortunately, some binary files from the manufacturer). If some users don't really need the support of that device, these extra dependencies would just increase the difficulty of deployment.

To solve this problem, I have two solutions to propose:

Using the namespace package feature of python. With namespace packages, we can separate this huge repo into different submodules that share the same namespace, e.g. we can publish two pip packages device_a and deivce_b and they can be imported by
```
import labscript_devices.device_a
import labscript_devices.device_b
```
both under the namespace labscript_device. In this way, the dependencies of each device will be maintained independently from other devices.
I believe the namespace package is the most elegant approach, but I can also foresee that any transition from the current norm would be difficult. As an intermediate approach, we may also consider adding new devices as git submodule (one example is the way picosdk managing 3rd party libraries). We can pack all files, including binary files for that module, into an external repo and add that into labscript-devices as a submodule. When users decide to install this module, they only need to run git submodule update --remote <path to the submodule>. The downside of this method is it will mess up pip, and users need to manage the dependencies of that submodule by themselves.

Let me know your thoughts about it!

@TerryGeng some work was done a while ago to break the dependence on labscript devices. I believe it works, you can specify additional directories via the user_devices setting in the labconfig file (multiple directories can be specified as a comma delimited list).

I'm not certain if this is better than either of the options you propose...maybe it's actually just complimentary?

I wonder what the security implications would be if you just added the Python site packages folder to the user_devices setting would be... probably too risky. Any malicious package installed could be imported automatically.

Still, maybe the existing functionality could tie in with the package namespace feature? It wouldn't even have to be published under labscript_devices. You could publish a nist-labscript-devices package, add the top level package dir to the labconfig setting, and have the namespace packages contain the labscript device code (unless I've misunderstood how namespace packages work). Would be worth checking if they work with conda too...

The upside to this is that you don't actually need to reach consensus here before you begin experimenting. You can try things out in a separate package and then keep us updated as to what works and what you think the best practices are! I'd love to see maintenance of device code become more distributed 🙂

I think this is a mixture of a philosophical and practical question.

Should we bundle all of the well supported labscript devices into a single repo like we do now?

There is a clear benefit to this since users don't do do additional legwork to get a standard deployment working (rather like numpy does 90% of what is needed).
The contrasting stance is to host a repo like "labscript-suite" that has subpackages that can be downloaded a-la-carte (this is basically how Julia has opted to organize its package ecosystem).

I think either is fine. However, I think it is a very poor idea to encourage people to setup their own repos like "nist-labscript-devices." Experience with python is that this leads to orphaned projects and difficulty finding packages of use.

The problem @TerryGeng is trying to solve (in my view) is one of dependancies. If labscript-devices, as it is, were setup as a canonical python package it's install script would check for the dependancies of all of its supported devices (as it does now to some extent). I think this is (almost) always the right thing to do. However, this is in practice insane. If you don't have a physical device then you should not install the supporting packages/drivers.

So we have the question: do we build error checking into the device framework that at runtime makes sure that a device's dependancies are installed and throws a helpful error if they are not, OR go the subpackage route, OR ....?

P.S. The structure of @TerryGeng 's question also points out that the standard labscript practice of doing import * from pretty much every package is unwise.

I want to spend more time thinking about this, but I have a few thoughts I wanted to inject now to the conversation.

I'm definitely a big believer in getting user devices out there and easy to install for others. Labscript's utility derives from broad device support, so adding to that makes it easier for more users to get started. I firmly agree with Phil that moving to a more distributed model of device code maintenance would be very beneficial to this end. I also agree with Terry that namespace packages seem to be a decent way forward, though a smooth transition will probably be rather annoying. My not fully developed thought is that we should aim for a hybrid of the two methods Ian outlined:

labscript-devices will continue to exist largely as is, given it has the core devices that almost every lab relies on. I suspect the primary deciding factor of what goes here is what the core labscript team is willing to agree to maintain directly. Next concern is limiting dependency hell (ie no devices that depend on tensorflow)
we could then make a labscript-devices-aux namespace host repo/package thing that allows for more distributed maintenance and can be a-la-carte

In any case, I agree with Phil that it would be nice to have a test-bed to try out ideas that doesn't require modifying labscript-devices itself. I suspect there is going to be a lot of learning to be done regarding packaging that I would rather learn on a completely independent repo.

I have a few distinct thoughts responding to things already said that are informing this idea

every time I add one new device to labscript-devices repo, I will introduce a new bunch of dependencies (and sometimes, unfortunately, some binary files from the manufacturer).

Probably not what you mean, but I want to make sure we are being careful: I highly doubt we are allowed to host binary blobs from manufacturers directly in any repo. What this means is that installing many devices already requires tracking down 3rd party drivers (spincore for windows, all the cameras, VISA backends, etc), and the only python dependencies are their python wrapper packages which can be installed without fuss. Currently, having an unused device in labscript-devices isn't too annoying as the python dependency can always be installed and runtime checks of dependencies only occur for devices that are actually imported. A decent example of this are the FLIR cameras which have a 3rd party binary blob and a python wrapper dependency that must be installed separately outside of pip/conda. As such, even the python wrapper is not listed as a dependency of labscript-devices. If you try to use the camera, that is when dependency checks are done.

As Ian astutely points out, this does mean you should never, ever call from labscript_devices import *.

However, I think it is a very poor idea to encourage people to setup their own repos like "nist-labscript-devices." Experience with python is that this leads to orphaned projects and difficulty finding packages of use.

While this experience is definitely true, I would argue it is a feature more than a bug (as far as orphaning goes). Given most devices are written by a single grad student who will move on in life, many orphaned labscript-device projects will lose their single maintainer and will be guaranteed to rot to unusable pretty quickly. It doesn't really matter where the code is hosted, it is orphaned all the same (see the AlazarTechBoard for instance). Having orphaned code live in another repo where it is obvious it is orphaned is preferable to me. If we want to move a device to mainline maintenance, we always can. And we can always maintain a list of such repos to make them easier to find (technically exists, but is admittedly not prominent nor complete).

Another reason I prefer the separate repo approach is that many labscript-devices in use today are not written to the quality of labscript-devices. They are often coded fast and minimally functional, rather than reliable and maintainable. Encouraging independent repos (from labscript proper) significantly lowers the barrier to publish the code, which is good. Having a sanctioned and simple way to incorporate such a device into your own experiment is definitely something we should do, but I also don't want to be responsible (because the labscript-suite org owns the repo) for maintaining someone else's code for devices I don't have access to.

do we build error checking into the device framework that at runtime makes sure that a device's dependancies are installed and throws a helpful error if they are not

Maybe I'm missing a subtle distinction, but I believe this is largely already done. The camera devices are a good example. Runtime driver checks for dependencies only occur if the device is actually used within BLACS. Maybe the errors could be improved, but I think the standard 'could not import package' or 'binary not found' errors are pretty clear as to what needs to happen. There are other devices in labscript-devices that don't do as good of a job of this, but I'm pretty sure most of those are effectively orphaned devices that should be spun out of the main repo anyway. I also think some of this pain could be reduced by listing optional package dependencies more carefully.

Thanks for the good comments David. Regarding the last point, I would say that current devices do a really poor job of indicating a missing package, and do so in unique ways. For example in the IMAXdxCamear driver we have

class IMAQdx_Camera(object):
    def __init__(self, serial_number):
        global nv
        import nivision as nv
        _monkeypatch_imaqdispose()

where the nivision is the python wrapper for the NI camera code. When this is called without installing nivision, a messy python error comes back.

In Pulseblaster we have this

class PulseblasterWorker(Worker):
    def init(self):
        exec('from spinapi import *', globals())

For the AndorSolis we have

def __init__(self):
        global AndorCam
        from .andor_sdk.andor_utils import AndorCam

and then in AndorCam itself ctypes its way into the dll.

So my point is there is not actually any dependency checking going on. We are just letting python fail and relying on its messages. I am suggesting an additional tool in labscriptutils that can be called when Blacs starts to initialize a device that checks for dependancies (that are enumerated by the author of the labscript device just like in setuptools, so no crazy code introspection). This could be checks for python packages, or external dll's. Thinking of external dll's probably we should allow the user to specify their location in the connection table rather than hard coding.

Ok, I see. So there are multiple separate things here:

Package management of non-core devices (and how this ties into installation of Python dependencies)
Dependency checking inside labscript device code (that could include non-python dependencies)

I'm not opposed to developing a standard for use inside the worker process for checking dependencies. It would definitely need to do the checks in the worker process though. I don't want to break the process sandbox model we have for dependencies. I'm not sure of how complex it would need to be to support all possible dependency types either. How far could we get simply by putting some try: ... except: ... clauses around the existing imports in the worker processes, and re-raising more user friendly exception messages?

Thinking of external dll's probably we should allow the user to specify their location in the connection table rather than hard coding.

I don't think the connection table is the place for this. It doesn't need to know DLL locations, and definitely shouldn't be included in connection table comparisons when submitting shots. The local LabConfig file is available for PC specific settings. Devices are welcome to read their own section of the labconfig file if they want.

I would like to thank everyone for the beneficial discussion here :). As @ispielma and @philipstarkey have clarified, we are discussing two problems: the structure of managing installations of different blacs devices and the dependencies check for some binary blobs required by some specific devices. I want to provide more thought about the former.

I still believe the most proper way of managing device installation is to grant the user full freedom to decide which device to install, and which not to install. In this sense, the best way is to separate the entire labscript-devices repo into independent repos and namespace packages, likelabscript-device-pulseblaster, etc. And these devices are all managed by the labscript team.

This approach doesn't necessarily mean we cannot publish a pip package called labscript-devices. All we need is to just specify labscript-device-pulseblaster and other essential device packages as its dependencies, so that installing this meta-package labscript-devices will effectively download and install the whole collections of "core devices.". In fact, this change of structure will not affect any existing code of the rest of labscript and also the current installation procedure of the labscript suite.

Users can also decide not to install this standard distribution labscript-devices but, instead, hand-picking the devices he/she needs.

This will also allow other labscript users outside the labscript team to publish their own device packages, and could potentially attract some manufacturers to distribute their official labscript-device packages. Adding it to labscript team's package collection could be as easy as forking it under the name of the labscript team.

Some other comments:

I believe it works, you can specify additional directories via the user_devices setting in the labconfig file

@philipstarkey In general, I'm somewhat against the idea of using hacks from importlib to dynamically import packages. I believe that, according to the principle of python, we should leave any import and dependency management business to python itself. Especially given the fact that this user_devices is not managed by pip, I think it adds a lot of trouble for deployment. And it seems to me that this approach seems to be doing exactly the thing the namespace packages could have solved... Or did I miss anything?

the standard labscript practice of doing import * from pretty much every package is unwise.

@ispielma I totally agree ^^. I'm intentionally removing it from all code I have edited.

labscript-devices will continue to exist largely as is, given it has the core devices that almost every lab relies on.

@dihm I agree with you. While I think the idea of publishing a meta-package explained above can also fulfill this job without changing code in blacs and affecting the installation experience of users.

@TerryGeng I'll admit I am definitely leaning more toward the namespace package method, and I do really like the vision of having each device be its own package that can be individually versioned and maintained. There are a few things we probably need to work out before really trying this.

First, we need to ensure that all installation options are fully compatible with the namespace architecture. This includes the conda channel (docs are a bit slim on how to properly package for conda with namespace packages). It will also need to support running anywhere between all or none of the packages being installed in editable mode. It would be nice to find another python package that actually does these things successfully.

Second, labscript-devices is currently required by both runmanager and blacs. Those links would need to be broken to allow for an optional meta-package-like framework for the core devices.

Third, while I agree at some level that playing import hijinks is not great, that was a compromise to ensure that users writing devices did not need to fully package their code to use it. Under an extreme version of this proposal that eliminated user_devices, all users would have to successfully package each of their devices into a pip-installable namespace package, and install it with pip, just to use it locally. Personally, I think that is too much to ask. This means whatever we do will need to keep user_devices around.

Finally, this would all need to work with the device_registry framework. I'm way less familiar with the details there and how hard or easy that will be.

I also have a few lingering concerns personally. One is that it appears to be pretty easy for a namespace package to accidentally bork things (by accidental inclusions of __init__.py files for instance). So I am potentially concerned about having to endlessly trouble-shoot 3rd party namespace package issues. The second is that my casual searches so far seem to indicate that namespace packages are not especially common. Perhaps that is just because they have a specific use case, but I worry it is actually because they are annoying to do correctly. I still would really like to have a trial-run in making this work on some smaller lab's device repository to empirically answer some of these questions before converting labscript-devices itself.

labscript-suite / labscript-devices

Project structure for adding new devices #104