python-data-acquisition / meta

For laying organizational groundwork

What are we trying to accomplish? #2

campagnola opened this issue 4 years ago

campagnola commented 4 years ago

Our common interest in this project is that we have all worked on projects with highly overlapping functionality, we know what an enormous investment this work is, and we want to avoid any further redundant effort in the future, wherever possible. We would like to reduce fragmentation in the community of python / data acquisition developers, and offer newcomers to this space a clear starting point. How we accomplish that goal is a hard question, though.

I suggest two major goals as a starting point:

  1. A shared repository of python wrappers around manufacturer device APIs / protocols. These wrappers often take significant effort to fully debug, and currently they are buried in a subfolder of each of our projects, where they are less likely to be discovered or reused by other projects. So the main goal here is to get that code out into the public where others can use and contribute to it. These wrappers should, as closely as possible, model their devices the same way the manufacturer models them.

  2. A hardware abstraction layer in which we decide on a shared representation and API for classes of devices (for example, all cameras implement a shared set of methods). This is a much harder problem, but also potentially much more valuable.
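
As a rough illustration of (2), here is a minimal sketch of what a shared camera abstraction might look like. All names here (Camera, set_exposure, start_acquisition, get_frame, stop_acquisition) are hypothetical, not a proposed API:

import abc

class Camera(abc.ABC):
    # Every concrete camera wrapper would implement this shared surface.

    @abc.abstractmethod
    def set_exposure(self, seconds):
        ...

    @abc.abstractmethod
    def start_acquisition(self):
        ...

    @abc.abstractmethod
    def get_frame(self):
        # return the next frame, e.g. as a numpy array
        ...

    @abc.abstractmethod
    def stop_acquisition(self):
        ...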

HazenBabcock commented 4 years ago

Can you clarify (1) a bit? Do you mean that the Python device API should match the C API as closely as possible (i.e., same function names, etc.)? Also, what would this mean for serial devices?

campagnola commented 4 years ago

To clarify (1), the point is not so much to exactly reproduce the functions in a manufacturer's C API, but rather to model the behavior of the device, as implemented by the manufacturer.

For example, let's say I have a list of functions for accessing cameras from a manufacturer. Almost every function takes a handle pointer as its first argument, referring to the camera on which to operate. It would probably be more desirable / pythonic to express these functions as methods in a class, where the handle argument would be automatically passed. So that's a case where we've broken from the literal API provided by the manufacturer, but not really changed the way the device interaction is modeled.
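
To make that concrete, here is a minimal ctypes sketch of folding a handle-based C API into a class. The DLL name (vendorcam.dll), the C functions (cam_open, cam_set_exposure, cam_close), and their signatures are all invented for illustration; a real vendor API will differ:

import ctypes

_lib = ctypes.CDLL("vendorcam.dll")  # hypothetical vendor DLL

def _check(err):
    # most vendor APIs return an integer error code from every call
    if err != 0:
        raise RuntimeError("camera error %d" % err)

class VendorCamera:
    def __init__(self, index):
        self._handle = ctypes.c_void_p()
        # C: int cam_open(int index, cam_handle *out)
        _check(_lib.cam_open(index, ctypes.byref(self._handle)))

    def set_exposure(self, ms):
        # the handle is supplied automatically rather than by the caller
        _check(_lib.cam_set_exposure(self._handle, ctypes.c_double(ms)))

    def close(self):
        _check(_lib.cam_close(self._handle))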

In contrast, let's say the manufacturer's API is designed to stream frames from the camera using a ring buffer and callbacks, but my application prefers to be able to acquire a specific number of frames in a single call. My "wrapper" then provides a method get_frames(n) as the only access to image data. In this case, I have completely reinterpreted the model of interaction provided by the manufacturer, which would make this code useless to anyone who prefers the streaming model.

So the point here is not that we accept the manufacturer's model as the best one, but rather that in the absence of an already agreed-upon best model, we can at least agree that we are less likely to make the kind of mistake described above if we just stick with the manufacturer's model.

David-Baddeley commented 4 years ago

I'm a little unsure of the value of (1) as a goal in its own right - writing a thin wrapper around a C API is both relatively simple and not all that useful (if we stick to a low-level, device-specific wrapper, the end user needs to do almost as much work as if they had implemented the device from scratch). The real value comes from the abstractions - i.e., being able to use any camera in the same way. I'd prefer to approach it from the other direction: get the abstraction right (which is hard), adding low-level pass-throughs if and when needed. It may be that we naturally get (1) as a result of (2), and we should be conscious of designing the abstractions so that they also permit more direct access to the underlying model, but I'm not sure that spending a lot of effort up front to get (1) before considering (2) is worthwhile.

David-Baddeley commented 4 years ago

I'd push strongly for the principal goal being abstractions of common hardware types. Will open a new issue on what abstractions should look like. I think it'd also be worthwhile to create a document describing the abstractions in the existing packages.

campagnola commented 4 years ago

I guess (1) seems more natural to me since this is the way acq4 was designed; I would just have to copy some code out to a new repo with minor changes. There are two jobs here (interpreting the manufacturer API into something pythonic, and then adapting that to a hardware abstraction layer), and they usually present nicely separable, independent problems to solve. I feel that keeping these concerns separated in the code results in a much cleaner architecture overall.

Usually the former problem requires much more code and time to develop (for example: acq4's pvcam implementation has about 1000 lines devoted to wrapping the DLL, and about 230 to adapt it to the generic camera API). Device wrappers can be far from trivial, especially when you take into account all of the idiosyncrasies of device access that can be smoothed over with a nice python wrapper. For example:

HazenBabcock commented 4 years ago

I agree with both of you.

There are some devices that are challenging to write a Python interface for, such as some cameras and Thorlabs' APT products, so ideally this would only have to be figured out once. Many others, however, are serial over USB, so it isn't clear what the interface would be beyond using pyserial.

However, the abstraction would also be very useful: you could switch to a stage from a different manufacturer without having to rewrite your stage interface code. The challenge will be agreeing on what the stage abstraction is, what it would support, and what would be left out.

berl commented 4 years ago

sorry to come late to this discussion - there's a lot of good insight here.

Like @David-Baddeley, I see some risk that if we had (1) as our only stated goal, (2) might never happen.

I've been assuming that (1) and (2) are not mutually exclusive, but emphasis on one or the other will affect where effort is placed. Is there a situation where this isn't true? It seems like careful design of the abstraction, and of its relation to repo/package organization, is the key issue to figure out.

In terms of impact and why we'd do this, I think that any really good implementation of (1) for a piece of hardware provides the most likely route to avoid duplication of effort for that driver. People will import a lot of stuff if there's a nice driver at the bottom that they don't have to rewrite.

campagnola commented 4 years ago

Many others, however, are serial over USB, so it isn't clear what the interface would be beyond using pyserial.

There's an example of what I have in mind here: https://github.com/acq4/acq4/blob/develop/acq4/devices/Scientifica/scientifica.py (this implements support for stage/micromanipulator hardware). A lot went into supporting this serial device:

David-Baddeley commented 4 years ago

I guess I should add a bit of context about why I'm pushing to define the common abstraction first - ultimately, there are some pretty key design choices in whatever the common abstraction is that will likely inform how the python adapters are written. If adapters are written without reference to some target abstraction, we will need to write lots of shim code afterwards, and this shim code may or may not be a particularly clean and/or high-performance fit.

The two big issues that are at the top of my mind are:

Should control be imperative or state-based (and, somewhat related, should it be synchronous or asynchronous)? I.e., for a stage, should we be doing something like move_to(new_position), or rather set_attribute('position', new_position)? The state-based approach has some significant advantages when it comes to automation: you can essentially describe the entire microscope state as a dictionary and tell the microscope to assume a given state before starting acquisition (there's a sketch of this after these two points), whereas the alternative of telling each device in turn to perform an imperative action rapidly gets clunky, and can result in a poorly defined state - e.g. if you used relative moves. It also lets you easily define or autogenerate GUIs (although they tend to end up being somewhat crap, with micromanager being a case in point). The weakness of the state-based approach is that some things are intrinsically imperative (e.g. camera acquisition), and that the state-based approach is potentially slower (e.g. state is only defined once a translation stage has come to rest - you can't do stuff while the stage is moving). All this can be shimmed around to some extent in an adapter, but as a lot of hardware offers both state-based and imperative APIs, and/or a mixture of both, it probably makes sense to know what you are aiming for at the start.

How should we provide/format output data (specifically from cameras)? This is pretty hard to get right, and my experience with running sCMOS cameras at full frame rate is that a single memcpy() (or, e.g., a numpy transpose) matters in terms of final performance. If we write a number of different camera drivers before working out what the abstraction is going to look like, we're probably going to end up with various bits of fudge code (and likely copies) in between the drivers and the abstraction. This might kill us when we try to stream at full frame rate.
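
A minimal sketch of the state-based, dictionary-driven control mentioned in the first point; the class and device names here are invented, not a proposal:

class StateDevice:
    # base class for devices controlled by setting named attributes
    def set_attribute(self, name, value):
        raise NotImplementedError

    def apply_state(self, state):
        for name, value in state.items():
            self.set_attribute(name, value)

def assume_state(devices, target):
    # tell each device to assume its part of the target microscope state
    # before starting an acquisition
    for dev_name, dev_state in target.items():
        devices[dev_name].apply_state(dev_state)

# e.g.:
# assume_state({'stage': stage, 'filter_wheel': wheel},
#              {'stage': {'position': (10.0, 5.0)},
#               'filter_wheel': {'filter': 'GFP'}})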

Another relevant thought: I'm not that interested in having a complete wrapper of the entire manufacturer's API if I only need to wrap those features required to make my abstraction work. This also helps reduce the maintenance burden - as it stands, PYME's wrappers for PI piezos and stages work directly, or with minimal modification, across a large range of stages because the core serial commands (MOV, POS?, etc.) are conserved across stages and we don't wrap the more esoteric features. If we were to map every API command for every stage, it would be a lot more work.

campagnola commented 4 years ago

Definitely +1 for beginning discussion on a common API, especially given your comment:

This shim code may or may not be a particularly clean and/or high performance fit.

Maybe move these to a new issue, though?

campagnola commented 4 years ago

I'm not that interested in having a complete wrapper of the entire manufacturer's API if I only need to wrap those features required to make my abstraction work

Totally agree; I have only implemented the bits of the wrappers that I need.

campagnola commented 4 years ago

Update of my opinion on this: if the common API we settle on is relatively simple (for example, it excludes considerations like those in #9 and #7), then I might lean more toward agreeing with @David-Baddeley that it's more likely we can make good progress on that in the near term, and that it would be less valuable to maintain a separation between the manufacturer-api wrapper code and the common-api implementation code.

bilderbuchi commented 4 years ago

As a drive-by comment on state-based vs. imperative control: I like how seamlessly properties integrate into code, e.g. when using Pymeasure (but others also do this), so to mirror the above example it would be motor.position = 35 (or, e.g., motor.position = Q(35, units.millimeter) when units are available via pint). The packages I know use a mix of this and some imperative methods where more appropriate (e.g. reset_filters()?). Lantz calls these two options "feats" and "actions".
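
A minimal sketch of a unit-aware position property in that spirit, using pint; the Motor class and its _query_position_mm / _move_to_mm helpers are invented for illustration:

import pint

ureg = pint.UnitRegistry()
Q = ureg.Quantity

class Motor:
    @property
    def position(self):
        # read back from the device in its native unit (here: mm)
        return Q(self._query_position_mm(), ureg.millimeter)

    @position.setter
    def position(self, value):
        # accept either a bare number (assumed mm) or a pint quantity
        if isinstance(value, Q):
            value = value.to(ureg.millimeter).magnitude
        self._move_to_mm(value)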

campagnola commented 4 years ago

I am often tempted to use Python properties, but have avoided them more recently for a few reasons:

aquilesC commented 4 years ago

(This is going to be long, @bilderbuchi is not the only one who likes writing long texts)

Just to join the discussion: I have developed software for different labs, and within the same lab for different projects. What I found is that a key aspect of the software was being able to exchange devices quickly. I became an advocate of Model-View-Controller in a very strict fashion: Controllers are drivers, Models handle the logic, and Views handle the UI.

With this, in the models I can make different cameras or different data acquisition boards function with the same API, and thus users can exchange them. The natural problem is that not everyone uses the devices in the same way, especially with DAQs. Forcing an API at the 'model' level, from the perspective of a framework developer, is risky, unless that API is clearly extensible. I would keep something like a 'contrib' package as a repository of extras that can be useful but are not part of the core.

As a project, I think the best approach is to determine which part of the toolchain we are targeting, and try to come up with a persona for the program. For example, Lantz is very good at wrapping drivers; it even started moving in the direction of automatically building some basic UI. But if you want to perform a measurement, you are on your own. If you discover PyMeasure, then you have to start over, because you need to develop the drivers for it. Moreover, now that more and more companies are releasing Python drivers, the question is whether we should wrap those drivers in Lantz or just use them (nidaqmx comes to mind).

If the persona we target is a developer, then the value the program can deliver has to be carefully considered. If you already have a solution that works, adapting to a new solution could set you back months, and this may not be what you want in academia with short-term contracts. Lacking a better description: low-level developers quickly find edge cases, and if you need to find workarounds all the time, then whatever the program is delivering is just not worth it. At this level, I believe, having a strongly opinionated program is risky if we target adoption.

But the program could instead target tech-savvy researchers, who know how to code (at least they can analyze data) but have never dug into controlling their experiments. For them, the quicker you can build something, the better. With some clear recipes you can build software for a confocal microscope using an NI board, for example: acquire data, save data, etc. But there is nothing new in that. The next step is automating something new - for example, acquire an image, find the bright spots, and automatically refocus on them (this is not a random example, but the problem that brought me into programming this kind of thing).

Or, it could target end-users, in which case what needs to be developed is a solution a la uManager. In this way the program does one thing, does it well, and can be extended by plugins. Most users will only see its user interface; very few will go deeper and add a plugin, a new driver, etc. This would be going from a framework type of solution to a program solution.

I do personally favor the intermediate option: developing a framework that could enable curious researchers to perform their own experiments. First, because in that realm, being highly opinionated is not a drawback. For example, 'data MUST be saved as HDF5 - but hey, here's a viewer for it!' would not generate as much friction as it would if you target a developer who is a radical fanatic of netCDF. Secondly, because targeting that space lets us leverage whatever has been done in other frameworks. If you already have your driver in Lantz, just plug it in. If a company releases Python drivers, we don't need to re-write them; we make them conform to an opinionated API.

And, from a strategic viewpoint, I believe it is a key position. I guess by now everyone in this repository has realized that most researchers are not hard-core developers, nor do they want to be. They want to perform experiments, get results, etc. I, personally, would target them, saying 'hey! it is not hard, here's a recipe to achieve something similar to what you want'. This (1) gets a community growing around Python for data acquisition, (2) allows us to identify common problems and therefore their solutions, and (3) builds traction that allows thinking about funding.

Property-setting cannot be easily extended with new arguments the way a function call can. For example: motor.set_position((x, y), speed, acceleration, **model_specific_options)

I am working on the idea of having two types: a property with some wrappers around it (unit conversion, caching, etc.), and features that take extra parameters. Properties can still be handy for settings of a device, for example the exposure time of a camera, while features are more complex and can involve different settings, for example acquire_movie(exposure, frames, etc.).

a misspelled property set is silently ignored: motor.postion = 30 # no effect (although we used an awkward check for this in vispy)

It can be cumbersome to implement, but using __slots__ could be a solution. I wonder whether it should be implemented at the metaclass level...
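
A minimal sketch of the __slots__ idea (setting aside, for the moment, the interaction with properties, which would need a differently named backing slot):

class Motor:
    __slots__ = ('position',)

m = Motor()
m.position = 30   # fine
m.postion = 30    # raises AttributeError - the typo is caught instead of silently ignored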

bilderbuchi commented 4 years ago

Yeah, I am not advocating for only using properties, but a mixture of properties, methods, and maybe even generators, as appropriate. For example, I once implemented a controller for a motorized arm that has (among others):

def set_position(self, step, speed=None, acceleration=None):
    # only returns once the position has been reached or an alarm is
    # encountered, and defaults to useful values
    ...

# a property for easy use (the getter reads the position back from the device)
@property
def position(self):
    return self._read_position()  # hypothetical hardware query

@position.setter
def position(self, step):
    self.set_position(step)

# a generator going to a succession of steps
def stepped_sweep_gen(self, stops_deg, speed=None, acceleration=None):
    ...

So, if you just need to go to a position, you do motor.position = 35 (and can also read back via the property); if you need to control speed and acceleration, you use motor.set_position(35, 10, 5); and if you want to measure something at a number of positions, you can do

for position_d in motor.stepped_sweep_gen(list_of_stops):
    # only proceeds once the position has been reached
    print('Reached position (deg): %i' % position_d)
    sleep(5)  # some important measurement here (assumes: from time import sleep)

campagnola commented 4 years ago

@aquilesC, using your target-audience nomenclature: I think we are currently targeting developers, with just a shared repository of drivers and an abstraction layer to ensure similar devices are interchangeable; the reason being that cooperation in this space is very difficult, and this is the lowest bar for us to reach together.

If we see some success in this project, then absolutely I would consider moving on to higher-level shared infrastructure (device management / synchronization, data management, reusable UI components, etc.) that could be used by tech-savvy researchers. The hope, then, is that a variety of applications will have been built from which the end-users can pick; the space of possible experiments that we are targeting is too large to be served well by a single application.

VolkerH commented 4 years ago

Hi, I left a brief message in the "who are we" issue a while back but have been too busy with other work to comment. I have implemented a few microscope automation tools in the past; however, none of them were generic and reusable. At some point in my career I worked in a robotics lab (not a roboticist myself), and when I left there, ROS (https://en.wikipedia.org/wiki/Robot_Operating_System) was just starting to gain traction.

Initially I thought something like ROS would be a great framework for automated microscopy; after all, you have a collection of sensors (camera, PMT, etc.) and actuators (stage, Z-drive, galvo, filter wheels), so in a sense an automated microscope is a robot. However, I had to work with APIs of commercial microscopes, many of which were only available for Windows, and support for Windows was pretty much non-existent in ROS. Also, I had a fairly narrow focus, so there was no real incentive to build a solution that works more generally.

One of the appealing aspects of ROS is the loose coupling of the different parts; since every robot project is different, that allows great flexibility. Key to this is the publish/subscribe model, where messages can be passed around between different processes (potentially on different machines and running different programming languages). So you could have a camera publishing a data stream and various subscribers: one subscriber could be a live view in a GUI, for example; another could be some sort of analysis module, e.g. a YOLO-like object detector. Each of these modules can send their own messages, such as "interesting cell detected at location X,Y,Z", and other processes can subscribe to those messages.

EDIT to add:

some examples of what such a framework enables are:

The message stream can be recorded and played back with tools like rosbag. This is great for debugging if something went wrong.

There are various tools that can subscribe to and visualize sensor data. In a microscope setting, the stage would broadcast the x,y,z data to a topic and a visualization tool would record it.

If some proprietary hardware can only be used with a proprietary library in a particular programming language, some small wrapper code that publishes the messages from this hardware (and subscribes to messages to be sent to that hardware) can be written without introducing complex dependencies for the whole package.

I am currently only doing image analysis and no longer automating microscopes, but I still think a framework similar in concept to ROS would be great for automated microscopy. However, I'm not sure whether it is worth integrating into ROS itself (which is rather complex and has quite a steep learning curve ... and Windows support is still lagging).

bilderbuchi commented 4 years ago

Indeed, the message-passing approach seems common. Pymeasure uses zeromq to pass messages around between the different threads in a pattern similar to what you describe.
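
For anyone unfamiliar with the pattern, here is a minimal pyzmq sketch of one publisher and one subscriber; the topic name and port are invented, and normally each side would live in its own process or thread:

import time
import zmq

ctx = zmq.Context.instance()

# publisher side, e.g. inside a stage driver
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5556")

# subscriber side, e.g. a GUI live view or a logger
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")
sub.setsockopt(zmq.SUBSCRIBE, b"stage/position")

time.sleep(0.1)  # PUB/SUB is fire-and-forget; give the subscription time to propagate
pub.send_multipart([b"stage/position", b"10.0 5.0 0.2"])
topic, payload = sub.recv_multipart()
print(topic, payload)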

campagnola commented 4 years ago

One of the main things we're trying to accomplish here is a shared library of hardware drivers. A question that keeps coming up is whether the low-level drivers need to know anything about the higher-level infrastructure they will participate in (like, for example, a multiprocess message-passing system).

@VolkerH, @bilderbuchi (and anyone else with an opinion here) I am curious to hear whether you think the low-level drivers need to be written with message-passing in mind, or if that can be just as easily (and perhaps more cleanly) implemented in a higher layer. We've discussed this a bit already here and here.

campagnola commented 4 years ago

I opened a new issue for discussing higher-level infrastructure here: https://github.com/python-data-acquisition/meta/issues/15

nvladimus commented 4 years ago

@aquilesC might be interested in this project, he runs Python for the Lab.

aquilesC commented 4 years ago

@aquilesC might be interested in this project, he runs Python for the Lab.

I'm already here ;-)

henrypinkard commented 3 months ago

Hi everyone, I realize this thread is a bit old, but I wanted to share a new project I'm working on called ExEngine, in case anyone is interested. This isn't meant to be yet another microscopy framework, but rather an extensible, pure-Python "meta-framework" that enables mixing and matching multiple backends, finding areas for common abstractions where possible. The goal is to allow researchers to combine devices and functionality from different systems - for example, using Micro-Manager devices alongside custom hardware implementations. It's designed to be lightweight, customizable, and easily incorporated into existing projects, while offering benefits like improved threading and synchronization.

I'd love to get feedback from this community as the project develops. Let me know if you'd like to learn more or get involved!