xpdAcq / xpdAn

Analysis Pipelines and Tools for XPD
http://xpdacq.github.io/xpdAn/

create Analysis class? #20

Closed CJ-Wright closed 7 years ago

CJ-Wright commented 7 years ago

Issue by sbillinge, Tuesday Feb 09, 2016 at 14:29 GMT. Originally opened as https://github.com/xpdAcq/xpdAcq/issues/49


I could envisage an Analysis class then, kind of like our Beamtime class, that is a singleton in the Analysis environment (see #48).

On startup we create

anl = Analysis(bt_uid=get_uid(bt), piLast=bt_piLast)

Then all our nice search and plot functions could be bound methods, which already pre-filter for bt_uid, so when we search for a compound, it is only searching in the bt_uid 'namespace'.

If we need to run this later to get a previous beamtime, we will have to get the bt_uid for that beamtime and pass it as the bt_uid argument. We should probably also allow the user to give 'all' as the argument so they can search over all their bt_uids.

Passing a piLast argument is a nod to the future where we may have to control access by experimenter. In that case, we would need a uid for each experimenter rather than passing names around, but we can put this here as a kind of testing place-holder.
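
For concreteness, a minimal sketch of what that singleton might look like, with the bt_uid pre-filter and the 'all' escape hatch described above (toy_db, search, and the composition key are hypothetical stand-ins, not real xpdAn or databroker API):

def toy_db(**query):
    # Stand-in for the databroker; just echoes the query it would run.
    print('query:', query)
    return []

class Analysis:
    def __init__(self, bt_uid, piLast=None):
        self.bt_uid = bt_uid    # pre-filter applied by every bound search method
        self.piLast = piLast    # testing place-holder for per-experimenter access

    def search(self, **query):
        if self.bt_uid != 'all':   # 'all' searches across every beamtime
            query.setdefault('bt_uid', self.bt_uid)
        return toy_db(**query)

anl = Analysis(bt_uid='some-bt-uid', piLast='Billinge')
anl.search(composition='Ni')   # only searches within this beamtime's namespace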

CJ-Wright commented 7 years ago

Comment by tacaswell Tuesday Feb 09, 2016 at 14:34 GMT


:-1:

CJ-Wright commented 7 years ago

Comment by sbillinge Tuesday Feb 09, 2016 at 14:37 GMT


@tacaswell I know this is a programmer's worst nightmare, but this implements an experimenter workflow that makes more sense.....it is designed to make the users happy, not the programmers!

If there is a way of getting that (analysis) on the command prompt without cloning a completely new environment, I would be happy with that, but I want two separate terminals open, one that says (collection) and one that says (analysis).

Or is this a bad idea for some functional reason? My understanding was that it is ok to have multiple instances of collection running at the same time.

CJ-Wright commented 7 years ago

Comment by tacaswell Wednesday Feb 10, 2016 at 01:24 GMT


I was responding to 'analysis class' which makes me think of code that looks like this:

A = MyAnalysisClass()
A.set_data(data)
A.set_some_other_data(data2)
A.do_a_step()
A.do_a_step()
A.do_another_step()
A.do_a_step_that_really_really_must_be_third()
A.plot()

That style is basically impossible to reuse or maintain. What you are describing is a module + some curried functions (see the sketch after the example below). We probably need to add the ability to add 'default' query components to the data broker (@danielballan).

The data broker already has a concept of a bundle of metadata ('headers', which are a RunStart, a RunStop, and a list of Descriptors) and of data streams (iterators of events). Your functions should take in (iterables of) those things and, when sensible, return things that look like them. For example:

heads = db(bt_uid=bt_uid, sample_id=15)
curves = [reduce_images(h) for h in heads]
pdfs = [compute_pdf(c) for c in curves]
plot_pdfs(ax, pdfs)
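
A minimal sketch of the 'module + some curried functions' idea, using functools.partial (toy_db and search are hypothetical stand-ins for the real broker call, not databroker API):

from functools import partial

def toy_db(**query):
    # Stand-in for the databroker call; echoes the query it would run.
    print('query:', query)
    return []

def search(db, bt_uid, **query):
    # A plain function: the beamtime uid is just one more query key.
    return db(bt_uid=bt_uid, **query)

# Currying in db and bt_uid once, at startup, gives the convenience of a
# bound method without holding state in a class.
search_bt = partial(search, toy_db, 'some-bt-uid')

heads = search_bt(sample_id=15)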

I am also very confused by what you mean by 'environment'.

CJ-Wright commented 7 years ago

Comment by danielballan Wednesday Feb 10, 2016 at 03:24 GMT


I like the idea of defaults in the broker. This would address #31 and I'm sure many other use cases.

Something like

db.defaults = {'owner': ..., ...}

where db() and db[] incorporate db.defaults but override it whenever keys collide.
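
A minimal sketch of that merge-with-override behavior (BrokerWithDefaults and toy_db are made up for illustration; this is not the actual databroker interface):

def toy_db(**query):
    # Stand-in for the real broker call; echoes the merged query.
    print('query:', query)
    return []

class BrokerWithDefaults:
    def __init__(self, db, **defaults):
        self.db = db
        self.defaults = defaults

    def __call__(self, **query):
        merged = {**self.defaults, **query}   # explicit keys win on collision
        return self.db(**merged)

db_mine = BrokerWithDefaults(toy_db, owner='me', bt_uid='abc123')
db_mine(sample_id=15)    # query carries owner='me' and bt_uid='abc123'
db_mine(owner='other')   # the explicit owner overrides the default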

CJ-Wright commented 7 years ago

Comment by sbillinge Wednesday Feb 10, 2016 at 13:42 GMT


On defaults in db: Yes, I think they will be mighty powerful moving forward.

On the analysis object, if it is bad from a maintenance point of view we should revisit, but my desire for it is more from a user workflow point of view. Here is my fireside story:

We have kind of set up collection for xpd users with this persistent and friendly bt object that magically appears, even after a crash, and reassuringly returns the complete list of acquisition objects that the user worked so hard to create, using bt.list(), and allows them to retrieve them using bt.get(). I think the users are going to start to see that guy as their friend. Initial indications seem to suggest this is true.

I then had this vision of a nice friendly analysis object that does the same, so an.list() lists all those scans that the user has worked so hard to collect, and an.get() gets the users back their scans. But it is the same familiar workflow that they learned from collection....they first make a list, then they select the items they want by giving a single list index or a list of indices (it would be nice if we could offer python list slicing for pythonic users, and maybe allow less pythonic types for less pythonic users).

So

an.list()

would list all the scans from the current beamtime that pass the default filter (which would be bt_uid plus is_prun=True, i.e., by default it returns production runs and not setup runs). Then

an.list(myfilter)

would add an additional user-defined filter (how that works tbd)

Options for is_prun would be False or 'all', for example to get only setup scans or all scans. It may be more intuitive if we have users pass strings for all those options and use logic in the method to set the filter, but you get the drift.

Details of the behavior remain to be worked out, in the sense of how we let users get all the events in a scan vs. recursing down to individual events within a scan, but first we should decide if we want this an = Analysis() object or just functions that wrap db calls.
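
As a sketch only, the Analysis-object version might combine the default filter with a user-supplied one like this (toy_db, the is_prun handling, and the listing format are placeholders following the description above):

def toy_db(**query):
    # Stand-in for the databroker; echoes the query and returns fake headers.
    print('query:', query)
    return ['scan-a', 'scan-b']

class Analysis:
    def __init__(self, bt_uid):
        self.bt_uid = bt_uid
        self._headers = []

    def list(self, user_filter=None, is_prun=True):
        query = {'bt_uid': self.bt_uid}
        if is_prun != 'all':           # 'all' lifts the production-run filter
            query['is_prun'] = is_prun
        if user_filter:
            query.update(user_filter)  # user keys refine or override the defaults
        self._headers = list(toy_db(**query))
        for i, h in enumerate(self._headers):
            print(i, h)
        return self._headers

    def get(self, index):
        # Accept a single index, a list of indices, or a python slice.
        if isinstance(index, (list, tuple)):
            return [self._headers[i] for i in index]
        return self._headers[index]

an = Analysis(bt_uid='some-bt-uid')
an.list()          # production runs from the current beamtime, by default
an.get([0, 1])     # select by a list of indices
an.get(0)          # or a single index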

CJ-Wright commented 7 years ago

Comment by tacaswell Wednesday Feb 10, 2016 at 13:54 GMT


I see, that makes far more sense than how I understood your OP.

CJ-Wright commented 7 years ago

Comment by sbillinge Wednesday Feb 10, 2016 at 14:10 GMT


There may be a few more methods. For example, an.get(list) would pipe the actual 2D images of the items in list to the data analysis stream (currently this means writing a tiff file that xPDFsuite will find, but that could change in the future), something named an.header(list) could return only the headers of the events in the list, and an.plot(list) could just open a matplotlib 2D window of the image without saving any tiff file.

A more extensible design might be an.get(list, thing=data), where the default behavior is the same as an.get(list) described above but we could extend to other "things" like thing='header', thing='plot', or thing='some_other_quantity_of_interest' (QoI). The advantage of this design is that it is easier to extend and a cleaner interface for the user. Super cool would be if users could pass their own, on-the-fly defined QoIs to an.get. In that case it would have to take some kind of function as an argument, and the QoI is what the function returns.

The disadvantage of this latter design is that it may be less intuitive for non-pythonic users.....On the other hand, we could have our cake and eat it by defining a few helper methods such as

def header(self, lst):
    return self.get(lst, thing='header')

With this in mind, my call would be to use the an.get(list, thing) design to begin with. Thoughts?

CJ-Wright commented 7 years ago

Comment by tacaswell Wednesday Feb 10, 2016 at 16:05 GMT


Having kwargs change functionality that drastically is a bit of an anti-pattern. For example, if one of the classes of 'thing' needs some extra kwargs, you end up with a documentation nightmare. One high-profile example of an API like 'thing' is the pandas plotting API, which they are in the process of deprecating in favor of a more explicit API (df.plot(..., kind='bar') -> df.plot.bar(...)). I disagree that it is cleaner, easier to implement or extend, or more pythonic.

The way I would implement get would be something like

def get(self, lst, *args, thing=None, **kwargs):
    if thing is None:
        thing = 'data'
    mapping = {'data': self.get_data, 'header': self.get_header}
    return mapping[thing](lst, *args, **kwargs)

CJ-Wright commented 7 years ago

Comment by sbillinge Wednesday Feb 10, 2016 at 22:30 GMT


@tacaswell, thanks for the great feedback as usual!

I was actually thinking more about this myself, and I realized the following: my goal for the workflow is that the analysis workflow kind of mimics the collection workflow, so the users get used to it and comfortable. What I realize now is that in the collection workflow bt.get() returns an acquire object, which the user then operates on by passing it as an argument to different functions....setupscan(), prun(), dark(), and so on. So to properly reproduce this workflow we kind of want to do the same in analysis, which is actually closer to my first design, and not my second one (that I thought I liked but you pointed out was bad!).

So an.get(item) will return either a databroker scan or a databroker event object (but let's extend that to a list of objects) that we can pass to plot(object), analyze(object), header(object), and so on.

Long-story-short, I think we are close to convergence on this.....my new preferred design avoids the pitfall that you pointed out.
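
A tiny sketch of that converged workflow, mirroring bt.get() on the collection side (every name below is illustrative, not the real xpdAn API):

def header(scan):
    # Print the scan's metadata.
    print('metadata for', scan)

def plot(scan):
    # A real version would open a matplotlib 2D window of the image.
    print('plotting', scan)

def analyze(scan):
    # A real version would pipe data to the analysis stream (e.g. a tiff).
    print('analyzing', scan)

class Analysis:
    def get(self, indices):
        # Stand-in: return placeholder scan/event objects for the indices.
        return ['scan-%d' % i for i in indices]

an = Analysis()
for scan in an.get([0, 2, 5]):
    header(scan)
    plot(scan)
    analyze(scan)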

CJ-Wright commented 7 years ago

Comment by sbillinge Wednesday Feb 10, 2016 at 22:32 GMT


I will think about the second comment from @tacaswell shortly...about the kwargs changing behavior. The dictionary is not specified as a kwarg but as a positional argument, but I think Tom's point still stands: when the behavior changes depending on what you pass in the dictionary, it can be problematic....oops, we already have that logic in our collection acquire functions....but we need to get v0.2 on the beamline tomorrow, so I think we should delay major refactoring along these lines till v1.0...

CJ-Wright commented 7 years ago

Comment by tacaswell Wednesday Feb 10, 2016 at 22:44 GMT


The flip side of this is to go all-in on the 'input drastically changes behavior' approach and call it a dispatcher :stuck_out_tongue_winking_eye: . That is how almost all call-back systems work, and if you look at the guts of the RunEngine there is a dict-lookup-and-dispatch like the get I wrote above that drives the whole thing. In that case we did this to gain the ability to defer the execution of messages (which we need so we can a) inspect plans before actually running them and b) provide the wrappings needed to manage event bundling, ctrl-c handling, guaranteed clean-up on failure, and the asyncio event loop in a way that end users (you) do not have to see), and we went into it knowing that we were writing a mini domain-specific language (DSL) on top of python/asyncio. Which is a very long-winded way of saying that the dispatcher pattern is useful, but should be a sometimes food. Using it imposes a pretty high burden on users, as it strays from 'core' python and makes documentation/tab completion/discoverability more difficult (which in turn puts more burden on you in terms of user support).

CJ-Wright commented 7 years ago

Comment by sbillinge Wednesday Feb 17, 2016 at 22:02 GMT


Moving to v1.0

CJ-Wright commented 7 years ago

@sbillinge @chiahaoliu Is this implemented in https://github.com/xpdAcq/xpdAn/blob/master/xpdan/xpdan.py ?

Are we releasing this in this upcoming release?

If so we should test it (I don't think any lines are covered).

CJ-Wright commented 7 years ago

Closing as code base has moved far beyond this.