labrad / scalabrad-web

A web interface for labrad

Grapher could support computed traces as a display option. #181

Open btchiaro opened 8 years ago

btchiaro commented 8 years ago

It would be really useful if the grapher could plot computed traces. For example, in the ramsey function we store the envelope as a data column, although this is redundant data. Another example is the rapid RTO, where the raw output of a 1,0 timeseries is not human readable and not worth displaying in the grapher. It would be nice to save the raw binary data, but have the grapher plot the spectrum after Fourier analysis.
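
For concreteness, here is a minimal numpy sketch of the kind of Fourier processing I mean; the actual pyle routines and calibrations are not shown, and the 0/1 sampling interval is assumed fixed:

```python
import numpy as np

def one_zero_spectrum(bits, dt):
    """Rough one-sided power spectral density of a 0/1 time series sampled every dt seconds."""
    x = np.array(bits, dtype=float)
    x -= x.mean()                          # remove the DC level so only fluctuations remain
    spec = np.fft.rfft(x)
    psd = np.abs(spec) ** 2 * dt / len(x)  # periodogram normalization
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs, psd
```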

btchiaro commented 8 years ago

Bump. I just deleted a computed trace from a scan that I often use and merged this change to master. This measurement takes a lot of time and it is useful to watch the data come in. This would be a really useful feature.

joshmutus commented 8 years ago

Yeah, this would be a really useful feature, but it will take serious thinking to implement properly, as it might need changes to the datavault server, the format of storage on the datavault, and the grapher itself.

@btchiaro what is the actual formula for the data you would want to have plotted? There may be a way to add part of this feature on the frontend, but it would be nice to know what you need to plot.

btchiaro commented 8 years ago

I'm storing the data from each tomography phase as a data column, but what I want to plot is the envelope value. So the formula I have in mind is envelope = np.sqrt((data[0]-data[2])**2 + (data[1]-data[3])**2). We don't want to just store the envelope to the dataset because it is redundant.
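
In code, something like this (a sketch, assuming the four phase columns sit in a numpy array of dataset rows as data[:, 0] through data[:, 3]):

```python
import numpy as np

def envelope(data):
    """Ramsey envelope from the four stored tomography-phase columns."""
    d = np.asarray(data, dtype=float)
    return np.sqrt((d[:, 0] - d[:, 2]) ** 2 + (d[:, 1] - d[:, 3]) ** 2)
```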

DanielSank commented 8 years ago

@joshmutus I think we could do this by adding properties to the dataset. No need for modifications to the datavault server. This is just a rendering issue.

It seems totally reasonable to read a parameter from the dataset which could even just be a bit of js that says how to compute the extra curves.

If we don't like allowing arbitrary code to be run from user data (which does indeed sound kinda jank) we could parametrize some of the most common curves.
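
Something like this is roughly what I have in mind: a small whitelist of parametrized curve types the grapher knows how to render, selected by a dataset parameter naming the type and input columns (a sketch only; the names here are made up):

```python
import json
import numpy as np

# Whitelisted derived-curve types (made-up names), keyed by a string the grapher understands.
DERIVED_CURVES = {
    'quadrature_magnitude': lambda d, cols: np.sqrt(
        (d[:, cols[0]] - d[:, cols[2]]) ** 2 + (d[:, cols[1]] - d[:, cols[3]]) ** 2),
    'difference': lambda d, cols: d[:, cols[0]] - d[:, cols[1]],
}

def derived_traces(data, spec_json):
    """spec_json: a dataset parameter such as
    '[{"type": "quadrature_magnitude", "label": "envelope", "cols": [1, 2, 3, 4]}]'
    """
    d = np.asarray(data, dtype=float)
    return {spec['label']: DERIVED_CURVES[spec['type']](d, spec['cols'])
            for spec in json.loads(spec_json)}
```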

btchiaro commented 8 years ago

Another example: when I make the noise spectrum measurements, the raw 1,0 output is not human readable at all. It would be nice to save the raw 1,0 time series, but plot the power spectral density using our code in pyle. Ideally, we could access whatever processing parameters are exposed in the pyle functions, e.g. 'frequency_average' for the rapid rto data. This would be extremely useful to me. This is also kind of an extreme case, since this type of processing requires other datasets as inputs (spectroscopy_z_func data to calibrate frequency noise to flux noise). I'm not sure I like using a dataset parameter; I think we want this to be mutable. I can easily imagine people wanting to plot the data in different ways, for example as above. I think it would be really good to allow the user to call pyle processing code for the plotting.

joshmutus commented 8 years ago

The idea is that you could have a dataset parameter that would dictate the default way it was plotted. You could still do whatever you wanted with it later.

DanielSank commented 8 years ago

but plot the power spectral density using our code in pyle

@btchiaro It sounds like you want to put the pyle analysis code into the web browser. Is this because you're looking for a graphical intuitive way to browse through your processed data? Let's try to distill the thing you want without the context of the grapher, and then we can decide how to implement it.

joshmutus commented 8 years ago

@btchiaro we should probably chat about this Friday after group meeting. I can't think of an easy way to implement pyle in a browser.

btchiaro commented 8 years ago

@DanielSank I'd like to be able to watch my data come in in some human readable format. The grapher is a nice, organized, easy-access data repository and real-time viewer. If we're going to be more strict about having computed traces stored with the datasets, then it would be nice to have the grapher be able to generate them, but I don't know where to draw the line on what the grapher should be able to do.

It sounds like the case that I mentioned, showing the Ramsey envelope computed from the stored tomography data, should be doable, so that would be a great start. Displaying the noise data in a useful form seems more difficult. The raw noise data is not human readable at all, so it would be nice to be able to view spectra in the grapher, either as the data come in or some time after the fact. However, it sounds like the processing required to make the data human readable is just too much to put into the grapher.

I suppose I could do the data processing outside the grapher and write additional processed files to the datavault, but this seems like it would violate data redundancy best practices. If what I want is to see properly scaled noise spectra in the grapher, what do you think is the best way for me to get there? What is the major challenge with having the grapher call arbitrary logic from, say, pyle to generate the displayed data?

DanielSank commented 8 years ago

This is an interesting question! I think @btchiaro is saying that it would be nice to easily associate processed data and plots with the raw data entries in the data vault, and view them in the same application as the grapher. Right now I'm not really sure how to do that, although I do have an interesting idea: put the analyzed data in Drive (slides or otherwise) and link to that document in the comments box in the grapher.

At this point I would like to formally say to @maffoo that he was right about hyperlinks being a good reason to rewrite the grapher using the web. You were right and I was wrong :)

jwenner commented 8 years ago

Instead of dealing with the difficulty (and security risk) of running arbitrary (possibly Python) code in the grapher, I would propose the following:

  1. For after-the-fact processing, either follow Dan's suggestion above or, as Ben said, write the processed data to the data vault (with a comment to link the raw and processed data sets); a sketch of this follows the list. Although, do we still have a comments box? I know we had one in the Delphi grapher...
  2. For realtime processing, I would propose to do the plotting using strictly pyle. The data would then be fetched using the DataVaultWrapper (although we would need to ensure that refreshing the cache works - see martinisgroup/pyle#1285).
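
Here is a rough sketch of option 1, assuming the standard pylabrad data_vault settings (cd, new, add, add_parameter); the dataset names and numbers are made up, and the spectrum values are placeholders:

```python
import labrad
import numpy as np

# Placeholder spectrum; the real values would come from the pyle analysis.
freqs = np.linspace(0.0, 1e6, 1001)
psd = np.full_like(freqs, 1e-12)

cxn = labrad.connect()
dv = cxn.data_vault
dv.cd(['', 'Sample1', 'noise'], True)   # create the directory if needed

# Write the processed spectrum as its own dataset...
dv.new('rto spectrum (processed)', ['frequency [Hz]'], ['S (PSD) [rad^2/Hz]'])
dv.add(np.column_stack([freqs, psd]))

# ...and record which raw dataset it came from, so the two stay linked.
dv.add_parameter('raw dataset', '00042 - rapid rto raw')
```
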
btchiaro commented 8 years ago

I think that there is a pretty limited security risk if we restrict the code to pyle/master. What if the grapher had a checkout of pyle/master and you could specify functions from that repo to be called on the data? My mention of writing separately processed data seems bad from a redundancy point of view (eliminating redundant traces is actually the motivation for this issue). Being able to put hyperlinks as dataset parameters could provide some interesting opportunities though.

jwenner commented 8 years ago

@btchiaro, note this scalabrad-web project is a public project which other groups are using. As such, the approach I could see for what you suggest is having an environment variable specifying external projects to use for plotting. @joshmutus, is this even possible?

btchiaro commented 8 years ago

Yea, what I'm thinking is that we have our own instance of the grapher running on a server somewhere. It would be cool if we could attach a pyle checkout to that instance. Like a plug-in.
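
Roughly what I'm imagining, purely as a hypothetical sketch (the environment variable and function names are made up):

```python
import os
import sys

# A site-specific grapher deployment points this at a local pyle checkout.
PLUGIN_PATH = os.environ.get('GRAPHER_PLUGIN_PATH')
if PLUGIN_PATH:
    sys.path.insert(0, PLUGIN_PATH)

_REGISTRY = {}

def register(name):
    """Decorator a plug-in package uses to expose a processing function by name."""
    def wrap(fn):
        _REGISTRY[name] = fn
        return fn
    return wrap

def process(name, data, **kwargs):
    """Called by the grapher backend to run a registered function on a dataset."""
    return _REGISTRY[name](data, **kwargs)
```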

joshmutus commented 8 years ago

So @maffoo and I talked about this briefly, and what you're talking about is making scalabrad web into a fully featured analysis tool, which is waaaay beyond the scope of this project. Generally speaking, adding even simple features introduces all sorts of interactions and bugs and makes a project super hard to maintain; adding a complex feature like this is particularly daunting. We can talk about it in detail later. Basically, numpy doesn't exist for javascript, and all the things you take for granted in pyle don't exist in the browser.

I don't see the problem with storing a computed trace in this case. You have all the code on your end to compute it and adding it is trivial. It's not like we're hard up for hard drive space, and the live update is a useful feature. Why do we have to kill ourselves for the DRY principle here? @DanielSank @maffoo

DanielSank commented 8 years ago

Storing computed data in the datavault is certainly possible and in particular there's no way to stop people from doing that.

However, I wouldn't. I prefer to keep the data I collected in the experiment well separated from everything else in the project. In my mind, data has a very special role as completely immutable and un-erasable. I prefer to use more user-friendly tools like Google Drive, which support link sharing, editing, commenting, etc., for my analysis and general "lab notebook" style work.

tl;dr: I recommend using the data vault for storing raw data and nothing else. Use more appropriate tools for analysis. See IPython notebooks, for example.

joshmutus commented 8 years ago

But that (ipython notebook) doesn't solve the use case of live-view

DanielSank commented 8 years ago

But that doesn't solve the use case of live-view

Neither does storing processed data.

We can use derived traces to make live-view a little nicer, but I think processed data is a totally separate issue (already commented on it in my previous post).

joshmutus commented 8 years ago

I don't understand how storing processed data doesn't solve the live-view use case. You create a separate processed dependent and it live updates as you take data?

jwenner commented 8 years ago

I would say there are multiple separate live-view cases:

  a. Where the data-processing is, e.g., point-by-point. Here, it should be possible to save processed data alongside raw data.
  b. Where the data-processing depends on an entire row/column of data (for instance, a 2D swap spectroscopy, with the data-processing being T1 vs bias/freq).
  c. Where the data-processing depends on the entire dataset (e.g., the noise spectrum).

In (b) and (c), it's probably best to use pyle for plotting so long as we can refresh the cache (martinisgroup/pyle#1285). I guess the question here is what to do about (a). Am I right @btchiaro @joshmutus?
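
For (a) the shape of the solution is simple: compute the derived value per point and append it as one more dependent column, so it live-updates like any other trace. A sketch, using the envelope formula from above and a hypothetical data vault handle dv:

```python
import numpy as np

def add_point(dv, t, phases):
    """Append one row: the raw tomography-phase values plus the derived envelope."""
    env = np.sqrt((phases[0] - phases[2]) ** 2 + (phases[1] - phases[3]) ** 2)
    dv.add([t] + list(phases) + [env])
```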

joshmutus commented 8 years ago

Yes, thank you for clarifying, @jwenner. I'm only referring to case (a). I see (b) & (c) as outside the scope of the grapher and agree with @DanielSank about the iPython notebook, etc.

btchiaro commented 8 years ago

I'm personally OK with storing processed data along with the raw data. That's what I had been doing and it worked fine for my purposes. The issue here is that some code reviewers don't want processed data columns to be present in scans that are merged into master. If we cannot merge that code into master, then it is really difficult to maintain scans with processed data columns. I think we need to reach a decision as a group on whether processed traces can be present in master-branch scans. If it's fine with the group, this is a fine solution for me, at least for the time being.

joshmutus commented 8 years ago

@btchiaro does your processing update in the live view or does the whole dataset have to be taken first?

btchiaro commented 8 years ago

I had been storing the computed trace point by point and it was present in liveview.

maffoo commented 8 years ago

I would be ok with having a computed saved column in a dataset. This kind of reminds me of the idea of denormalizing data in a database (basically, storing redundant or derived data to improve read performance, or in our case, to avoid having to recompute it all the time). One simple example would be storing both P_0 and P_1. Clearly P_1 can be computed as 1 - P_0, but it's just easier to store both. (For that matter, the fact that we store probabilities instead of raw counts is also worth noting here.)

That said, I think this should be used judiciously, for things where there is one "obvious" way to compute the derived data column, because once it is stored it is immutable. If you're trying to compute and store something and then later decide that the computation needs to be modified slightly, then we're going to have a problem because all that old data is fixed. I don't know the specific case that @btchiaro is referring to, so I don't know what to think about that. @btchiaro, can you give some specifics?

joshmutus commented 8 years ago

Disclaimer: I'm totally unfamiliar with @btchiaro's code.

We store "processed" data all the time in, say, T1, where we convert from an IQ point to a one-state probability. What's the difference here?
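
For example, the kind of processing I mean, as a sketch with a made-up threshold discriminator (not the actual pyle code):

```python
import numpy as np

def one_state_probability(iq, center0, center1):
    """Fraction of single-shot IQ points closer to the |1> calibration blob than to |0>."""
    iq = np.asarray(iq)
    return float(np.mean(np.abs(iq - center1) < np.abs(iq - center0)))
```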

joshmutus commented 8 years ago

Can the computed column be tied to a commit so we know what code was used to compute it?

btchiaro commented 8 years ago

@maffoo the use case that brought this all up is the rapid Ramsey scan that I wrote. This is just a Ramsey scan where the stats are sampled without qubit reset at a user-defined sampling interval. The issue there was that I was storing data from each tomography phase and also storing the Ramsey fringe envelope computed from those phases. During code review it was suggested that this is redundant information that should be computed by the grapher rather than stored as its own trace.

The other use case that I think would be useful is to save the raw binary data stream that is generated by the rapid rto, but also save the noise spectrum that is revealed through Fourier analysis. In this case the raw data is not human readable at all and requires fairly significant processing to get it into a useful form. That said, there is a pretty clear "right" way to plot this data, and I think that it would be useful to store that processed trace for easy access through the grapher.

@joshmutus, the T1 example occurred to me too and I considered it for a while. I had thought that the logical implication of the no-processed-data viewpoint was that we should only ever save raw IQ data, and so even the idea of averaging over stats should be forbidden. Thinking about it more, though, I think that the issue is redundancy. Recording p1 data is not redundant unless you're storing the raw IQ also. You kind of get to set the "initial resolution" of your raw data without breaking the redundancy policy.

All that said, I think that storing processed data alongside the raw data can be useful both from a convenience standpoint and from a preservation standpoint. It often happens that the format of a scan and/or its associated processing code is changed. When this happens it can be difficult to go back in time and plot an old data set if all you have is the raw (potentially human-unreadable) trace. Just having a processed trace right alongside the raw data can save a lot of time when stuff like this happens and you want to quickly compare with an old dataset.