qiboteam / qibocal

Quantum calibration, characterization and validation module for Qibo.
https://qibo.science
Apache License 2.0

Data formatting and optimization for characterization #5

Closed maxhant closed 1 year ago

maxhant commented 2 years ago

Hi @andrea-pasquale,

First, thank you very much for working on this project as I think it will be very useful. I have been doing some very primitive things in my branch. The latest one is qibolab/maxime_ultimate_fix. You can find all my primitive work in examples/QW_5Q/characterization.

As you know, we were using Quantify before for data handling, live plotting and measurement control (and instrument control). Your solution will probably replace all that.

We didn't really appreciate the measurement control as it can be limiting. If you are not familiar with it, it is just three lines of code implementing data saving/plotting and setting/retrieving data. One example where it is limiting is when adding feedback within the loop, e.g. when tracking a moving peak.

The live plotting is very useful because you want to interrupt a measurement if it is not going as expected. I am doing it in a silly way currently, but it allows me to have live plotting while using the slurm queue. I have no preference on how this should be done, but I think it is very important to have it. I was using plotly because I like the library, and you can then make nice HTML reports.
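One cheap way to get this behaviour through a slurm queue is to have the measurement loop periodically rewrite a small self-refreshing HTML file that can be opened in a browser. The sketch below uses only the standard library and illustrative names (it is not part of any existing qcvv API); with plotly, `fig.write_html(path)` would fill the same role with a real figure.

```python
from pathlib import Path

def write_live_report(path, xs, ys, title="live acquisition"):
    """Rewrite a small self-refreshing HTML page with the data so far.

    A browser pointed at this file reloads it every few seconds, giving
    cheap live plotting even when the acquisition runs on a cluster.
    With plotly one would instead call fig.write_html(path).
    """
    rows = "".join(f"<tr><td>{x}</td><td>{y}</td></tr>" for x, y in zip(xs, ys))
    html = (
        "<html><head>"
        '<meta http-equiv="refresh" content="2">'  # browser reloads every 2 s
        f"<title>{title}</title></head><body>"
        f"<h1>{title}</h1><table>{rows}</table></body></html>"
    )
    Path(path).write_text(html)

# Inside the measurement loop one would call this every few points:
# write_live_report("live.html", freqs_so_far, voltages_so_far)
```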

Regarding data handling, there are two things: data format and data saving. Data saving is quite straightforward; we should have TUIDs and data folders with nice functions like get_latest_tuid or get_tuids_containing("experiment_name"). For this, Quantify does a good job in my opinion.
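A minimal sketch of what such helpers could look like, assuming (as Quantify does) timestamp-prefixed dataset folders so that TUIDs sort chronologically. The function names come from the comment above; the implementation is hypothetical.

```python
from datetime import datetime
from pathlib import Path

def new_tuid(experiment_name):
    """Timestamp-based unique ID, e.g. '20220513-104500-ramsey'."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"{stamp}-{experiment_name}"

def get_tuids_containing(data_dir, experiment_name):
    """All dataset folders whose TUID mentions the experiment, oldest first."""
    return sorted(p.name for p in Path(data_dir).iterdir()
                  if p.is_dir() and experiment_name in p.name)

def get_latest_tuid(data_dir):
    """Most recent dataset folder; TUIDs sort chronologically by construction."""
    return max(p.name for p in Path(data_dir).iterdir() if p.is_dir())
```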

Regarding the data formatting, it is tricky because you want to constrain your data format to get efficient data handling (e.g. so users don't resort to numpy.savetxt), but you also want to be flexible enough to allow the platform to return anything, with the data saving accommodating it. The way I did it so far, my characterization script takes care of organizing the data sent by the platform, and is hence not platform agnostic. If you want your characterization library to be platform agnostic, you will have to constrain your platform to return a standardised data format, for example an array for all the bins. However, can you really be platform agnostic? I don't think so. For example, some instruments have functions like sweeping a frequency directly in the hardware and returning the corresponding array of voltages. I see only one way to be somewhat platform agnostic: we could have in the abstract platform a list of functionalities that platforms may have, like binning, frequency sweep, feedback, etc., and the characterization script would check in their runcards whether the platforms have them, choose the fastest option, and adjust the way it sets/gets data accordingly. This means that the data formatting can be included in the abstract platform directly. This will be the only bridge between Qibolab and Qcvv. All the post-processing (fitting, plotting, etc.) will rely highly on that.
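The capability-declaration idea could be sketched as below. Everything here is hypothetical (class and feature names are illustrative, not the real Qibolab abstract platform): the platform advertises optional hardware features from its runcard, and the characterization script picks the fastest acquisition strategy it supports.

```python
class AbstractPlatform:
    """Hypothetical sketch: a platform advertises optional hardware
    features declared in its runcard."""

    def __init__(self, runcard):
        # e.g. runcard["features"] = ["binning", "hardware_frequency_sweep"]
        self.features = set(runcard.get("features", []))

    def supports(self, feature):
        return feature in self.features

def pick_sweep_strategy(platform):
    """Prefer a hardware sweep when the platform declares it; otherwise
    fall back to a generic software loop over frequencies."""
    if platform.supports("hardware_frequency_sweep"):
        return "hardware_sweep"  # instrument returns the full voltage array
    return "software_sweep"      # qcvv sets each frequency and reads back
```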

Regarding this formatting, we are now constraining the platform to return data. We could force it to use a certain type of data like xarrays, pandas DataFrames, etc. I have no preference, but I believe it would be better for live plotting if they are forced to have units and labels. In Qcvv, we can then wrap the data object with functions like plot, peak search, etc. Ideally, the experimentalist would import qcvv, load the latest run, and analyse it with qcvv tools.
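A minimal sketch of such a data object, assuming mandatory units and labels so that plots can always annotate their axes; the class name and methods are illustrative, not an existing qcvv API.

```python
class Dataset:
    """Hypothetical qcvv data object: values plus mandatory units and
    labels, so live plots and reports can always annotate their axes."""

    def __init__(self, x, y, x_label, x_unit, y_label, y_unit):
        self.x, self.y = list(x), list(y)
        self.x_label, self.x_unit = x_label, x_unit
        self.y_label, self.y_unit = y_label, y_unit

    def peak(self):
        """Location of the maximum response, in the x units."""
        return self.x[self.y.index(max(self.y))]

    def axis_titles(self):
        """Ready-made axis titles for plotting, e.g. 'frequency [GHz]'."""
        return (f"{self.x_label} [{self.x_unit}]",
                f"{self.y_label} [{self.y_unit}]")
```

An experimentalist would then load the latest run into such an object and call `peak()` or `axis_titles()` directly, instead of re-deriving units from raw arrays.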

I had tried creating a format for the data (in characterization/abstract.py), but it was a failure. I like xarrays for the HTML visualization in Jupyter notebooks, but I don't like their confusing coords/dims system.

Regarding the architecture of Qcvv, I have made a primitive one (clearly not meant to be copied) with a base class whose children, when initialized, would run the script and store the relevant parameters as attributes (something I found useful). I would then know what my characterization script returns before even running it, and I could print certain values along the way and adjust the following scripts.
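The architecture described above could be sketched as follows; all names are hypothetical, not taken from the actual branch. Each routine declares up front which parameters it will produce, so a calling script knows what a routine returns before running it and can feed values into the next routine.

```python
class Routine:
    """Hypothetical base class: children declare the parameters they
    produce, and running the routine exposes them as attributes."""

    returns = ()  # names of the parameters this routine will produce

    def run(self, platform):
        raise NotImplementedError

    def __call__(self, platform):
        results = self.run(platform)
        # expose the declared parameters as attributes for later routines
        for name in self.returns:
            setattr(self, name, results[name])
        return results

class ResonatorSpectroscopy(Routine):
    returns = ("resonator_frequency",)

    def run(self, platform):
        # placeholder measurement; a real routine would sweep and fit
        return {"resonator_frequency": 7.2e9}
```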

I am not the best writer so I might not have conveyed my message properly. Feel free to ask me anything! It would be fantastic if we could work closely together. I am looking forward to hearing your feedback.

Maxime

andrea-pasquale commented 2 years ago

Hi @maxhant, thank you very much for opening such a detailed issue.

> First, thank you very much for working on this project as I think it will be very useful. I have been doing some very primitive things in my branch. The latest one is qibolab/maxime_ultimate_fix. You can find all my primitive work in examples/QW_5Q/characterization.

Perfect, so I'll start by checking out this branch.

> As you know, we were using Quantify before for data handling, live plotting and measurement control (and instrument control). Your solution will probably replace all that.

You are right. I'm already happy to see that in your branch you are no longer using it.

> We didn't really appreciate the measurement control as it can be limiting. If you are not familiar with it, it is just three lines of code implementing data saving/plotting and setting/retrieving data. One example where it is limiting is when adding feedback within the loop, e.g. when tracking a moving peak.

> The live plotting is very useful because you want to interrupt a measurement if it is not going as expected. I am doing it in a silly way currently, but it allows me to have live plotting while using the slurm queue. I have no preference on how this should be done, but I think it is very important to have it. I was using plotly because I like the library, and you can then make nice HTML reports.

Regarding live plotting, I understand that it is extremely useful, especially when you start calibrating your setup for the first time. If you are using plotly, I think that is a good starting point. I don't really know if live plotting is included in the scope of this project. The initial idea was to have a script that runs all the calibration routines, and once it is done you would be able to check locally, or later in a web page report, how the calibration went. If the main reason for live plotting is to interrupt a bad calibration early, we can also think about including some sort of early-stopping technique to automatically stop the calibration if something is going wrong. For example, if during the sweep you see some frequency values that don't make sense, we can stop that particular calibration routine. Let me know what you think about this.
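The early-stopping idea could look like the following minimal sketch: abort a routine as soon as an acquired estimate leaves a physically sensible window, instead of relying on a human watching a live plot. The function name, the `measure` callback, and the thresholds are all illustrative.

```python
def sweep_with_early_stop(frequencies, measure, lo=1e9, hi=12e9):
    """Run measure(f) over the sweep, aborting if a returned frequency
    estimate falls outside the plausible [lo, hi] window (in Hz).

    `measure` stands in for whatever the platform exposes to drive at a
    frequency and return an estimated quantity; thresholds are examples.
    """
    results = []
    for f in frequencies:
        estimate = measure(f)
        if not (lo <= estimate <= hi):
            raise RuntimeError(
                f"calibration aborted: estimate {estimate:.3e} Hz outside "
                f"[{lo:.0e}, {hi:.0e}] at drive frequency {f:.3e} Hz")
        results.append((f, estimate))
    return results
```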

> Regarding data handling, there are two things: data format and data saving. Data saving is quite straightforward; we should have TUIDs and data folders with nice functions like get_latest_tuid or get_tuids_containing("experiment_name"). For this, Quantify does a good job in my opinion.

> Regarding the data formatting, it is tricky because you want to constrain your data format to get efficient data handling (e.g. so users don't resort to numpy.savetxt), but you also want to be flexible enough to allow the platform to return anything, with the data saving accommodating it. The way I did it so far, my characterization script takes care of organizing the data sent by the platform, and is hence not platform agnostic. If you want your characterization library to be platform agnostic, you will have to constrain your platform to return a standardised data format, for example an array for all the bins. However, can you really be platform agnostic? I don't think so. For example, some instruments have functions like sweeping a frequency directly in the hardware and returning the corresponding array of voltages. I see only one way to be somewhat platform agnostic: we could have in the abstract platform a list of functionalities that platforms may have, like binning, frequency sweep, feedback, etc., and the characterization script would check in their runcards whether the platforms have them, choose the fastest option, and adjust the way it sets/gets data accordingly. This means that the data formatting can be included in the abstract platform directly. This will be the only bridge between Qibolab and Qcvv. All the post-processing (fitting, plotting, etc.) will rely highly on that.

> Regarding this formatting, we are now constraining the platform to return data. We could force it to use a certain type of data like xarrays, pandas DataFrames, etc. I have no preference, but I believe it would be better for live plotting if they are forced to have units and labels. In Qcvv, we can then wrap the data object with functions like plot, peak search, etc. Ideally, the experimentalist would import qcvv, load the latest run, and analyse it with qcvv tools.

> I had tried creating a format for the data (in characterization/abstract.py), but it was a failure. I like xarrays for the HTML visualization in Jupyter notebooks, but I don't like their confusing coords/dims system.

I see that data handling could be quite tricky. Maybe it is better if we discuss this in person. I think that another way in which we could aim at being as platform agnostic as possible would be to find a way to perform all the calibration routines using just pulses. I don't know if this is possible, but given that in Qibolab we already have a Pulse API which is common to different platforms, this could be a good starting point. We could even think about doing everything directly with quantum circuits. I found this tutorial from Qiskit where, if I understood correctly, they perform a qubit spectroscopy just by using pulses and gates. There is also this library from the people at Rigetti where they already wrote a bunch of calibration routines, if you want another example.
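A pulse-only routine might look like the sketch below. To be clear, none of these names are the real Qibolab Pulse API; the `DrivePulse` dataclass and the `execute` callback are placeholders for whatever the platform exposes to run a pulse sequence and return a measured voltage.

```python
from dataclasses import dataclass

@dataclass
class DrivePulse:
    """Hypothetical stand-in for a Qibolab drive pulse."""
    frequency: float  # Hz
    amplitude: float  # dimensionless drive amplitude
    duration: int     # ns

def qubit_spectroscopy(execute, freqs, amplitude=0.1, duration=2000):
    """For each drive frequency, play a long weak pulse and record the
    readout signal. `execute` abstracts the platform: it takes a pulse
    and returns a measured voltage, keeping the routine platform agnostic.
    """
    return [(f, execute(DrivePulse(f, amplitude, duration))) for f in freqs]
```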

These are my initial thoughts. I think it would be great if we could have a meeting: I can show you a bit better what I'm planning to do, and you can for example show me the branch that you have been working on. Let me know what you think.