m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0
434 stars 201 forks source link

Artiq fails silently with long delays between Urukul writes #1367

Open gthickman opened 5 years ago

gthickman commented 5 years ago

Summary

A simple experiment that writes frequency, amplitude, and switch state information to two Urukul DDS channels doesn't execute the second write when a long delay (~100ms) is added between the writes.

Steps to reproduce

Create and run an experiment that writes to two Urukul channles. These can be channels on the same Urukul card. Add a long delay() between the two writes. Expected behavior is that both DDS states should be correctly written, or if not, an error should be thrown. Actual behavior is that the experiment executes the first write but not the second one, without throwing any errors. Code for a minimal working example is given below.

`from artiq.experiment import * import math

An experiment used to set DDS parameters

class set_AOMs(EnvExperiment):

def build(self):
    self.setattr_device("core")
    self.setattr_device("urukul0_cpld")
    self.setattr_device("urukul0_ch0")
    self.setattr_device("urukul0_ch1")

    #defines RF frequencies, power levels, and switch states
    self.setattr_argument("DDS1_freq", NumberValue(100 * MHz, unit="MHz", ndecimals=1),"DDS1")
    self.setattr_argument("DDS1_pwr", NumberValue(1, unit="dBm", scale=1, ndecimals=1),"DDS1")
    self.setattr_argument("DDS1_is_on", BooleanValue(default=True), "DDS1")
    self.setattr_argument("DDS2_freq", NumberValue(250 * MHz, unit="MHz", ndecimals=1, step=5 * MHz),"DDS2")
    self.setattr_argument("DDS2_pwr", NumberValue(10, unit="dBm", scale=1, ndecimals=1),"DDS2")
    self.setattr_argument("DDS2_is_on", BooleanValue(default=True), "DDS2")

def prepare(self):
    #converts RF power levels to amplitudes in V
    self.DDS1_ampl = math.sqrt(2*10**(self.DDS1_pwr/10-3)*50)
    self.DDS2_ampl = math.sqrt(2*10**(self.DDS2_pwr/10-3)*50)

@kernel
def run(self):
    #initializes the hardware
    self.core.reset()
    self.urukul0_cpld.init()
    self.urukul0_ch0.init()
    self.urukul0_ch1.init()

    #sets DDS parameters
    self.core.break_realtime()

    self.urukul0_ch0.set(frequency=self.DDS1_freq, amplitude=self.DDS1_ampl)
    if self.DDS1_is_on == True:
        self.urukul0_ch0.sw.on()
    else:
        self.urukul0_ch0.sw.off()

    delay(100*ms)

    self.urukul0_ch1.set(frequency=self.DDS2_freq, amplitude=self.DDS2_ampl)
    if self.DDS2_is_on == True:
        self.urukul0_ch1.sw.on()
    else:
        self.urukul0_ch1.sw.off()

    print("DDS's done!")`

When the above code is executed the message "DDS's done!" is displayed, but on checking the DDS states one finds that the state of DDS2 has not been updated. I have two versions of the Urukul card, v1.2 and v1.3, and the problem can be reproduced on both of them.

Components involved

Operating system: Ubuntu 18.04.1 LTS Artiq version: v5.0.dev+535.g636b4cae Gateware version: close to v5.0.dev+535.g636b4cae (we have a custom in-house build with some small differences from this version) Artiq hardware: 1 x Kasli v1.1, 1 x Urukul v1.2, 1 x Urukul v.1.3, 1 x SMA DIO v1.1, 1 x Zotino v1.3.

dnadlinger commented 5 years ago

Does your idle kernel/a subsequent experiment possibly just chick those events away?

The print gets executed immediately once the RTII event have been queued; you'd need to manually wait until now_mu() for it to be actually executed after the timeline curser has passed the end of the DDS writes.

gthickman commented 5 years ago

Ah, okay, then the print message makes sense.

As for your first comment: In this case there were no subsequent experiments, and I wouldn't have expected the idle kernel to take over and to start throwing things away until the experiment had finished.

dnadlinger commented 5 years ago

I wouldn't have expected the idle kernel to take over […] until the experiment had finished.

This is actually a deliberate design choice, to allow transferring execution between experiments without requiring the outputs to be static for the potentially seconds it takes to load a new kernel (cf. "seamless handover" in the docs). Unless core.reset() is called in the idle kernel (which it might by default), I don't think the events should just disappear, though.

Does the issue still occur with self.core.wait_until_mu(now_mu()) at the end?

gthickman commented 5 years ago

Ah, yes, I do recall something along those lines in the documentation. Also, adding 'self.core.wait_until_mu(now_mu())' at the end as you suggest does eliminate the problem.

Where in the file system is the idle kernel saved, so I can see if it calls 'core.reset()'? I've been looking around in ~/artiq/artiq/coredevice but haven't seen it. My install was done using Anaconda.

sbourdeauducq commented 5 years ago

The idle kernel is stored in compiled (binary) form in the core device's flash.

gthickman commented 5 years ago

Okay, so it looks like the problem is a core.reset() or something similar in the idle kernel. When running a set of experiment in a queue this won't be so much a problem, since control is handed off to the subsequent experiment rather than to the idle kernel. But for the last experiment in such a queue I would expect the system to allow execution to continue until it's finished, but this won't always be the case.

I suppose a failsafe workaround would be to always end a queue of experiments with a call to self.core.wait_until_mu(now_mu()). This would be slightly annoying to implement, but would avoid the wasted CPU time of calling it at the end of every experiment.

Does anyone have a better solution?

dnadlinger commented 5 years ago

You could put the self.core.wait_until_mu(now_mu()) in the idle kernel as well, or just have the idle kernel spin in an infinite loop without resetting the RTIO system.

dnadlinger commented 5 years ago

(Actually, the example idle kernel in the ARTIQ source tree actually does something like that, so the problem might lie somewhere else.)