m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

What happens in the gateware when there is a delay/jitter on the reference clock provided to the KC705? #2386

Closed philipkent17 closed 6 months ago

philipkent17 commented 6 months ago

Question

What happens in the gateware when there is a delay/jitter on the reference clock provided to the KC705?

Category: Gateware

Description

This isn't a bug, but more of a question about the RTIO gateware for the KC705, and how it would respond to a "jump" or jitter on the clock reference that is provided to the FPGA. We are running on one of the original KC705 crates built in-house at NIST circa 2016. We're running ARTIQ version 3.7 on the NIST_QC2 gateware.

If we run the experiment below, we get an undesirable side effect: the RTIO counter decreases by a large amount shortly after calling init_sync() on the first AD9914 DDS in our crate (dds0). The amount by which the RTIO counter jumps backwards is not deterministic and is different each time. The decrease in the RTIO counter delays all future RTIO events by an amount on the order of several minutes. We have a workaround for this but would like to understand what's happening on the FPGA.

dds0 provides the reference clock to our FPGA via its SYNC_CLK pin, and there is a short "jump"/jitter in the clock output on the SYNC_CLK pin while dds0 is going through its DAC calibration and synchronization after calling dds0.init_sync(). The FPGA is therefore receiving a reference clock that also "jumps"/jitters for a short period of time. This doesn't cause the FPGA to crash or hang; it only seems to manifest as the RTIO counter value returned by self.core.get_rtio_counter_mu() decreasing, i.e. jumping backwards in time. This jump in the RTIO counter only seems to happen once after calling dds0.init_sync().

Would this be expected behavior of the gateware if a rising edge for the FPGA reference clock were delayed in time? We haven't gotten a trace of the SYNC_CLK during the init_sync yet, so it may be more complicated than this. I will post an image once we have that.

Naively, I would expect everything clocked on the FPGA (internal counter increments, register updates, program counter increments, etc., i.e. the next update of the state of the entire FPGA) to simply be delayed if the next rising edge of the reference clock is delayed. That would imply that the RTIO counter value shouldn't jump backwards; its next increment by +1 should simply happen later. I wouldn't be surprised if it is more complicated than this, though. Would you expect there to be any other side effects of clock delay or jitter on the FPGA reference clock that perhaps we aren't aware of or simply haven't seen yet?
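
The naive model described above can be written down as a toy simulation. This is only a sketch of an ideal edge-triggered counter, not of the ARTIQ gateware; `count_edges` and the edge timestamps are hypothetical names and values chosen for illustration. In this idealized model a late edge postpones an increment but can never make the count go down, so a backwards jump cannot be explained by a delayed edge alone.

```python
from bisect import bisect_right

def count_edges(edge_times_ns, t_ns):
    """Value of an ideal counter at time t_ns: number of rising edges at or before t_ns."""
    return bisect_right(edge_times_ns, t_ns)

# Hypothetical 8 ns clock; in the second list the fourth edge arrives 5 ns late.
ideal_edges = [0, 8, 16, 24, 32, 40]
late_edges = [0, 8, 16, 29, 32, 40]

sample_times = range(0, 48, 4)
ideal_counts = [count_edges(ideal_edges, t) for t in sample_times]
late_counts = [count_edges(late_edges, t) for t in sample_times]

print(ideal_counts)
print(late_counts)
```

The late counter lags the ideal one for a few samples and then catches up; it never decreases.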

from artiq.experiment import *

class FixRTIOCounterDecrease(EnvExperiment):

    def build(self):
        self.setattr_device('core')
        self.setattr_device('scheduler')
        self.setattr_device('dds0')

    @kernel
    def run(self):
        self.core.reset()
        delay(100 * ms)
        self.dds0.init_sync(0)
        # wait for the RTIO counter to jump backwards in time, then correct for this
        # by moving the cursor (i.e. now_mu()) backwards in time by the same amount.
        self.block()

    @kernel
    def block(self):
        """Block the processor until all previously programmed RTIO events have executed"""
        last_rtio = self.core.get_rtio_counter_mu()
        while True:
            if self.scheduler.check_pause():
                break
            rtio = self.core.get_rtio_counter_mu()
            rtio_delta = rtio - last_rtio
            last_rtio = rtio
            print(self.core.mu_to_seconds(rtio_delta))
            # correct for rtio counter jumping backwards in time
            if rtio_delta < 0:
                at_mu(now_mu() + rtio_delta)
                self.core.reset()  # otherwise an RTIOSequenceError is raised afterwards. Calling self.core.reset() alone is probably enough.
            slack = now_mu() - self.core.get_rtio_counter_mu()
            if slack <= 0:
                break

Output:
----------------------
print:4442776
print:8197264
print:8196056
print:8195856
print:7785392
print:8197864
print:8199752
print:7800288
print:7815904
print:8230328
print:-40475336640
print:11883544
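
A backwards jump like the one above is easy to miss in a longer log. A minimal host-side sketch for flagging it follows; `find_backward_jumps` is a hypothetical helper (not part of ARTIQ), and the deltas are assumed to be plain integers copied from the printed output.

```python
def find_backward_jumps(deltas):
    """Return (index, delta) pairs for every negative step in a list of counter deltas."""
    return [(i, d) for i, d in enumerate(deltas) if d < 0]

# Deltas copied from the output above.
deltas = [
    4442776, 8197264, 8196056, 8195856, 7785392, 8197864,
    8199752, 7800288, 7815904, 8230328, -40475336640, 11883544,
]

jumps = find_backward_jumps(deltas)
print(jumps)
```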

philipkent17 commented 6 months ago

It may be that the SYNC_CLK signal is actually jumping in amplitude, and not that the rising edges of SYNC_CLK are jumping in time. The other person who saw this behavior is not in today. I need to ask him whether the jumps he saw on the SYNC_CLK pin were in amplitude or in time. I'll post back after talking to him. Sorry for the somewhat premature post.

sbourdeauducq commented 6 months ago

On 3/29/24 01:08, Philip Kent wrote:

Would you expect there to be any other side-effects of clock delay or jitter on the FPGA reference clock that perhaps we aren't aware of or simply haven't seen yet?

Yes. The clock needs to stay stable once the RTIO system is out of reset.

sbourdeauducq commented 6 months ago

There are a number of components on that clock (PLLs, sequential logic...) that will not tolerate short pulses and other clock instabilities. The RTIO counter values you are seeing are probably the result of data corruption caused by the clock glitches, and there is probably more corruption elsewhere. You should clock the DDS and the FPGA from the same oscillator, using a clock distribution circuit that does not introduce glitches.
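
As a purely illustrative sketch of why corrupted register contents can show up as a huge jump: if a single high-order bit of a wide counter is captured incorrectly, the value changes by a power of two. The actual failure mode in the gateware is not established in this thread; `flip_bit`, the counter value, and the choice of bit 36 are all hypothetical.

```python
def flip_bit(value, bit, width=64):
    """Return value with one bit inverted, kept within a width-bit register."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)

counter = 0x0000_0012_3456_7890    # hypothetical counter value (bit 36 is set)
corrupted = flip_bit(counter, 36)  # hypothetical: bit 36 captured incorrectly

# The difference is a large negative step even though only one bit changed.
print(corrupted - counter)
```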