m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

Phaser can enter "lock up" state that requires restart from interrupted experiments #1597

Open cjbe opened 3 years ago

cjbe commented 3 years ago

Bug Report

One-Line Summary

Phaser can enter "lock up" state that requires a power-cycle due to an interrupted experiment

Steps to Reproduce

  1. Run an experiment that uses Phaser.
  2. Interrupt the experiment, either by force-killing it or by letting it fail with an RTIO underflow.

Sometimes (roughly every few minutes), the next experiment that uses Phaser fails during init() with the error "cannot read board ID".

After reloading the FPGA (artiq_flash start), the error changes to "DUC+Oscillator phase/amplitude test failed". This persists until the system is power-cycled (at the moment I am doing this by pulling the power for the whole rack).

Expected Behavior

The phaser init() always completes successfully.

Your System (omit irrelevant parts)

Using ARTIQ master and Phaser master gateware/firmware.

pathfinder49 commented 3 years ago

Bump. RTIO underflows happen regularly in our workflow, and we can't afford to power-cycle Kasli frequently. We often need to operate remotely, and a power cycle also causes bad thermal transients in our RF chain.

hartytp commented 3 years ago

ping @jordens what's the plan for resolving this issue?

jordens commented 3 years ago

No specific plan (as for many issues that people have posted but where long-term funding and momentum for continued debugging, development, and maintenance do not exist). We're happy to offer paid support. In any case, more context and an effort to provide an MWE would be good.

jordens commented 3 years ago

Just to confirm, this is definitely phaser master from January?

cjbe commented 3 years ago

> Just to confirm, this is definitely phaser master from January?

Yes, commit https://github.com/quartiq/phaser/commit/b36e506b08382969e785597de0cc0e6c222b0445