Open b-bondurant opened 2 months ago
Commit https://github.com/m-labs/artiq/commit/25346780bfffe7d6155e58ae2b01403f93eedcf1 looks like a false positive; a change to the compiler shouldn't have any effect on the Rust runtime.
Yeah, I noticed the lack of relevant changes in that commit and thought it odd as well. If anything, I would have expected the previous commit to be the culprit, but I've done two builds of that gateware (thanks to my rm -rf-ing between tests) deployed to two different Kaslis and both worked. Haven't established the breaking commit with the same rigor though.
From: David Nadlinger @.> Sent: Friday, August 30, 2024 4:04:10 PM To: m-labs/artiq @.> Cc: Brad Bondurant, Ph.D. @.>; Author @.> Subject: Re: [m-labs/artiq] Release-7: I2C comms failure with Si5324 on Kasli v1.1 (Issue #2567)
Is the gateware bitstream/firmware build even different at all? I guess with gateware there is always the chance of two non-deterministic optimisation runs resulting in subtly different outcomes…
Had to leave early to go out of town for the weekend but I'll compare once I get back. I guess I could ramp up nix's sandboxing (--pure and --restrict-eval off the top of my head) as well.
Bitstream:
$ diff -q artiq_kasli_7.8193.c812801/tester_11/gateware/top.bit artiq_kasli_7.8194.2534678/tester_11/gateware/top.bit
Files artiq_kasli_7.8193.c812801/tester_11/gateware/top.bit and artiq_kasli_7.8194.2534678/tester_11/gateware/top.bit differ
Runtime:
$ diff -q artiq_kasli_7.8193.c812801/tester_11/software/runtime/runtime.bin artiq_kasli_7.8194.2534678/tester_11/software/runtime/runtime.bin
Files artiq_kasli_7.8193.c812801/tester_11/software/runtime/runtime.bin and artiq_kasli_7.8194.2534678/tester_11/software/runtime/runtime.bin differ
Building in a stricter environment, nix develop ... --sandbox --pure-eval --ignore-environment --keep HOME (sandboxing should be on by default, but just in case; HOME is required to make Vivado happy):
$ diff -q artiq_kasli_7.8193.c812801_pure/tester_11/gateware/top.bit artiq_kasli_7.8194.2534678_pure/tester_11/gateware/top.bit
Files artiq_kasli_7.8193.c812801_pure/tester_11/gateware/top.bit and artiq_kasli_7.8194.2534678_pure/tester_11/gateware/top.bit differ
$ diff -q artiq_kasli_7.8193.c812801_pure/tester_11/software/runtime/runtime.bin artiq_kasli_7.8194.2534678_pure/tester_11/software/runtime/runtime.bin
Files artiq_kasli_7.8193.c812801_pure/tester_11/software/runtime/runtime.bin and artiq_kasli_7.8194.2534678_pure/tester_11/software/runtime/runtime.bin differ
No clue why :man_shrugging:
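A plain diff -q only says that the files differ, not where or by how much. As a rough sketch (not anything from the ARTIQ tooling), the comparison above could be extended to report hashes, sizes, and the offset of the first differing byte. The function names and the report layout are made up for illustration; the two relative artifact paths are the ones used in the diff commands above.

```python
import hashlib
from pathlib import Path

# Artifact paths relative to a build directory, as used in the diff commands above.
ARTIFACTS = (
    "gateware/top.bit",
    "software/runtime/runtime.bin",
)


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def compare_builds(old_dir, new_dir, artifacts=ARTIFACTS):
    """Compare the same artifacts in two build directories (hypothetical helper)."""
    report = {}
    for rel in artifacts:
        old = (Path(old_dir) / rel).read_bytes()
        new = (Path(new_dir) / rel).read_bytes()
        report[rel] = {
            "old_sha256": hashlib.sha256(old).hexdigest(),
            "new_sha256": hashlib.sha256(new).hexdigest(),
            "old_size": len(old),
            "new_size": len(new),
            "first_diff": first_difference(old, new),
        }
    return report


# Example call, using the directory names from the commands above:
# report = compare_builds("artiq_kasli_7.8193.c812801/tester_11",
#                         "artiq_kasli_7.8194.2534678/tester_11")
# for rel, info in report.items():
#     print(rel, info["first_diff"], info["old_size"], info["new_size"])
```

Since the version ident (e.g. 7.8193.c812801 vs 7.8194.2534678) is printed by both gateware and firmware at boot, it is presumably embedded in both artifacts, so some difference between revisions would be expected even if nothing else changed. Knowing where the files first diverge, and whether the sizes match, might help tell that apart from a more substantial change.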
Latest release-8 works fine:
 __  __ _ ____         ____
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |
| |  | | |___) | (_) | |___
|_|  |_|_|____/ \___/ \____|
MiSoC Bootloader
Copyright (c) 2017-2024 M-Labs Limited
Bootloader CRC passed
Gateware ident 8.8955+0ac9e77;tester_11
Initializing SDRAM...
Read leveling scan:
Module 1:
00000001111111110000000000000000
Module 0:
00000011111111111000000000000000
Read leveling: 11+-4 11+-5 done
SDRAM initialized
Memory test passed
Booting from flash...
Starting firmware.
[ 0.000012s] INFO(runtime): ARTIQ runtime starting...
[ 0.003899s] INFO(runtime): software ident 8.8955+0ac9e77;tester_11
[ 0.010245s] INFO(runtime): gateware ident 8.8955+0ac9e77;tester_11
[ 0.016594s] INFO(runtime): log level set to INFO by default
[ 0.022312s] INFO(runtime): UART log level set to INFO by default
[ 0.028683s] WARN(runtime::rtio_clocking): rtio_clock setting not recognised. Falling back to default.
[ 0.037850s] INFO(runtime::rtio_clocking): Clocking has already been set up.
[ 0.070364s] INFO(runtime): network addresses: MAC=54-10-ec-34-dd-65 IPv4=10.236.88.210/0 IPv6-LL=fe80::5610:ecff:fe34:dd65/10 IPv6=no configured address
[ 0.083182s] WARN(runtime::rtio_mgt): error reading device map (key not found), device names will not be available in RTIO error messages
[ 0.095441s] INFO(runtime::rtio_mgt): SED spreading disabled by default
[ 0.103423s] INFO(runtime::mgmt): management interface active
[ 0.114721s] INFO(runtime::session): accepting network sessions
[ 0.119457s] INFO(runtime::session): running startup kernel
[ 0.125114s] INFO(runtime::session): no startup kernel found
[ 0.130832s] INFO(runtime::session): no connection, starting idle kernel
[ 0.145124s] INFO(runtime::session): no idle kernel found
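When flashing many revisions in a row, it can be worth confirming from the UART log that the board actually booted the gateware and firmware that were just built. A minimal sketch, assuming the log format shown above (the "software ident" and "gateware ident" lines from the runtime); the helper names are hypothetical and not part of ARTIQ:

```python
import re

# Matches the runtime's ident lines as they appear in the boot log above.
IDENT_RE = re.compile(r"INFO\(runtime\): (software|gateware) ident (\S+)")


def parse_idents(log_text):
    """Return e.g. {'software': '...', 'gateware': '...'} from a boot log."""
    return {kind: ident for kind, ident in IDENT_RE.findall(log_text)}


def idents_match(log_text, expected=None):
    """True if both idents are present and equal (and equal to `expected`, if given)."""
    idents = parse_idents(log_text)
    if "software" not in idents or "gateware" not in idents:
        return False
    if idents["software"] != idents["gateware"]:
        return False
    return expected is None or idents["software"] == expected


log = """\
[ 0.003899s] INFO(runtime): software ident 8.8955+0ac9e77;tester_11
[ 0.010245s] INFO(runtime): gateware ident 8.8955+0ac9e77;tester_11
"""
print(parse_idents(log))
print(idents_match(log, expected="8.8955+0ac9e77;tester_11"))
```

A log in which the runtime never starts would contain neither ident line, so idents_match returns False in that case.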
Bug Report
One-Line Summary
Newer release-7 gateware/firmware fails to initialize Si5324 on Kasli v1.1, reportedly because of an I2C failure.
Issue Details
We have a Kasli v1.1 running hardware-based unit tests for DAX. I recently updated its gateware (no change in major version, just a newer rev) and was met with the following:
I replicated the same behavior on a second Kasli v1.1. Haven't checked with any newer hardware but I assume it isn't an issue since no one has reported this yet. Haven't checked release-8 yet either.
Searching backward through the release-7 commits, it looks like 25346780bfffe7d6155e58ae2b01403f93eedcf1 is where things break.
Previous commit, c81280174c6e6bd11ce4b6043811f7030f0f5b0c:
@ 25346780bfffe7d6155e58ae2b01403f93eedcf1:
Full logs with each revision tested: https://pastebin.com/fyDyV8vA
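The backward search through release-7 can be written down as a bisection. The sketch below is an illustration only, not the procedure actually used here: `first_bad` and `consistent_verdict` are hypothetical helpers, and `run_once` stands in for "build, flash, and check whether the board initializes". Repeating each test guards against the non-deterministic build outcomes that could produce a false positive.

```python
def consistent_verdict(rev, run_once, trials=2):
    """Run the test `trials` times; raise if the results disagree."""
    results = {bool(run_once(rev)) for _ in range(trials)}
    if len(results) != 1:
        raise RuntimeError(f"inconsistent results for {rev}")
    return results.pop()


def first_bad(revisions, is_good):
    """Find the first failing revision.

    `revisions` is ordered oldest to newest; the first entry is assumed
    known-good and the last entry known-bad.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


# Toy example with placeholder revision names; "r4" is the first bad one.
revs = ["r1", "r2", "r3", "r4", "r5"]
bad = {"r4", "r5"}
culprit = first_bad(revs, lambda r: consistent_verdict(r, lambda x: x not in bad))
print(culprit)  # r4
```

If the same revision passes on one build and fails on another, `consistent_verdict` raises instead of silently steering the search, which is the situation where a bisection result should not be trusted.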
Steps to Reproduce
$ nix develop 'git+https://github.com/m-labs/artiq?ref=release-7&rev=<rev-to-test>'
$ python -m artiq.gateware.targets.kasli_generic tester_11.json
(json here)
$ artiq_flash --srcbuild -d artiq_kasli/tester_11/
Expected Behavior
The system initializes.
Actual (undesired) Behavior
The system doesn't initialize.
Your System (omit irrelevant parts)