m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

serwb intermittently fails to initialize #856

Closed sbourdeauducq closed 6 years ago

sbourdeauducq commented 7 years ago

This occurs on the Sayma1 board we have on the HK server.

sbourdeauducq commented 6 years ago

Sometimes the initialization fails in a loop and this is resolved by reloading the RTM FPGA:

[     7.715943s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     7.890916s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.065888s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.240862s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.415835s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.590808s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.765780s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.940753s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     9.115726s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     9.290699s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     9.465672s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.

...RTM FPGA reloaded by JTAG...

[     9.616264s]  INFO(board::serwb): done.
[     9.618792s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[    10.618006s]  INFO(runtime): continuing boot
[    10.620961s]  INFO(board::hmc830_7043::hmc830): HMC830 found
[    10.626594s]  INFO(board::hmc830_7043::hmc830): HMC830 configuration...
[    10.633283s]  INFO(board::hmc830_7043::hmc830): waiting for lock...
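
When comparing boards or builds, it can help to count the retries and their spacing rather than eyeballing the log. The following is a minimal sketch that parses firmware log lines in the format shown above; the function names and the regular expression are illustrative assumptions, not part of ARTIQ.

```python
import re

# Matches firmware log lines such as:
# [     7.715943s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
LOG_LINE = re.compile(
    r"^\[\s*(?P<t>\d+\.\d+)s\]\s+(?P<level>[A-Z]+)\((?P<module>[^)]+)\):\s*(?P<msg>.*)$"
)


def serwb_retry_times(lines):
    """Return timestamps (in seconds) of serwb 'initialization failed' warnings."""
    times = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # banner text, blank lines, non-log output
        if m["module"].endswith("serwb") and "initialization failed" in m["msg"]:
            times.append(float(m["t"]))
    return times


def mean_interval(times):
    """Average spacing between consecutive timestamps, or None if fewer than two."""
    if len(times) < 2:
        return None
    return (times[-1] - times[0]) / (len(times) - 1)


sample = [
    "[     7.715943s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.",
    "[     7.890916s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.",
    "[     8.065888s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.",
    "[     9.616264s]  INFO(board::serwb): done.",
]
retries = serwb_retry_times(sample)
print(len(retries), "retries, mean interval", mean_interval(retries), "s")
```

The same function works on logs from later firmware, where the module is reported as board_artiq::serwb, because it only checks the end of the module path.
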
enjoy-digital commented 6 years ago

Thanks. I'll look at that today.

sbourdeauducq commented 6 years ago

The following change to MiSoC breaks serwb 100% of the time:

diff --git a/misoc/targets/sayma_amc.py b/misoc/targets/sayma_amc.py
index 5737da95..aeb37cab 100755
--- a/misoc/targets/sayma_amc.py
+++ b/misoc/targets/sayma_amc.py
@@ -123,7 +123,7 @@ class MiniSoC(BaseSoC):
             self.config["RGMII_CLOCK_REROUTED"] = None
             si5324_clkin = self.platform.request("si5324_clkin")
             si5324_clkout = self.platform.request("si5324_clkout_fabric")
-            self.specials += DifferentialOutput(eth_clocks.rx, si5324_clkin.p, si5324_clkin.n)
+            self.specials += DifferentialOutput(ClockSignal(), si5324_clkin.p, si5324_clkin.n)
             eth_clocks.rx = Signal()
             self.specials += DifferentialInput(si5324_clkout.p, si5324_clkout.n, eth_clocks.rx)
         self.submodules.ethphy = LiteEthPHY(eth_clocks,

migen 775572ea7, misoc f509de0cb, artiq 2b01aa22b

enjoy-digital commented 6 years ago

Hmm ok, at least this can help me understand what is going on.

sbourdeauducq commented 6 years ago

Also the RTM design doesn't meet timing...

enjoy-digital commented 6 years ago

I'm looking at that.

enjoy-digital commented 6 years ago

@sbourdeauducq: I should have fixed timing on the RTM design (a false path was missing). It seems I'm not able to reproduce the issue easily.

I tried adding self.specials += DifferentialOutput(ClockSignal(), si5324_clkin.p, si5324_clkin.n) to my design but serwb is still working. Can you always enable debug on serwb while we still have the issue? It could help me understand what is going on.

For the case where RTM is reloaded by JTAG, how was it loaded initially? From flash? I'm just trying to understand, because RTM should automatically be reset by AMC when retrying. Are you sure RTM was correctly loaded?

sbourdeauducq commented 6 years ago

Try on the HKG boards with SSH? RTM is always loaded with JTAG, there is currently no other way.

jbqubit commented 6 years ago

What's the status of this?

sbourdeauducq commented 6 years ago

Try it on your board. Could be another hw problem. Tom and Florent are not experiencing it.

jbqubit commented 6 years ago

Built .bit this afternoon from master.

__Sayma_AMC TS190717-7__

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 42 69 56 78 73 88 54 62 done
Read delays: 7:00-160 6:08-181 5:57-229 4:70-242 3:101-260 2:109-273 1:127-286 0:134-294 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003857s]  INFO(runtime): software version 4.0.dev+404.gac3c3871
[     0.010118s]  INFO(runtime): gateware version 4.0.dev+404.gac3c3871
[     0.016370s]  INFO(runtime): log level set to INFO by default
[     0.022096s]  INFO(runtime): UART log level set to INFO by default
[     0.028257s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.711094s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     1.391311s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     2.071214s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     2.770865s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     3.452872s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     4.138800s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     4.823377s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     5.504940s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.

__Sayma_AMC TS190717-2__ This AMC has different behavior. It hangs...

$ flterm /dev/ttyUSB1

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 56 78 61 86 73 97 58 68 done
Read delays: 7:02-175 6:19-195 5:67-241 4:81-252 3:103-279 2:108-287 1:139-312 0:147-320 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003857s]  INFO(runtime): software version 4.0.dev+404.gac3c3871
[     0.010118s]  INFO(runtime): gateware version 4.0.dev+404.gac3c3871
[     0.016384s]  INFO(runtime): log level set to INFO by default
[     0.022103s]  INFO(runtime): UART log level set to INFO by default
[     0.028257s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...

Hangs here.

hartytp commented 6 years ago

@jbqubit hmm...interesting.

jbqubit commented 6 years ago

For the logs just reported, I built the .bit files for the RTM and the AMC standalone, and flashed using ~/github/m-labs/sinara$ artiq_flash --srcbuild ./misoc_standalone_sayma_amc -t sayma.

Are all power supply lights on both boards on?

Yes.

Can you try with the current sayma amc standalone and sayma RTM conda packages, please?

OK. I installed artiq-sayma_amc-standalone and artiq-sayma_rtm from conda and flashed after fixing some typos in artiq_flash (cf #890).

$ flterm /dev/ttyUSB1

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 54 77 58 85 72 96 57 64 done
Read delays: 7:04-170 6:19-196 5:62-240 4:80-249 3:102-278 2:107-288 1:134-311 0:146-312 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003859s]  INFO(runtime): software version 4.0.dev+400.g6d58c439
[     0.010120s]  INFO(runtime): gateware version 4.0.dev+400.g6d58c439
[     0.016385s]  INFO(runtime): log level set to INFO by default
[     0.022105s]  INFO(runtime): UART log level set to INFO by default
[     0.028259s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 54 77 59 85 72 96 56 65 done
Read delays: 7:05-176 6:17-192 5:65-242 4:78-251 3:102-273 2:106-284 1:132-140 0:145-314 done
SDRAM initialized
Memory test failed (43751/1114624 words incorrect)
Halting.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 55 77 60 86 72 96 57 63 done
Read delays: 7:02-171 6:16-194 5:66-241 4:79-249 3:102-277 2:107-283 1:135-311 0:149-318 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003859s]  INFO(runtime): software version 4.0.dev+400.g6d58c439
[     0.010120s]  INFO(runtime): gateware version 4.0.dev+400.g6d58c439
[     0.016385s]  INFO(runtime): log level set to INFO by default
[     0.022105s]  INFO(runtime): UART log level set to INFO by default
[     0.028259s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
sbourdeauducq commented 6 years ago

You need to load the rtm manually.

sbourdeauducq commented 6 years ago

https://gist.github.com/sbourdeauducq/f323a10f0a8306bd531769f89d25f5ea

jbqubit commented 6 years ago

Ok. Here's what I'm now doing.

python -m artiq.gateware.targets.sayma_rtm 
python -m artiq.gateware.targets.sayma_amc_standalone 
find . -name "*.bit"
find /home/britton/anaconda3 -name "xilinx-xcu.cfg" 

openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_new.cfg -c "pld load 0 ./artiq_sayma_rtm/top.bit; exit" 

openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_new.cfg -c "pld load 1 ./misoc_standalone_sayma_amc/gateware/top.bit; exit" 
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq$ openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_new.cfg -c "pld load 0 ./artiq_sayma_rtm/top.bit; exit"  
Open On-Chip Debugger 0.10.0 (2017-02-03-06:53)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 5000 kHz
Info : clock speed 5000 kHz
Error: JTAG scan chain interrogation failed: all ones
Error: Check JTAG interface, timings, target power, etc.
Error: Trying to use configured scan chain anyway...
Error: xc7.tap: IR capture error; saw 0x3f not 0x01
Warn : Bypassing JTAG setup events due to errors
Warn : gdb services need one or more targets defined
loaded file ./artiq_sayma_rtm/top.bit to pld device 0 in 3s 588074us
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq$ 
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq$ 
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq$ openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_new.cfg -c "pld load 1 ./misoc_standalone_sayma_amc/gateware/top.bit; exit"  
Open On-Chip Debugger 0.10.0 (2017-02-03-06:53)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 5000 kHz
Info : clock speed 5000 kHz
Error: JTAG scan chain interrogation failed: all ones
Error: Check JTAG interface, timings, target power, etc.
Error: Trying to use configured scan chain anyway...
Error: xc7.tap: IR capture error; saw 0x3f not 0x01
Warn : Bypassing JTAG setup events due to errors
Warn : gdb services need one or more targets defined
loaded file ./misoc_standalone_sayma_amc/gateware/top.bit to pld device 1 in 12s 874773us
sbourdeauducq commented 6 years ago

You have the 1.8V and/or jtag bugs. Power cycle boards, replug USB connectors, until there are no errors.
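
Note that in the output above OpenOCD still prints "loaded file ..." even though the scan chain interrogation failed, so that line alone does not show the bitstream reached the FPGA. A minimal sketch of a check that could be used when scripting the retries; the function name is hypothetical and the heuristic (any line starting with "Error:" means the run is rejected) is an assumption, not OpenOCD-defined behavior.

```python
def openocd_load_is_trustworthy(output):
    """Heuristic: accept a 'pld load' run only if OpenOCD printed a
    'loaded file' line and no line starting with 'Error:'."""
    lines = [line.strip() for line in output.splitlines()]
    has_error = any(line.startswith("Error:") for line in lines)
    loaded = any(line.startswith("loaded file") for line in lines)
    return loaded and not has_error


# Output excerpt copied from the failing run in this thread.
bad_run = """\
Info : clock speed 5000 kHz
Error: JTAG scan chain interrogation failed: all ones
Error: xc7.tap: IR capture error; saw 0x3f not 0x01
Warn : Bypassing JTAG setup events due to errors
loaded file ./artiq_sayma_rtm/top.bit to pld device 0 in 3s 588074us
"""
print(openocd_load_is_trustworthy(bad_run))
```

A wrapper script could power cycle or prompt the user to replug USB and rerun OpenOCD until this check passes.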

jbqubit commented 6 years ago

The 1.8V is fine. I'm watching it on a scope.

I've applied the JTAG white-wire fix https://github.com/m-labs/sinara/issues/463

artiq_flash --srcbuild ./misoc_standalone_sayma_amc -t sayma was working just fine on this board a couple days ago. But looks like @whitequark made some changes (https://github.com/m-labs/artiq/commit/f77aa9b78ff7d1a61417650cfd65dbc4635cca0f#diff-23ef9b8c7f366ae0fd8efc4411bdc7a8) to artiq_flash. And now artiq_flash doesn't work for want of bscan_spi_xcku040-sayma.bit. @whitequark should I expect to be able to use artiq_flash now for Sayma?

whitequark commented 6 years ago

@whitequark should I expect to be able to use artiq_flash now for Sayma?

Sure, I use it for Sayma.

And now artiq_flash doesn't work for want of bscan_spi_xcku040-sayma.bit.

What exactly is the error message here?

jbqubit commented 6 years ago
$ artiq_flash --srcbuild ./misoc_standalone_sayma_amc -t sayma_rtm
proxy gateware bitstream bscan_spi_xcku040-sayma.bit not found
$ find /home/britton/anaconda3/envs/artiq-dev2 -name "*xcku040-sayma.bit"
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq
$ 
jordens commented 6 years ago

It's an old openocd. @whitequark

sbourdeauducq commented 6 years ago

Considering:

...it is possible that this is simply another consequence of the 1.8V bug.

jordens commented 6 years ago

Yes. The "Open On-Chip Debugger 0.10.0 (2017-02-03-06:53)" Joe installed won't help even with those issues resolved.

jbqubit commented 6 years ago

I've been successfully flashing this board for several weeks now using artiq_flash. See here. Pending https://github.com/m-labs/artiq/issues/898 I'll try again.

enjoy-digital commented 6 years ago

By reducing the serwb linerate from 1.25Gbps to 625Mbps, it seems to be reliable on at least one board that has the 1.8V issue (it's difficult to say if it's related or not). Let's use 625Mbps for now. Note that on sayma1 (which had the 1.8V issue), when restarting AMC with artiq_flash, RTM is no longer alive and needs to be reloaded by JTAG (this is not the case with the board I brought with me). This is maybe another issue.
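
For scale, halving the linerate doubles the unit interval (the duration of one bit on the wire). A small sketch of that arithmetic follows; the function name is illustrative, and whether the extra margin is the actual reason for the improved reliability is not established in this thread.

```python
def unit_interval_ns(linerate_bps):
    """Duration of one bit, in nanoseconds, for a serial link at linerate_bps."""
    return 1e9 / linerate_bps


for rate in (1.25e9, 625e6):
    print("{:.0f} Mbps -> {:.2f} ns per bit".format(rate / 1e6, unit_interval_ns(rate)))
```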

gkasprow commented 6 years ago

@enjoy-digital once you restart AMC you may toggle config pins so RTM gets unconfigured...

jbqubit commented 6 years ago

@enjoy-digital It's indeed odd that 1.25 Gbps worked some time ago but now doesn't. Is serwb now working reliably at 625 Mbps?

enjoy-digital commented 6 years ago

@jbqubit: I don't think serwb at 1.25Gbps behaves differently than before. It just seems unreliable with some of the boards at 1.25Gbps, and seems to be reliable with all boards we tested at 625Mbps.

Note that there are 2 problems here:

jbqubit commented 6 years ago

OK, so problem 1 relates to #813. Agreed that 625 Mbps is fine for getting started. :)

sbourdeauducq commented 6 years ago

@enjoy-digital Sometimes when the RTM FPGA is not loaded, it prints:

[     1.454088s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.

and sometimes it just waits for it to be loaded. Why is that?

jordens commented 6 years ago

Serwb also appears to hang randomly on sayma-3, mostly when waiting for HMC830 lock. This happens with current master as well as spi2. And @hartytp also sees this (with a slightly older master).

[     0.000005s]  INFO(runtime): ARTIQ runtime starting...
[     0.003865s]  INFO(runtime): software version 4.0.dev+624.gb466a569
[     0.010130s]  INFO(runtime): gateware version 4.0.dev+630.g54b51493
[     0.016396s]  INFO(runtime): log level set to INFO by default
[     0.022113s]  INFO(runtime): UART log level set to INFO by default
[     0.028265s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.746275s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     1.419144s]  INFO(board_artiq::serwb): done.
[     1.422262s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+624.gb466a569
[     1.429735s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     2.429006s]  INFO(runtime): continuing boot
[     2.431966s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 found
[     2.438116s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 configuration...
[     2.445347s]  INFO(board_artiq::hmc830_7043::hmc830): waiting for lock...
jbqubit commented 6 years ago

My board as well. No flashing errors.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 31 53 45 65 36 52 24 35 done
Read delays: 7:04-98 6:16-115 5:67-157 4:74-169 3:99-196 2:104-192 1:123-217 0:135-226 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000005s]  INFO(runtime): ARTIQ runtime starting...
[     0.003864s]  INFO(runtime): software version 4.0.dev+618.g820c8342
[     0.010129s]  INFO(runtime): gateware version 4.0.dev+618.g820c8342
[     0.016396s]  INFO(runtime): log level set to INFO by default
[     0.022115s]  INFO(runtime): UART log level set to INFO by default
[     0.028265s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.736673s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     1.520363s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     2.324061s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     3.338301s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
$ artiq_flash --target sayma --variant standalone --dir /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio
Design: top;COMPRESS=TRUE;UserID=0XFFFFFFFF;Version=2017.4.1
Part name: xcku040-ffva1156-1-c
Date: 2018/02/22
Time: 18:10:23
Bitstream payload length: 0xc46a7c
Open On-Chip Debugger 0.10.0-00013-gbb7beda (2018-02-13-15:56)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 5000 kHz
Info : clock speed 5000 kHz
Info : JTAG tap: xc7.tap tap/device found: 0x0362e093 (mfg: 0x049 (Xilinx), part: 0x362e, ver: 0x0)
Info : JTAG tap: xcu.tap tap/device found: 0x13822093 (mfg: 0x049 (Xilinx), part: 0x3822, ver: 0x1)
Info : gdb server disabled
RTM FPGA XADC:
TEMP 42.50 C
VCCINT 1.002 V
VCCAUX 1.796 V
VCCBRAM 1.002 V
VPVN 0.000 V
VREFP 0.000 V
VREFN 0.000 V
VCCPINT 0.000 V
VCCPAUX 0.000 V
VCCODDR 0.000 V
AMC FPGA XADC:
TEMP 40.70 C
VCCINT 0.890 V
VCCAUX 1.785 V
VCCBRAM 0.959 V
VPVN 0.000 V
VREFP 0.000 V
VREFN 0.000 V
VCCPINT 0.000 V
VCCPAUX 0.000 V
VCCODDR 0.000 V
loaded file /home/britton/anaconda3/envs/artiq-dev/share/bscan-spi-bitstreams/bscan_spi_xcku040-sayma.bit to pld device 1 in 3s 849801us
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
flash 'jtagspi' found at 0x00000000
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
Info : sector 0 took 715 ms
Info : sector 1 took 729 ms
...
Info : sector 194 took 713 ms
Info : sector 195 took 708 ms
Info : sector 196 took 725 ms
erased sectors 0 through 196 on flash bank 0 in 141.517853s
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
wrote 12872316 bytes from file /tmp/tmpul9tr71q to flash bank 0 at offset 0x00000000 in 95.553131s (131.556 KiB/s)
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
read 12872316 bytes from file /tmp/tmpul9tr71q and flash bank 0 at offset 0x00000000 in 21.167046s (593.877 KiB/s)
contents match
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
flash 'jtagspi' found at 0x00000000
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
Info : sector 0 took 733 ms
Info : sector 1 took 744 ms
erased sectors 0 through 1 on flash bank 1 in 1.477441s
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
wrote 74980 bytes from file /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio/standalone/software/bootloader/bootloader.bin to flash bank 1 at offset 0x00000000 in 0.562994s (130.059 KiB/s)
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
read 74980 bytes from file /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio/standalone/software/bootloader/bootloader.bin and flash bank 1 at offset 0x00000000 in 0.124591s (587.704 KiB/s)
contents match
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
flash 'jtagspi' found at 0x00000000
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
Info : sector 5 took 734 ms
Info : sector 6 took 738 ms
Info : sector 7 took 742 ms
Info : sector 8 took 741 ms
Info : sector 9 took 751 ms
Info : sector 10 took 740 ms
Info : sector 11 took 739 ms
Info : sector 12 took 742 ms
Info : sector 13 took 743 ms
erased sectors 5 through 13 on flash bank 1 in 6.669954s
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
wrote 588232 bytes from file /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio/standalone/software/runtime/runtime.fbi to flash bank 1 at offset 0x00050000 in 4.515156s (127.226 KiB/s)
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
read 588232 bytes from file /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio/standalone/software/runtime/runtime.fbi and flash bank 1 at offset 0x00050000 in 0.975071s (589.132 KiB/s)
contents match
jordens commented 6 years ago

Did you even load the RTM gateware?

jbqubit commented 6 years ago

Yes, I'm flashing the RTM. I neglected to paste it -- now included below. Booting fails with "Memory test failed" or "serwb bridge initialization failed". Once, it advanced to the point where the failure was "HMC830 lock timeout".

$ openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_flash.cfg -c "pld load 0 /home/britton/artiq-dev/artiq/artiq_sayma/rtm_gateware/rtm.bit; exit"  
Open On-Chip Debugger 0.10.0-00013-gbb7beda (2018-02-13-15:56)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 5000 kHz
Info : clock speed 5000 kHz
Info : JTAG tap: xc7.tap tap/device found: 0x0362e093 (mfg: 0x049 (Xilinx), part: 0x362e, ver: 0x0)
Info : JTAG tap: xcu.tap tap/device found: 0x13822093 (mfg: 0x049 (Xilinx), part: 0x3822, ver: 0x1)
Warn : gdb services need one or more targets defined
loaded file /home/britton/artiq-dev/artiq/artiq_sayma/rtm_gateware/rtm.bit to pld device 0 in 1s 52267us
$ cat sayma_flash.cfg

interface ftdi
#ftdi_device_desc "Quad RS232-HS"
ftdi_vid_pid 0x0403 0x6011
# if there are multiple Sayma:
#ftdi_location 5:2
ftdi_channel 0
# EN_USB_JTAG on ADBUS7: out, high
# nTRST on ADBUS4: out, high, but R46 is DNP

ftdi_layout_init 0x0098 0x008b
reset_config none
adapter_khz 5000
transport select jtag
source [find cpld/xilinx-xc7.cfg]
set CHIP XCKU040
source [find cpld/xilinx-xcu.cfg]
init
hartytp commented 6 years ago

Did you even load the RTM gateware?

@jordens Starting the informal etiquette manual: AFAICT, the word "even" here serves no purpose other than to make a helpful comment come across as somewhat rude/condescending.

@jboulder AFAICT you need to be a little careful over the timing of loading (not flashing if we want to be pedantic -- if we could flash it, life would be much easier) the RTM FPGA. At some point during startup the AMC restarts the RTM FPGA, and loading must be done after that point. I find that if you get the timing right this all works reliably.
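
The sequencing described above (watch the UART for the "waiting for AMC/RTM serwb bridge to be ready" line, then load the RTM bitstream over JTAG) could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the UART lines are supplied by the caller (for example from a serial port reader), and the OpenOCD invocation simply mirrors the "pld load 0" command pasted earlier in this thread.

```python
import subprocess

READY_MARKER = "waiting for AMC/RTM serwb bridge to be ready"


def load_rtm_when_ready(uart_lines, load_rtm):
    """Consume UART log lines and call load_rtm() once the marker appears.

    Returns True if the loader was invoked, False if the log ended first.
    """
    for line in uart_lines:
        if READY_MARKER in line:
            load_rtm()
            return True
    return False


def openocd_loader(scripts_dir, cfg_path, bitstream_path):
    """Build a loader that runs OpenOCD with 'pld load 0 <bitstream>; exit'."""
    def load():
        subprocess.run(
            ["openocd", "-s", scripts_dir, "-f", cfg_path,
             "-c", "pld load 0 {}; exit".format(bitstream_path)],
            check=True,  # raise if OpenOCD exits with a non-zero status
        )
    return load


# Dry run with a stand-in loader, so no hardware or OpenOCD is needed.
calls = []
fired = load_rtm_when_ready(
    [
        "[     0.022114s]  INFO(runtime): UART log level set to INFO by default",
        "[     0.028264s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...",
    ],
    lambda: calls.append("load"),
)
print(fired, calls)
```

On real hardware, the second argument would be something like openocd_loader(scripts_dir, cfg_path, bitstream_path) with the paths used on the local machine.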

sbourdeauducq commented 6 years ago

At some point during startup the AMC restarts the RTM FPGA

It doesn't. I don't know why it looks like it does, there is probably another bug somewhere.

hartytp commented 6 years ago

hmmmm...well it certainly behaves a lot as if it does.

jordens commented 6 years ago

@hartytp Reading undue rudeness into that question is thin-skinned IMHO. Especially in the light of past experience with negligent and careless treatment of advice and instructions and the frustration associated with it. You yourself are confirming that by suggesting that Joe personally may not have been careful. The "rudeness" seems to be already healed by the purely technical rephrasing "Is the RTM gateware even loaded?". Would you consider that condescending? But yes. Until the RTM is loaded automatically people need to be extra-careful when testing this.

hartytp commented 6 years ago

@hartytp Reading undue rudeness into that question is thin-skinned IMHO. Especially in the light of past experience with negligent and careless treatment of advice and instructions and the frustration associated with it. You yourself are confirming that by suggesting that Joe personally may not have been careful. The "rudeness" seems to be already healed by the purely technical rephrasing "Is the RTM gateware even loaded?". Would you consider that condescending?

I don't understand your argument here. You seem to be acknowledging that your comment was phrased in a way that was deliberately rude, but arguing that this is appropriate given the history. Or, is your point that you could easily have been ruder, so we should be thankful for the relatively restrained level of rudeness you chose to adopt for your post?

In either case, the word "even" here adds nothing to your point on a technical level, but to any native English speaker it implies an element of rudeness. Given the current tensions, it shouldn't be a surprise if that rubs people the wrong way.

enjoy-digital commented 6 years ago

@sbourdeauducq, @hartytp: serwb is resetting the RTM FPGA at startup: https://github.com/m-labs/artiq/blob/master/artiq/gateware/targets/sayma_rtm.py#L158

hartytp commented 6 years ago

Thanks for confirming that (I knew it does, because I've experienced it many times when working with Sayma).

sbourdeauducq commented 6 years ago

serwb is resetting the RTM FPGA at startup:

Yes, I know, but that's not touching the bitstream.

jordens commented 6 years ago

@hartytp My points are, that it wasn't meant rude, that I don't consider it rude if I am asked that question by you, that "even" is not an insult (just search through your own usage of it on artiq or sinara), that I wouldn't recommend considering it rude for the etiquette rules you are writing, and that I claim on the basis of the general level of rudeness by Joe that even if you as a third person would consider it rude, Joe is not in a position to complain about it.

jbqubit commented 6 years ago

The reason for my post that prompted your rude comment is that booting didn't hang at the usual point when waiting for RTM FPGA. What I expected:

[     0.028265s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...

What I saw:

[     0.028265s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.736673s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
...

This precluded interaction with the RTM FPGA which is why I didn't post about it.

Today I see similar behavior but after a longer delay.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 26 52 39 60 45 60 29 38 done
Read delays: 7:00-129 6:01-155 5:47-186 4:55-196 3:97-226 2:104-239 1:116-259 0:129-267 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003864s]  INFO(runtime): software version 4.0.dev+636.gf97163cd
[     0.010129s]  INFO(runtime): gateware version 4.0.dev+636.gf97163cd
[     0.016395s]  INFO(runtime): log level set to INFO by default
[     0.022114s]  INFO(runtime): UART log level set to INFO by default
[     0.028264s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[   403.212222s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   403.980793s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   404.777906s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   405.481089s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   406.229492s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   406.972165s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   407.699517s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   408.394776s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   409.114399s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   409.830605s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   410.587286s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   411.789428s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   412.683285s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   413.375680s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   414.073249s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   414.898720s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   415.631904s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   416.400393s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   417.255687s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   418.054276s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   418.800799s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
panic at /home/britton/artiq-dev/artiq/artiq/firmware/runtime/main.rs:268: exception 7 at PC 0x408cc3c8, EA 0x40143cf0
backtrace for software version 4.0.dev+636.gf97163cd:

@jordens I'm trying to communicate what I see to assist M-Labs in debugging this Issue. Your frequent use of language that implies that I am lazy and careless does little to encourage this type of constructive feedback.


Since this is an Issue about something not working, it didn't occur to me to post an example of success. Perhaps doing so will help others know what to expect. Following the instructions on the mailing list, when I wait for [ 0.028264s] INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready... and then load the RTM FPGA, I see the following.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 25 50 38 62 47 62 32 40 done
Read delays: 7:00-128 6:01-153 5:48-187 4:56-196 3:94-225 2:101-236 1:115-258 0:130-266 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003864s]  INFO(runtime): software version 4.0.dev+636.gf97163cd
[     0.010129s]  INFO(runtime): gateware version 4.0.dev+636.gf97163cd
[     0.016395s]  INFO(runtime): log level set to INFO by default
[     0.022114s]  INFO(runtime): UART log level set to INFO by default
[     0.028264s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     8.592829s]  INFO(board_artiq::serwb): done.
[     8.595959s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+636.gf97163cd
[     8.603428s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     9.603005s]  INFO(runtime): continuing boot
[     9.605972s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 found
[     9.612120s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 configuration...
[     9.619353s]  INFO(board_artiq::hmc830_7043::hmc830): waiting for lock...
[    11.620010s] ERROR(board_artiq::hmc830_7043::hmc830): HMC830 lock timeout. Register dump:
[    11.626901s] ERROR(board_artiq::hmc830_7043::hmc830): [0x00] = 0xa7975
[    11.633411s] ERROR(board_artiq::hmc830_7043::hmc830): [0x01] = 0x0002
[    11.639837s] ERROR(board_artiq::hmc830_7043::hmc830): [0x02] = 0x0002
[    11.646262s] ERROR(board_artiq::hmc830_7043::hmc830): [0x03] = 0x0030
[    11.652688s] ERROR(board_artiq::hmc830_7043::hmc830): [0x04] = 0x0000
[    11.659114s] ERROR(board_artiq::hmc830_7043::hmc830): [0x05] = 0x0000
[    11.665539s] ERROR(board_artiq::hmc830_7043::hmc830): [0x06] = 0x303ca
[    11.672052s] ERROR(board_artiq::hmc830_7043::hmc830): [0x07] = 0x014d
[    11.678477s] ERROR(board_artiq::hmc830_7043::hmc830): [0x08] = 0xc1beff
[    11.685076s] ERROR(board_artiq::hmc830_7043::hmc830): [0x09] = 0x153fff
[    11.691676s] ERROR(board_artiq::hmc830_7043::hmc830): [0x0a] = 0x2046
[    11.698101s] ERROR(board_artiq::hmc830_7043::hmc830): [0x0b] = 0x7c061
[    11.704614s] ERROR(board_artiq::hmc830_7043::hmc830): [0x0c] = 0x0000
[    11.711039s] ERROR(board_artiq::hmc830_7043::hmc830): [0x0f] = 0x0081
[    11.717465s] ERROR(board_artiq::hmc830_7043::hmc830): [0x10] = 0x0080
[    11.723890s] ERROR(board_artiq::hmc830_7043::hmc830): [0x11] = 0x7ffff
[    11.730402s] ERROR(board_artiq::hmc830_7043::hmc830): [0x12] = 0x0000
[    11.736828s] ERROR(board_artiq::hmc830_7043::hmc830): [0x13] = 0x1259
panic at src/libcore/result.rs:906: cannot initialize HMC830/7043: "HMC830 lock timeout"
backtrace for software version 4.0.dev+636.gf97163cd:
0x4002398c
0x4004504c
0x400060c0
0x40002fa4
0x400236dc
halting.

I interpret this to mean that the HMC830 is blocking, an unrelated Issue.

enjoy-digital commented 6 years ago

serwb has been refactored (architecture and clocking). An issue has also been found with the un-initialized HMC7043 (https://github.com/sinara-hw/sinara/issues/541) that could explain the problem. To prevent the un-initialized HMC7043 from introducing noise in the AMC FPGA, clock buffers are now disabled at startup (https://github.com/m-labs/artiq/commit/8212e46f5ed97154b2f03010c17ac7a7fa98c4a2). Closing this, since the content of this issue is probably no longer relevant.

jbqubit commented 6 years ago

Thanks @enjoy-digital, @gkasprow !