Closed mingshenli closed 2 years ago
We have the same problem with some of our KC705 boards.
At the beginning we guess it's the same of #525 . But even following the discussion we can't solve it.
I was wondering is that a memory bar problem? Did your board work well before it broken?
@KaifengC Can you post the full log? Also please try with ARTIQ-4 which has improved memory support (plus prints more detailed logs).
There is nothing new. The gateware is artiq 2.x.
Rarely (<5%) after restart it will go through the Memory initialization
part and works well.
But even in these cases, the board will lose its response at any time during working.
MiSoC BIOS
(c) Copyright 2007-2016 M-Labs Limited
Built Oct 2 2017 13:03:47
BIOS CRC passed (5c1c54fe)
Initializing SDRAM...
Write leveling: 15* 17* 13 15* 9 8 5 7 completed
Read bitslip: 7 6 5 4 3 2
Read delays: 7:03-13 6:02-14 5:05-15 4:06-16 3:12-22 2:11-21 1:00-11 0:01-11 completed
Memtest failed: 29181/532736 words incorrect
Memory initialization failed
BIOS>
Talking about upgrading ARTIQ-4, we are encountering another problem #984 .
we send the board back to xlinx, but they said that the memory bar is ok. we are still trying to find the problem.
Tried it using artiq 4.0.dev and got more information via serial port:
__ __ _ ____ ____
| \/ (_) ___| ___ / ___|
| |\/| | \___ \ / _ \| |
| | | | |___) | (_) | |___
|_| |_|_|____/ \___/ \____|
MiSoC Bootloader
Copyright (c) 2017-2018 M-Labs Limited
Bootloader CRC passed
Gateware ident 4.0.dev+820.gbb90fb7d
Initializing SDRAM...
Write leveling scan:
Module 7:
00000001111111111111000000000000
Module 6:
00000111111111111110000000000000
Module 5:
00000000111111111111110000110000
Module 4:
00000000011111111111100000110000
Module 3:
10000000000000011111111111111111
Module 2:
00000000000001111111111111111111
Module 1:
11110000000000000111111111111111
Module 0:
11000000000000011111111111111111
Write leveling: 15* 17* 13 15* 9 8 5 7 done
Read bitslip: 7 6 5 4 3 2
Read leveling scan:
Module 7:
00111111111110000000000000000000
Module 6:
00111111111111000000000000000000
Module 5:
00000111111111100000000000000000
Module 4:
00000111111111111000000000000000
Module 3:
00000000000111111111110000000000
Module 2:
00000000000111111111000000000000
Module 1:
11111111110000000000000000000001
Module 0:
01111111110000000000000000000000
Read leveling: 6+-5 7+-5 9+-5 10+-5 16+-5 15+-4 4+-5 5+-4 done
SDRAM initialized
Memory test failed (2075780/4458496 words incorrect)
Halting.
I will install well-tested memory bar on it and try again.
The write leveling scans look unusual (but do not indicate broken hardware). Maybe the algo does not handle those corner cases correctly.
Yes, it's the memory bar's problem.
I exchanged the memory bar of this board with another one taken from a well-working board. It worked, and the "well-working board" can't go through the memory test now.
It strange that all this two boards are almost new. I can't figure out any difference using my eyes except for the SN number.
By the way, it seems the artiq_flash
command has changed in artiq 4.0.dev.
So how do I set the ip/mac address now?
The -m
and proxy
options are not working now.
-m
is renamed -V
(see release notes) and proxy
is automatic (you don't need to specify it manually anymore).
Yes, it's the memory bar's problem.
DDR memory systems have a lot of board-to-board variation, this is why we have this calibration algorithm that runs at board startup. I suspect that the non-working memory module can be made to work by debugging and improving the algorithm: https://github.com/m-labs/artiq/blob/master/artiq/firmware/libboard/sdram.rs
Okay, shall I send you one of this kind of non-working memory bar?
Did you try to run xilinx reference design on this board?
Did you try to run xilinx reference design on this board?
Yes, an engineer from Xilinx came to my lab and tested the board. He told me that the board was completely normal.
the kc705 board can not function and it is believed that the memory bar is broken. However, we change a new 2G memory but it still can not be connect. @sbourdeauducq
the above is new and below is the original one.