m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

the kc705 board memory bar may be broken #968

Closed mingshenli closed 2 years ago

mingshenli commented 6 years ago

The KC705 board does not work, and we believe the memory bar is broken. However, we swapped in a new 2 GB memory bar and it still cannot connect. @sbourdeauducq

The first image shows the new memory bar and the second shows the original one.

[image]

[image]

KaifengC commented 6 years ago

We have the same problem with some of our KC705 boards.

At first we guessed it was the same issue as #525, but even after following the discussion there we could not solve it.

I wonder whether it is a memory bar problem. Did your board work well before it broke?

sbourdeauducq commented 6 years ago

@KaifengC Can you post the full log? Also please try with ARTIQ-4 which has improved memory support (plus prints more detailed logs).

KaifengC commented 6 years ago

There is nothing new. The gateware is ARTIQ 2.x. Rarely (<5% of restarts), the board gets through memory initialization and works. But even in these cases, the board can stop responding at any time during operation.

MiSoC BIOS
(c) Copyright 2007-2016 M-Labs Limited
Built Oct  2 2017 13:03:47

BIOS CRC passed (5c1c54fe)
Initializing SDRAM...
Write leveling: 15* 17* 13  15*  9   8   5   7  completed
Read bitslip: 7 6 5 4 3 2
Read delays: 7:03-13  6:02-14  5:05-15  4:06-16  3:12-22  2:11-21  1:00-11  0:01-11  completed
Memtest failed: 29181/532736 words incorrect
Memory initialization failed
BIOS>

As for upgrading to ARTIQ-4, we are running into another problem: #984.

mingshenli commented 6 years ago

We sent the board back to Xilinx, but they said that the memory bar is OK. We are still trying to find the problem.

KaifengC commented 6 years ago

I tried it with ARTIQ 4.0.dev and got more information via the serial port:

 __  __ _ ____         ____                                                     
|  \/  (_) ___|  ___  / ___|                                                    
| |\/| | \___ \ / _ \| |                                                        
| |  | | |___) | (_) | |___                                                     
|_|  |_|_|____/ \___/ \____|                                                    

MiSoC Bootloader                                                                
Copyright (c) 2017-2018 M-Labs Limited                                          

Bootloader CRC passed                                                           
Gateware ident 4.0.dev+820.gbb90fb7d                                            
Initializing SDRAM...                                                           
Write leveling scan:                                                            
Module 7:                                                                       
00000001111111111111000000000000                                                
Module 6:                                                                       
00000111111111111110000000000000                                                
Module 5:                                                                       
00000000111111111111110000110000                                                
Module 4:                                                                       
00000000011111111111100000110000                                                
Module 3:                                                                       
10000000000000011111111111111111                                                
Module 2:                                                                       
00000000000001111111111111111111                                                
Module 1:                                                                       
11110000000000000111111111111111                                                
Module 0:                                                                       
11000000000000011111111111111111                                                
Write leveling: 15* 17* 13 15* 9 8 5 7 done                                     
Read bitslip: 7 6 5 4 3 2                                                       
Read leveling scan:                                                             
Module 7:                                                                       
00111111111110000000000000000000                                                
Module 6:                                                                       
00111111111111000000000000000000                                                
Module 5:                                                                       
00000111111111100000000000000000                                                
Module 4:                                                                       
00000111111111111000000000000000                                                
Module 3:                                                                       
00000000000111111111110000000000                                                
Module 2:                                                                       
00000000000111111111000000000000                                                
Module 1:                                                                       
11111111110000000000000000000001                                                
Module 0:                                                                       
01111111110000000000000000000000                                                
Read leveling: 6+-5 7+-5 9+-5 10+-5 16+-5 15+-4 4+-5 5+-4 done                  
SDRAM initialized                                                               
Memory test failed (2075780/4458496 words incorrect)                            
Halting. 

I will install a well-tested memory bar on it and try again.

sbourdeauducq commented 6 years ago

The write leveling scans look unusual (but do not indicate broken hardware). Maybe the algo does not handle those corner cases correctly.
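One rough way to inspect the write leveling scans from the log above is to look for 0-to-1 transitions in each scan string. The sketch below is Python written for illustration only; it is not the firmware's code and makes no claim to match the algorithm in sdram.rs. The names `rising_edges` and `summarize` are made up for this example. On this particular data, the first transition of each module coincides with the value printed on the `Write leveling:` line (which appears to list module 0 first), and the modules whose scan starts at 1 are the ones marked with `*`.

```python
# Write leveling scans copied verbatim from the ARTIQ 4.0.dev serial log above.
# Keys are the module numbers printed in the log.
SCANS = {
    7: "00000001111111111111000000000000",
    6: "00000111111111111110000000000000",
    5: "00000000111111111111110000110000",
    4: "00000000011111111111100000110000",
    3: "10000000000000011111111111111111",
    2: "00000000000001111111111111111111",
    1: "11110000000000000111111111111111",
    0: "11000000000000011111111111111111",
}


def rising_edges(scan):
    """Return every index i where the scan is '0' at i-1 and '1' at i."""
    return [i for i in range(1, len(scan))
            if scan[i - 1] == "0" and scan[i] == "1"]


def summarize(scan):
    """Collect a few simple properties of one scan (illustrative only)."""
    edges = rising_edges(scan)
    return {
        "first_edge": edges[0] if edges else None,  # first 0->1 transition
        "starts_high": scan[0] == "1",              # scan begins inside a 1 region
        "extra_edges": edges[1:],                   # any further 0->1 transitions
    }


if __name__ == "__main__":
    for module in sorted(SCANS):
        info = summarize(SCANS[module])
        print(module, info)
```

With this view, modules 0, 1 and 3 start high, and modules 4 and 5 show a second, short run of 1s later in the scan. Whether either of these is the corner case suspected above is not established here.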

KaifengC commented 6 years ago

Yes, it's the memory bar's problem.

I swapped this board's memory bar with one taken from a working board. This board then worked, and the previously working board can no longer pass the memory test.

It is strange, because both boards are almost new. I cannot see any difference by eye except for the serial number.

KaifengC commented 6 years ago

By the way, it seems the artiq_flash command has changed in ARTIQ 4.0.dev. How do I set the IP/MAC address now? The -m and proxy options no longer work.

sbourdeauducq commented 6 years ago

-m is renamed -V (see release notes) and proxy is automatic (you don't need to specify it manually anymore).

sbourdeauducq commented 6 years ago

> Yes, it's the memory bar's problem.

DDR memory systems have a lot of board-to-board variation; this is why we have this calibration algorithm that runs at board startup. I suspect that the non-working memory module can be made to work by debugging and improving the algorithm: https://github.com/m-labs/artiq/blob/master/artiq/firmware/libboard/sdram.rs
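To illustrate the general idea of per-board calibration (not the actual code in sdram.rs), here is a minimal Python sketch that takes a pass/fail scan over delay taps, finds the longest passing window while treating the scan as circular, and picks the middle of that window. The function names are hypothetical, and the two sample strings are the read leveling scans for modules 1 and 0 from the log earlier in this thread. The sketch is not meant to reproduce the firmware's printed values for every module.

```python
from typing import Tuple


def longest_passing_window(scan: str) -> Tuple[int, int]:
    """Return (start, length) of the longest run of '1', allowing wraparound."""
    n = len(scan)
    if "0" not in scan:
        return 0, n  # every tap passes
    # Start walking just after a '0' so that no run is split by the wrap point.
    first_zero = scan.index("0")
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for k in range(1, n + 1):
        i = (first_zero + k) % n
        if scan[i] == "1":
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def center_and_margin(scan: str) -> Tuple[int, int]:
    """Return (center tap, margin on each side) of the longest passing window."""
    start, length = longest_passing_window(scan)
    if length == 0:
        raise ValueError("no passing delay tap in scan")
    margin = (length - 1) // 2
    return (start + margin) % len(scan), margin


if __name__ == "__main__":
    # Read leveling scans for modules 1 and 0, copied from the log above.
    print(center_and_margin("11111111110000000000000000000001"))
    print(center_and_margin("01111111110000000000000000000000"))
```

The wraparound handling matters for scans like module 1's, where the passing window straddles the end of the tap range; a scan that treats tap 0 and tap 31 as unrelated would underestimate that window.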

KaifengC commented 6 years ago

Okay, shall I send you one of these non-working memory bars?

gkasprow commented 6 years ago

Did you try running the Xilinx reference design on this board?

KaifengC commented 6 years ago

> Did you try running the Xilinx reference design on this board?

Yes, an engineer from Xilinx came to my lab and tested the board. He told me that the board was completely normal.