sbourdeauducq closed this issue 5 years ago.
@gkasprow Should I send you the board or is there something I can try to fix myself? This is happening very frequently now, and blocking the development of inter-board synchronization.
Sure, send it to Technosystem.
The other AMC also begins to exhibit this bug now :(
@gkasprow I'm still waiting for your detailed shipping instructions.
@sbourdeauducq send it to TechnoSystems via any respectable carrier (FedEx/DHL/etc). I've never had an issue doing that.
Let's see after Brexit :)
Anyway, I received the information and will send it with DHL tomorrow.
NB for anyone who wants to debug this: judging from Exar's "UnivPMIC", the XR77129 register layout may be roughly the same as that of the XRP7724, which is also documented and has some examples.
The XR77129 is the same as the 7725 but has a higher voltage rating. The 7725 supports some Intel management protocol that the 7724 does not.
While waiting for Sayma to ship to HK, can M-Labs do testing via remote login to WUT system?
We can make it quickly tomorrow using TeamViewer.
> using TeamViewer
That won't be useful; we need SSH or (better) Mosh. Anyway, I don't think it's worth accessing the WUT boards remotely; DHL between Poland and HK is rather fast.
I have 2 VPN accounts so we can do it.
@gkasprow How would @sbourdeauducq disable the on-board 3.3V supply and supply 3.3V from a benchtop PSU instead? Is this even a good idea?
One doesn't have to disable it. He can connect an external power module or a bench supply in parallel. The Exar should simply disable the channel.
This sounds like a bad idea, e.g. if the Exar chip disabled the channel due to overcurrent.
Set current limit on your benchtop supply.
All the lab power supplies that I have turn into current sources when the maximum current is reached. And we typically leave the board on all the time. So, unless I change the behavior of the PSU to make it work like a circuit breaker, the board might keep receiving its maximum current for days, which does not sound safe. And we do not currently have the time or funding for this sort of Sayma hardware debugging. As I mentioned in another issue, the board that is left in HK isn't strongly affected by this bug yet; it typically behaves itself for 30 min to 1 day after being turned on. So it's not a huge impediment to development (except for inter-board synchronization, since this is the only board we have left), but since this bug will likely get worse, please investigate it quickly after receiving the board, @gkasprow.
> the board might keep receiving its maximum current for days, which does not sound safe.
There's no need for a fuse/breaker when using a current-limited supply. If the board's nominal current draw is X, set the current limit to (1+eps)*X for whatever eps is safe CW. But it sounds like your board's supply isn't bad enough yet to warrant using a benchtop power supply.
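The (1+eps)*X rule can be written out as a small calculation, together with the worst-case power the board could receive if it sits at the limit for days. This is only a sketch: the 2 A nominal draw, eps = 0.1 and 3.3 V set point in the example are hypothetical placeholders, not measured Sayma values. The power bound relies on the fact that in constant-current mode the output voltage can only fall below the set point.

```python
# Sketch of the "(1+eps)*X" bench-supply current-limit rule.
# All numbers in the example are hypothetical placeholders.

def current_limit(nominal_current_a: float, eps: float) -> float:
    """Current limit for a bench supply: (1 + eps) * nominal draw."""
    if nominal_current_a <= 0:
        raise ValueError("nominal current must be positive")
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return (1 + eps) * nominal_current_a


def worst_case_power(set_voltage_v: float, limit_a: float) -> float:
    """Upper bound on delivered power: the output never exceeds the set
    voltage, and in constant-current mode it only drops below it, so
    P <= V_set * I_limit."""
    return set_voltage_v * limit_a


if __name__ == "__main__":
    limit = current_limit(nominal_current_a=2.0, eps=0.1)  # hypothetical board draw
    print(f"current limit: {limit:.2f} A")
    print(f"max power at 3.3 V: {worst_case_power(3.3, limit):.2f} W")
```

The bound shows what the rail would dissipate in the worst case; whether that is acceptable for days of unattended operation is a separate judgment.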
@gkasprow Even though I used DHL as you instructed, the package is again stuck in Polish customs. Can you handle the import this time please? "Clearance will proceed after receiving instructions from the importer. Customer should contact DHL Customer Service if not reached by DHL"
Still stuck in customs...
:(
Do you use cocaine for padding when you ship them? Maybe EU customs are just suspicious of HK
This time it got stuck at DHL, not Polish Post, and that is a huge difference :)
Still stuck despite new paperwork sent yesterday.
And in my experience, EU customs are suspicious of many things coming from small organizations outside EU; it is also a problem to receive most items from e.g. small US organizations into Germany or France. "Respectable carriers", as you call them, also exploit the customs mess to make money in pretty shady ways, see the end of http://www.minimachines.net/a-la-une/la-livraison-depuis-lasie-delais-prix-transporteurs-57204.
Customs released it, hallelujah!
@gkasprow Have you received the board?
@sbourdeauducq Yes, I received it a few minutes ago. Will investigate them tomorrow.
@sbourdeauducq the AMC has already been running ARTIQ stand-alone, together with the RTM, for 2 hours, and nothing has happened to the supply...
When I received it, the Exar chip started after a few seconds. I burned recent MMC firmware and so far it works.
The configuration of both Exar chips is also fine.
Try the other board I sent you and which Technosystem received today. It also has this 3.3V bug. Is there a new MMC firmware? I thought I had flashed the latest one already. Why did it take a few seconds to start the Exar chip?
Sometimes (but more rarely) the 1.5V supply also fails.
It looks like there was an old version of the firmware, which was waiting for an MCH response.
I will leave it overnight and check tomorrow morning.
@sbourdeauducq I tested the RTM you shipped back with the annotation that the HMC does not lock:
Booting from flash...
Starting firmware.
[ 0.000004s] INFO(satman): ARTIQ satellite manager starting...
[ 0.005668s] INFO(satman): software version 4.0.dev+1219.g4eb26c00
[ 0.011930s] INFO(satman): gateware version 4.0.dev+1214.g729ce58f
[ 0.018172s] INFO(board_artiq::slave_fpga): Loading slave FPGA gateware...
[ 0.025120s] INFO(board_artiq::slave_fpga): magic: 0x5352544d, length: 0x000c15b4
[ 1.038593s] INFO(board_artiq::slave_fpga): ...done
[ 1.042339s] INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[ 1.077067s] INFO(board_artiq::serwb): ...done.
[ 1.080377s] INFO(board_artiq::serwb): RTM to AMC link test...
[ 2.562678s] INFO(board_artiq::serwb): ...passed
[ 2.566161s] INFO(board_artiq::serwb): AMC to RTM link test...
[ 4.048469s] INFO(board_artiq::serwb): ...passed
[ 4.051961s] INFO(board_artiq::serwb): Wishbone test...
[ 5.983985s] INFO(board_artiq::serwb): ...passed
[ 5.987468s] DEBUG(board_artiq::serwb): AMC serwb settings:
[ 5.993025s] DEBUG(board_artiq::serwb): bitslip: 39
[ 5.998062s] DEBUG(board_artiq::serwb): ready: 1
[ 6.002837s] DEBUG(board_artiq::serwb): error: 0
[ 6.007613s] DEBUG(board_artiq::serwb): RTM serwb settings:
[ 6.013178s] DEBUG(board_artiq::serwb): bitslip: 6
[ 6.018128s] DEBUG(board_artiq::serwb): ready: 1
[ 6.022904s] DEBUG(board_artiq::serwb): error: 0
[ 6.027912s] INFO(board_artiq::serwb): RTM gateware version 4.0.dev+1214.g729ce58f
[ 6.295633s] INFO(board_artiq::si5324): waiting for Si5324 lock...
[ 8.726003s] INFO(board_artiq::si5324): ...locked
[ 8.729686s] INFO(board_artiq::hmc830_7043::hmc830): HMC830 found
[ 8.735754s] INFO(board_artiq::hmc830_7043::hmc830): loading HMC830 configuration...
[ 8.743797s] INFO(board_artiq::hmc830_7043::hmc830): ...done
[ 8.749458s] INFO(board_artiq::hmc830_7043::hmc830): setting HMC830 dividers...
[ 8.756919s] INFO(board_artiq::hmc830_7043::hmc830): ...done
[ 8.762743s] INFO(board_artiq::hmc830_7043::hmc830): waiting for HMC830 lock...
[ 8.770161s] INFO(board_artiq::hmc830_7043::hmc830): ...locked
[ 8.776390s] INFO(board_artiq::hmc830_7043::hmc7043): enabling HMC7043
[ 8.793075s] INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 found
[ 8.798051s] INFO(board_artiq::hmc830_7043::hmc7043): loading configuration...
[ 8.806895s] INFO(board_artiq::hmc830_7043::hmc7043): status=10
[ 8.811508s] INFO(board_artiq::hmc830_7043::hmc7043): ...done
[ 8.817502s] INFO(board_artiq::hmc542): card 0 channel 0 set to 4 dB
[ 8.826008s] INFO(board_artiq::hmc542): card 0 channel 1 set to 4 dB
[ 8.833138s] INFO(board_artiq::hmc542): card 1 channel 0 set to 4 dB
[ 8.840267s] INFO(board_artiq::hmc542): card 1 channel 1 set to 4 dB
[ 8.847398s] INFO(board_artiq::hmc542): card 2 channel 0 set to 4 dB
[ 8.854527s] INFO(board_artiq::hmc542): card 2 channel 1 set to 4 dB
[ 8.861658s] INFO(board_artiq::hmc542): card 3 channel 0 set to 4 dB
[ 8.868787s] INFO(board_artiq::hmc542): card 3 channel 1 set to 4 dB
What power level do you use for the 100 MHz clock?
And both AMC boards you shipped to me (the ones covered with a thick layer of dust :) ) don't have the Ethernet clock line modification, so it's no surprise that Ethernet does not work. One of them has a sticker saying it has the 1.5V bug; the other has a 3.3V bug sticker. The one with the 1.5V bug has PRBS issues.
But from power supply point of view, they are working fine. What power supply do you use?
Anyway, I will focus on the GTP2 clock on this particular board.
On GTP CLK2 the DC component is 0.7 V, while on GTP CLK1 it is 0.4 V. The DC level is set by the FPGA, since the inputs are capacitively coupled.
Is this some configuration issue with ARTIQ, or a hardware issue? Can we reproduce that observation with a simple design based on Xilinx IP?
The datasheet, p. 33, says: "The reference clock input structure is illustrated in Figure 2-1. The input is terminated internally with 50Ω on each leg to 4/5 MGTAVCC. The reference clock is instantiated in software with the IBUFDS_GTE2 software primitive. The ports and attributes controlling the reference clock input are tied to the IBUFDS_GTE2 software primitive."
So neither of the values I measured makes sense. I see only one option: modify my design that tests the gigabit transceivers, instantiate the clock inputs with the Wizard and repeat the measurements.
The datasheet you linked to isn't for UltraScale. The UltraScale one is https://www.xilinx.com/support/documentation/user_guides/ug576-ultrascale-gth-transceivers.pdf; see Figure 2-1.
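To make the mismatch concrete, here is a small arithmetic check of the two measured levels against the 4/5 MGTAVCC termination bias quoted from the datasheet. It is a sketch under stated assumptions: the 1.0 V MGTAVCC value is an assumed nominal figure (check the actual device datasheet and measure the rail), and the 50 mV tolerance is arbitrary.

```python
# Compare measured DC levels on the transceiver reference clock inputs with
# the 4/5 * MGTAVCC internal termination bias quoted from the datasheet.

MGTAVCC_V = 1.0  # ASSUMPTION: nominal MGTAVCC; verify against the datasheet


def expected_refclk_bias(mgtavcc: float) -> float:
    """DC level each leg settles to with 50 ohm termination to 4/5 MGTAVCC."""
    return 4 / 5 * mgtavcc


# DC levels reported in this thread
measured = {"GTP CLK1": 0.4, "GTP CLK2": 0.7}


def deviation_report(measured: dict, mgtavcc: float, tolerance: float = 0.05) -> dict:
    """Map each input to (deviation from expected in volts, within tolerance?)."""
    expected = expected_refclk_bias(mgtavcc)
    report = {}
    for name, volts in measured.items():
        delta = volts - expected
        report[name] = (round(delta, 3), abs(delta) <= tolerance)
    return report


if __name__ == "__main__":
    for name, (delta, ok) in deviation_report(measured, MGTAVCC_V).items():
        print(f"{name}: {delta:+.3f} V from expected, {'OK' if ok else 'OUT OF RANGE'}")
```

Under the assumed 1.0 V rail, both inputs sit below the expected 0.8 V, which is consistent with the observation that neither value makes sense.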
Are you sure that all the MGTAVCC pins are connected correctly, that MGTAVCC has the right voltage and that there are no nasty transients on it during startup?
@gkasprow if you disconnect the HMC7043 what DC voltages do you measure on these clock inputs?
Might also be interesting to stick a scope on those inputs and look at the DC voltage as Sayma boots up
It is still 4/5 AVCC. MGTAVCC is filtered VCCINT, which I have observed several times. Plan for today:
@gkasprow @hartytp Please keep this issue on the 3.3V power supply failure topic.
> But from power supply point of view, they are working fine. What power supply do you use?
ATX.
@gkasprow This is off topic here.
One of the Sayma AMC cards (F) has an intermittent 3.3V problem. The 3.3V rail fails anywhere from <1 s to hours after power-up. Exar dump from one incident when it failed shortly after startup: