sinara-hw / sinara

Sayma AMC/RTM issue tracker

Sayma rtm intermittent start-up #610

Closed vmsch closed 5 years ago

vmsch commented 5 years ago

Our Sayma RTM (the one previously used by @jbqubit 20171201a) starts up correctly only intermittently, in about 1 in 4 power cycles. The rest of the time the bottom LED lights up red.

It is powered by a microTCA crate. Originally we had the same problem as in issue 571, i.e. the AMC would only start up if the RTM wasn't plugged in. Following https://github.com/sinara-hw/sinara/issues/571#issuecomment-441328494 I shorted the pins on T14 and T3 on the AMC. After that the AMC powered up correctly with the RTM plugged in as well, but the RTM itself would not power up correctly. After I also shorted T5, the RTM started up for the first time. Both the AMC and the RTM now show up on nat (regardless of whether the red LED is on), which previously wasn't the case:

nat> show_fru

FRU Information:
----------------
 FRU  Device   State  Name
==========================================
  0   MCH       M4    NMCH-CM
  3   mcmc1     M4    NAT-MCH-MCMC
  5   AMC1      M4    Sayma
 40   CU1       M4    Schroff uTCA CU
 41   CU2       M4    Schroff uTCA CU
 51   PM2       M4    NAT-PM-AC1000
 90   AMC1-RTM  M4    Sayma-RTM
==========================================
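For anyone who wants to check this automatically across many power cycles, here is a minimal sketch that parses a captured `show_fru` listing and reports whether the AMC and RTM reached state M4. It assumes the output has exactly the column layout shown above; the names `parse_show_fru` and `not_in_m4` are hypothetical helpers, not part of any NAT tool.

```python
import re

# Sample text copied from the listing above.
SAMPLE = """\
 FRU  Device   State  Name
==========================================
  0   MCH       M4    NMCH-CM
  3   mcmc1     M4    NAT-MCH-MCMC
  5   AMC1      M4    Sayma
 40   CU1       M4    Schroff uTCA CU
 41   CU2       M4    Schroff uTCA CU
 51   PM2       M4    NAT-PM-AC1000
 90   AMC1-RTM  M4    Sayma-RTM
==========================================
"""

# One table row: FRU id, device, state (M0..M7), free-form name.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(M\d)\s+(.+?)\s*$")


def parse_show_fru(text):
    """Return {device: (fru_id, state, name)} for each table row."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and separator lines do not match
            fru, device, state, name = match.groups()
            table[device] = (int(fru), state, name)
    return table


def not_in_m4(table, devices=("AMC1", "AMC1-RTM")):
    """List the given devices that are missing or not in state M4."""
    return [d for d in devices if table.get(d, (None, None, None))[1] != "M4"]


if __name__ == "__main__":
    print(not_in_m4(parse_show_fru(SAMPLE)))
```

Logging the result after each power cycle would show whether the FRU state correlates with the LED state at all.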

Most of the time, though, the RTM starts up into a state where the bottom LED on the back is red. Sometimes it starts up into a state where that LED is off, and LED-wise everything seems fine.

In neither state, however, can we flash the Sayma gateware (see the comment on issue 562).

What else could cause this? If an MMC firmware update might help, where can I obtain the firmware?

@jbqubit, did you use these boards in a microTCA crate, and if so, did you have any problems?

vmsch commented 5 years ago

@gkasprow Is there a summary somewhere of which LEDs should be on, and in which colour and mode (steady, blinking, off), for correct operation of the RTM and AMC? For example, I can't find in the manual what colour LD2/U2 on the RTM should be for proper operation. I assumed red was bad, but when I powered the board externally I could flash the gateware while the LED was red.

If there isn't a summary yet, would you mind writing me one, please, so I can debug our boards correctly?

gkasprow commented 5 years ago

AMC LEDs: The AMC requires LD7, LD8, LD9, LD10, LD22 and LD11 all to be on, which means all supplies are correct. These LEDs are placed close to the DC/DC converters. When the MMC is running, you should see the LEDs on the front panel blinking alternately. When the RTM is plugged in, the AMC detects it and enables P3V3MP and P12V0_RTM, which is indicated by LD17, located close to the RTM connector on the AMC. LD6 indicates Ethernet link up.

RTM LEDs:

When the MMC initializes the RTM, LD6, LD7, LD8, LD9 and LD13 must be on. They are scattered across the RTM board. If LD6, LD7 or LD4 is off, make sure that T3 and T7 are fine. One of them is placed close to the PCB corner and may be damaged during board insertion. These are the supplies that deliver power to the digital part of the RTM. LD3, LD4, LD5, LD12, LD10 and LD11 indicate the so-called analog voltages.

LD15 on the RTM indicates board overheating. LD14 indicates POWER GOOD.
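The checklist above can be written down as a small lookup table for bench debugging. This is only a sketch: the LED designators are transcribed from this comment, while the group names and the `missing_leds` helper are hypothetical. LD14 (POWER GOOD), LD15 (overheating) and the analog-voltage LEDs are left out because the comment does not state them as required-on conditions.

```python
# LEDs that should be lit, grouped by board and condition.
# Designators are taken from the comment above; group names are made up.
EXPECTED_ON = {
    "AMC supplies": {"LD7", "LD8", "LD9", "LD10", "LD22", "LD11"},
    "AMC RTM power enabled": {"LD17"},
    "RTM after MMC init": {"LD6", "LD7", "LD8", "LD9", "LD13"},
}


def missing_leds(group, observed_on):
    """Return the LEDs in `group` that should be on but were not observed lit."""
    missing = EXPECTED_ON[group] - set(observed_on)
    # Sort by the numeric part of the designator, e.g. "LD13" -> 13.
    return sorted(missing, key=lambda name: int(name[2:]))


if __name__ == "__main__":
    # Example: LD7 on the RTM was not lit during inspection.
    print(missing_leds("RTM after MMC init", ["LD6", "LD8", "LD9", "LD13"]))
```

If LD6 or LD7 shows up in the missing list for the RTM, T3 and T7 are the first parts to inspect, as described above.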

vmsch commented 5 years ago

Thank you. The LaTeX source of the RTM manual says this about the back panel LED ( https://github.com/sinara-hw/sinara/blob/cdcb3936a60ff7027d9586707261f99b03656148/ARTIQ_EE/PCB_Sayma_RTM/Manual/tex/panel.tex ):

Callout 18: U2 is off if RTM is ok
red if RTM is in an error state

What kind of error is this? What could cause it, and how can I fix it?

gkasprow commented 5 years ago

I'm not sure what the function of the U2 LED is. According to the MTCA specification, it should indicate an error. @wizath what functionality does the red LED on the RTM panel have? It is connected to the P2 output of the I2C extender. Anyway, if the RTM does not start when plugged into the AMC, it could be related to a missing or broken HW fix applied to the 3V3 MP inrush current limiter. Please check whether a capacitor and a resistor have been added to the T14 MOSFET on the board. We observed in our lab that missing inrush current limiting causes similar issues.

wizath commented 5 years ago

@gkasprow No idea, it's managed by openMMC internals