sinara-hw / sinara

Sayma AMC/RTM issue tracker

Sayma: startup issues #554

hartytp closed 6 years ago

hartytp commented 6 years ago

Building with current ARTIQ master (AMC=4.0.dev+1108.gbb87976d, RTM=4.0.dev+1108.gbb87976d) using Vivado 2018.1. Full build environment: https://hastebin.com/etuyayozug.pas

Edit: OpenOCD tells me that my RTM is at 42.6C and my AMC is at 33.5C (https://hastebin.com/zumuqenibi.pas)

Clean 100MHz reference (+6dB) provided from a synth.

I've seen one of two issues on startup:

Currently, I haven't got beyond that. I'll have a look at the HMC830 output to see if I can find a reason for the PLL not locking...

hartytp commented 6 years ago

My 100MHz reference, measured with a 20dB coupler: [image: 20180604_111858]

150MHz FPGA clock, measured on J61: [images: 20180604_113129, 20180604_112832]

1.26GHz HMC830 output, measured on J58: [image: 20180604_114106]

All in all, I'd say there are no large noise issues on my clocks once the HMC830/7043 are correctly configured...

enjoy-digital commented 6 years ago

@hartytp: good. Are you measuring that when the PLL fails to lock, or after a crash during the HMC7043 configuration?

hartytp commented 6 years ago

both. I didn't see any difference between the two cases.

enjoy-digital commented 6 years ago

While measuring, are you also able to see the broadband noise @gkasprow noticed before the hmc7043 initialization?

hartytp commented 6 years ago

just about to post that...sec

hartytp commented 6 years ago

J61 during boot: [image: 20180604_120302]

J61 after boot: [image: 20180604_120331]

Which explains some of the issues we had...

hartytp commented 6 years ago

Rebuilt without sawg, and seeing the same two issues on startup.

hartytp commented 6 years ago

Actually, without sawg, I've also seen a few instances of it freezing on AMC to RTM link test (no errors reported).

enjoy-digital commented 6 years ago

Indeed... thanks for the measurements. Now, to understand the freeze during/after the hmc7043 configuration: can you add some prints/delays to the hmc7043 configuration (maybe in the write access) so we can see whether it always fails at the same access?
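Something along these lines is what is meant by tracing the write access. This is only a sketch in Python for illustration, not the actual firmware code: `make_traced_write`, `write`, `delay_s` and `log` are hypothetical names, and `write(addr, data)` stands in for whatever routine performs a single hmc7043 register write.

```python
import time


def make_traced_write(write, delay_s=0.0, log=print):
    """Wrap a register-write callable so every access is logged before it runs.

    `write(addr, data)` is a hypothetical stand-in for the real register write.
    The log line is emitted *before* the write, so if the system freezes, the
    last line printed identifies the access that was in progress.
    """
    count = 0

    def traced(addr, data):
        nonlocal count
        log(f"hmc7043 write #{count}: addr=0x{addr:04x} data=0x{data:02x}")
        count += 1
        if delay_s:
            # Optional pause so the message can drain before the write happens.
            time.sleep(delay_s)
        write(addr, data)

    return traced
```

If the last logged access index is the same on every freeze, the problem is tied to one specific register write; if it varies, the cause is more likely timing or signal integrity.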

hartytp commented 6 years ago

note that the error is always after the HMC7043 configuration: it's after "....done" but before the next line printed to the UART... Remind me where you want me to add the debug prints? (remote debugging here would be very useful)

hartytp commented 6 years ago

Comments:

hartytp commented 6 years ago

I'm beginning to wonder if some of the intermittent/board-to-board variations we have are due to some piece of rework failing.

In any case, my board just isn't working at all now: either serwb fails to init or the HMC830 identifies as 0x00.

So, there doesn't seem to be much I can do on any front right now. I could send my board back to @gkasprow for investigation, but it might be better to just focus on the boards that work and push towards v2.0.

jordens commented 6 years ago

The robust solution would seem to be to put a reasonably strong pullup on that hmc7043 reset (and only that one), so that it won't get lowered accidentally during any other state, and to drive it low only when the firmware is certain the input is fine.
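On the firmware side, the gating could look roughly like the sketch below. It is Python for illustration only and makes several assumptions: that the pullup holds the chip in reset while the line is high, that "the input is fine" can be polled as a boolean, and that requiring several consecutive good readings is an acceptable notion of "certain". The names `release_reset_when_stable`, `input_ok` and `drive_reset_low` are hypothetical.

```python
def release_reset_when_stable(input_ok, drive_reset_low,
                              required_consecutive=5, max_polls=100):
    """Release reset only after the input has looked good several polls in a row.

    `input_ok()` is a hypothetical check returning True when the input is fine.
    `drive_reset_low()` is a hypothetical call that pulls the reset line low,
    overriding the pullup. If the input never stabilises, the line is never
    driven and the pullup keeps the chip in reset.
    """
    streak = 0
    for _ in range(max_polls):
        # Any bad reading restarts the count of consecutive good readings.
        streak = streak + 1 if input_ok() else 0
        if streak >= required_consecutive:
            drive_reset_low()
            return True
    return False
```

The point of the design is that the safe state (chip held in reset) needs no firmware action at all, so a crash or an unconfigured FPGA pin cannot release the reset by accident.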

hartytp commented 6 years ago

@jordens yes, I was about to suggest that as well. I think there should also be one on the HMC830 CS line to prevent it from getting stuck in the wrong SPI mode.

hartytp commented 6 years ago

So, I did the following

After more than 10 restarts, I haven't seen a single crash (but many serwb init failures).

hartytp commented 6 years ago

I also haven't seen any SERDES PLL lock failures with the new init sequence.

jbqubit commented 6 years ago

I've not seen serwb init failures in the last week at UMD. Spanning > 100 reboots.

hartytp commented 6 years ago

After fixing the HMC7043 init, I only see serwb init issues.

hartytp commented 6 years ago

After the latest round of fixes and using @enjoy-digital's fixed 1gsps line rate serwb, I've had 22 successful inits in a row and counting (but, not counting JESD init failures).

So, Sayma now boots properly! Whoo!