sinara-hw / meta

Meta-Project for Sinara: Wiki, inter-board design, incubator for new projects

Clock recovery in Sinara/ARTIQ #15

Closed hartytp closed 5 years ago

hartytp commented 6 years ago

Moving from https://github.com/sinara-hw/sinara/issues/515#issuecomment-416786782

Putting out our thought process so far in detail for anyone who's interested:

Current situation in ARTIQ:

So, we are considering implementing something closer to the CDR element of White Rabbit, using the DDMTD technique. Comments about this:

So, our plan is:

gkasprow commented 5 years ago

let's connect the external clock input to the FPGA via a suitable balun like TCM2-43X+ that goes down to 10MHz, but also works well for clocks with fast rising edges

I already used an LVDS receiver: [image]. If it is not enough, I can add a balun.

Why is the XTAL net labelled SI5328?

Oh, I copied it from Xilinx devkit, fixed

What is the XTAL PN? Should be annotated on the schematic and should match Kasli

CS-023-114.285M, it's same as on Kasli.

let's add the usual test points on power, I2C etc

done.

On Kasli we didn't bother connecting SI5324 INT_C1B. Do we need to connect this?

It is and it was already connected to the FPGA

can the FPGA handle the LVPECL signals directly, or do we need to add some attenuation?

It depends on power supply levels. With 1V5 it's hard, but 2.5V can do that, provided that it is AC coupled. LVPECL levels are:

where are you sourcing the SI549 from? And, just double checking, but are you sure that the LVCMOS model is really that much cheaper than the LVDS/CML/LVPECL models? Also, are you able to find this in stock from somewhere?

I looked at mouser.com. 549CBAC000112ABG is 18EUR@50pcs. It has CMOS output. 549ABAA000112ABG is 68EUR@50pcs. It has LVPECL outputs.

NB the temperature stability of the chip is important, since it allows air currents to lead to phase drifts in the DCXO output. Since the WR loop BW isn't that high, we want to make sure the temp co is reasonably low. You've chosen the B-grade (10ppm) SI549, which is a good choice IMHO: the 7ppm C-grade only offers a small improvement.

There is limited choice of these oscillators

AFAICT, the Si570 can be used as a replacement for the Si549 in this design. Is that correct? If so, please add an annotation to that effect. The only thing that needs to change for the SI570 AFAICT is that the OE is pin 2, so let's add 0R resistors that are used to connect OE to either pin 1 or 2.

done

IIRC the Si570 is cheaper than the Si549 and is suitable for the helper DCXO. Did you decide that it's more important to minimise the number of BOM lines than to use cheaper components in this design (I don't mind, I'm just curious).

It's 18EUR chip. Hard to get cheaper DCXO

let's have a pair of UFL connectors on the helper PLL to aid debugging

I added them to the oscillator output

Do you need separate footprints for the two DCXOs? Aren't they footprint compatible (@gkasprow please check and confirm this), so a single footprint can be used for both?

They could be mounted on same footprint but they differ slightly. From assembly and management point of view it's better to use dedicated footprints.

why is there a 0R resistor to bypass the ac coupling cap on the helper DCXO output, but not for the main DCXO? Is this really necessary for either?

For helper we will use single ended LVCMOS input in FPGA while for main DCXO we will use AC coupled input to Si5324

@gkasprow the schematic symbol you have there is only compatible with the LMK61E2, not with the LMK61E07 AFAICT. This should be noted on the schematic. @WeiDaZhang are you happy with this oscillator, or do you prefer the LMK61E07?

I used one that I have in my libs since it will be DNP by default. True, they have different footprints, will fix it.

@gkasprow @sbourdeauducq @WeiDaZhang what are your thoughts about which FPGA pins to use for the WR PLL? Ideally, I think we should keep the DCXO inputs as close to the transceiver clock recovery circuitry as possible to keep the two DDMTD FFs close together.

we are limited with placement of HP banks and CC pins. bank66 is millimetres away from transceivers, at least in terms of footprint dimensions. Can't be closer. [image]

@gkasprow the schematics you posted don't show the FPGA MGTREF clocks. For Sayma and Metlino, someone must check that we drive the required MGTREFs so that all transceivers can be clocked correctly (IIRC there are some quite tight constraints about how the transceivers must be clocked).

The CDR clock enters CLK0 of bank224. There is another input that is routed to 200MHz LVPECL oscillator.

schematic cosmetics need some work.

This is an artefact of PDF conversion. The same schematic: [image] [image]

hartytp commented 5 years ago

If it is not enough, I can add balun.

I'd add the balun to improve the common-mode rejection. We can do something like Kasli where we add 0R pads to allow the balun to be DNFd if desired.

I will add series resistors at the LVPECL outputs.

On Sayma/Metlino we could also just replace the LVPECL buffer with a decent LVDS buffer. IIRC the main reason for using LVPECL was Kasli where we need something powerful enough to drive coax. However, if we do decide to go to a LVDS buffer then it must have properly specified phase noise and propagation delay temperature coefficient. IIRC there aren't many options for that, so it's probably easiest to just stick with the LVPECL buffer.

549CBAC000112ABG is 18EUR@50pcs. It has CMOS output

@WeiDaZhang can you order an eval board for the CMOS Si549 and measure its phase noise? We should check that this has comparable phase noise to the LVDS/LVPECL parts.

@gkasprow FWIW, there are LVDS/LVPECL parts at 16EUR/pc on Mouser as well, so long as we don't mind going for the A-grade stability (+-20ppm). @WeiDaZhang What part number did you use in your tests so far?

They could be mounted on same footprint but they differ slightly. From assembly and management point of view it's better to use dedicated footprints.

Okay. Will we have space on Kasli for an additional 4 DCXO footprints? Might be better to just agree to use the Si549 for this design revision and remove the LMK for now.

we are limited with placement of HP banks and CC pins. bank66 is millimetres away from transceivers, at least in terms of footprint dimensions. Can't be closer. The CDR clock enters CLK0 of bank224. There is another input that is routed to 200MHz LVPECL oscillator.

ACK. I'd like to hear from @sbourdeauducq and @WeiDaZhang about their thoughts on the FPGA connections, but that sounds fine to me.

WeiDaZhang commented 5 years ago

The Si549 I've been using is 549BACB001937ABG. BTW, the LVCMOS output ones seem to be available only in orders of 50 pcs, so we weren't able to try them.

WeiDaZhang commented 5 years ago

FYI, I ran a test over the weekend to see whether loading the FPGA up to 75% affects the stability of the WR. The result shows the stability of loaded FPGA (-o-) is not worse (within a few dBs) than the stability of the unloaded FPGA test.

[image: measured modified Allan deviation between 2 DDMTDs w/ MGTREFCLK input or normal IOs or FPGA loaded, and on reference channel instrument]
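
For anyone reproducing this kind of plot, here is a minimal sketch of how a modified Allan deviation can be computed from evenly spaced phase (time-error) samples. The function name `modified_adev` and its arguments are illustrative and are not taken from the measurement setup above, which used its own instrument and tooling.

```python
import math


def modified_adev(x, tau0, n):
    """Modified Allan deviation at averaging time tau = n * tau0.

    x    : phase (time-error) samples in seconds, spaced tau0 apart
    tau0 : sample interval in seconds
    n    : averaging factor
    """
    num_samples = len(x)
    terms = num_samples - 3 * n + 1
    if n < 1 or terms < 1:
        raise ValueError("need at least 3*n phase samples")
    total = 0.0
    for j in range(terms):
        # Sum of second differences over a sliding window of n samples
        inner = sum(x[i + 2 * n] - 2 * x[i + n] + x[i] for i in range(j, j + n))
        total += inner * inner
    mvar = total / (2.0 * n ** 4 * tau0 ** 2 * terms)
    return math.sqrt(mvar)
```

As a sanity check, a constant frequency offset (linear phase) gives zero, and a pure linear frequency drift D gives D * tau / sqrt(2).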

hartytp commented 5 years ago

The result shows the stability of loaded FPGA (-o-) is not worse (within a few dBs) than the stability of the unloaded FPGA test.

Good to know!

The Si549 I've been using is 549BACB001937ABG.

Okay, good. This is the A grade with the worst temperature stability. The good results we have suggest that the temp co of the DCXO isn't crucial. If we are happy using the A grade then there are LVDS and LVPECL options available at the same cost as the LVCMOS. See here: https://www.mouser.co.uk/Silicon-Laboratories/Passive-Components/Frequency-Control-Timing-Devices/Oscillators/Programmable-Oscillators/Si549-Series/_/N-7jdx1?P=1z0zp1dZ1yzuw4vZ1yzsjssZ1yzv8svZ1yzv8suZ1y98pm2&Ns=Pricing|0

e.g. 549AAAA000112ABG or 549BACB000118BBG

NB to give ourselves more options, let's definitely connect OE to both pins 1 and pins 2, using 0R resistors as solder jumpers (DNF the appropriate resistor for the device model we're using).

hartytp commented 5 years ago

The good results we have suggest that the temp co of the DCXO isn't crucial.

One comment about this: unlike the CERN WR implementation, Weida is using a 3rd order loop filter, which is better at rejecting phase/temperature fluctuations in the DCXO. We haven't investigated long-term stability v loop filter order, but it's possible that the DCXO temp co would be more of an issue if combined with a lower order loop filter...

gkasprow commented 5 years ago

@hartytp does the helper input have to be clock capable? In WR design it doesn't. In the Metlino FPGA I don't have 2 free CC inputs... The same applies to the ref clock input. I can connect it to regular IO, which can be forwarded to PD and outputs for Si5324. Alternatively, if they must be CC, I can switch one HR bank supply to 2.5V, add one translator and use up to 4 CC inputs.

hartytp commented 5 years ago

@hartytp does the helper input have to be clock capable? In WR design it doesn't.

When you say "helper" do you mean the DCXO that supplies the clock for the DDMTD? i.e. not the main DCXO and not the ref (SMA) clock input?


@WeiDaZhang ^

We do need to get the helper clock onto the local clock tree, which would normally be done using a CC (not GC) pin, wouldn't it? I assumed that a CC pin was used for the helper clock in WR as well.

The same applies to ref clock input. I can connect it to regular IO, which can be forwarded to PD and outputs for Si5324.

The ref clock does not need to be a CC pin so long as we can forward it to the Si5324 without adding too much noise

gkasprow commented 5 years ago

OK, in other WR designs I did, it was routed to a CC pin. I'm pretty sure that somebody from the WR community told me that it is not a strict requirement. I managed to shift lines and get two GC pins close to each other.

gkasprow commented 5 years ago

OK, in other WR designs I did, it was routed to a CC pin. I'm pretty sure that somebody from the WR community told me that it is not a strict requirement. I managed to shift lines and get two GC pins close to each other, and a GC pin for the ref clock as well.

hartytp commented 5 years ago

@gkasprow GC is definitely not required.

Not sure if CC pin is required. The DDMTD noise doesn't depend strongly on the helper noise, but if it gets too bad this can be a problem. I'd like to keep the helper pin CC if possible. CC/GC is not needed for ref/main DCXO.

jordens commented 5 years ago

DDMTD is symmetric. You can sample the helper with both the recovered and the main clock or (classic) sample both recovered and main with the helper. The first should work, may need fewer GC inputs in our case but might be one more internal CDC.
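
To make the classic arrangement concrete, here is an idealised, jitter-free model in plain Python. It assumes a helper period of (n + 1)/n times the clock period, with `n = 1024` chosen purely for illustration. The function names are hypothetical and nothing here is taken from the ARTIQ or WR gateware.

```python
def ddmtd_beat(n, delay, cycles):
    """Sample a square wave with a slightly slower helper clock.

    Time is counted in integer units: the sampled clock has period n and is
    delayed by `delay` units; the helper clock has period n + 1.
    Returns one sampled bit per helper cycle (the low-frequency beat signal).
    """
    half = n // 2
    return [1 if ((k * (n + 1) - delay) % n) < half else 0 for k in range(cycles)]


def first_rising_edge(bits):
    """Index of the first 0 -> 1 transition in the beat signal."""
    for k in range(1, len(bits)):
        if bits[k] and not bits[k - 1]:
            return k
    raise ValueError("no rising edge found")


def ddmtd_phase(n, delay_main, delay_rec):
    """Offset between the two beat signals, in helper cycles (modulo n)."""
    cycles = 3 * n  # long enough to contain at least one full beat period
    tag_main = first_rising_edge(ddmtd_beat(n, delay_main, cycles))
    tag_rec = first_rising_edge(ddmtd_beat(n, delay_rec, cycles))
    return (tag_rec - tag_main) % n


print(ddmtd_phase(1024, 0, 5))  # 5
```

Each helper cycle of offset between the two beat signals corresponds to T/n of real time, e.g. about 7.8 ps for an 8 ns clock period with n = 1024.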

hartytp commented 5 years ago

@gkasprow see this comment. Please can we use the LVDS part instead of the LVCMOS. It's the same cost, but should have reduced sensitivity to supply noise etc.

gkasprow commented 5 years ago

The ones with LVDS outputs and same cost are not available. Available LVDS ones are 42$ per piece.

hartytp commented 5 years ago

Aah, you're right, I'd misread that. Okay, let's stick with LVCMOS for now. If the performance is bad then we can change the population option in a later revision.

If we produce a large enough batch of boards with Si549 WR then we should consider ordering a batch of LVDS DCXOs from SI.

WeiDaZhang commented 5 years ago

@gkasprow GC is definitely not required.

Not sure if CC pin is required. The DDMTD noise doesn't depend strongly on the helper noise, but if it gets too bad this can be a problem. I'd like to keep the helper pin CC if possible. CC/GC is not needed for ref/main DCXO.

The scenarios we have tested always route the HELPER to a CC pair. As @gkasprow said, it is not strictly required, but a bit of WR logic runs on it, so a BUFx is needed and CC is therefore, say, "recommended". The other two, REF and MAIN, don't require CC from the WR point of view. However, if some logic runs on the WR recovered clock (CDR_CLK_CLEANx) in the FPGA, which I assume is normal, the clock should enter the FPGA in one of the CC/GC/MGTREFCLK ways.

gkasprow commented 5 years ago

Thanks. So in theory it should be sufficient to route the clock to MGTREFCLK only? No clock copy required to GC ?

WeiDaZhang commented 5 years ago

[image]

Though it depends on the FPGA type and on whether the application uses the CPLL or QPLL, I think MGTREFCLK can always be routed to the fabric and drive a BUFG. I've tried this on the KC705 before with at least one of the CPLL/QPLL. So in THEORY, it should. I can't think of a use case in which CDR_CLK_CLEANx has to enter the FPGA in more than:

WeiDaZhang commented 5 years ago

[image] This one is probably clearer.

gkasprow commented 5 years ago

I asked just out of curiosity. I connected both MGTREFCLK and GC.

hartytp commented 5 years ago

Right @WeiDaZhang.

hartytp commented 5 years ago

@jordens @sbourdeauducq @gkasprow here is a BD of the WR test setup we used on the KC705.

[image: WR KC705 block diagram]

@WeiDaZhang correct me if I'm wrong, but the configuration was:

Sources of idiosyncrasy in this setup:

  1. @WeiDaZhang used a shared I2C bus for both the helper and main DCXOs. This made the code a little messy and prevented us from updating the DCXOs as fast as we would have liked. In the Sinara implementation, we will have independent buses for each DCXO, simplifying things and allowing us to update the DCXOs more frequently.
  2. As @WeiDaZhang didn't have a 125MHz Wenzel, he used the transceiver PLLs to generate 125MHz from the 100MHz reference. This step won't be used in the misoc design.

hartytp commented 5 years ago

Not shown in the above BD are a few baluns and fanout buffers.

Message @WeiDaZhang for a copy of his Vivado design.

hartytp commented 5 years ago

A few comments about the TI oscillators (@WeiDaZhang please correct any mistakes):

WeiDaZhang commented 5 years ago

hartytp commented 5 years ago

Thanks Weida

hartytp commented 5 years ago

@WeiDaZhang that plot is really nice. Could you stick the Si549 noise on as well for clarity, then it really gives one all the info about the different chips.

One other comment about this: in these calculations we assume the DCXO is updated once per loop iteration, so around 4kHz. The DCXOs support update rates of twice this, so it's possible to update twice per loop iteration to get an effective 1-bit increase in tuning resolution and a corresponding 6dB reduction in the 1/f^2 noise. This is something we'll probably do in the final version, but we haven't implemented it in our tests.
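
The 6dB figure follows from halving the effective tuning step: 20*log10(2) is about 6.02 dB. Below is a small sketch of that arithmetic, plus one possible way of splitting a fractional tuning word across two updates per iteration. The splitting scheme and the function names are assumptions for illustration, not something implemented in the tests.

```python
import math


def quantisation_noise_change_db(extra_bits):
    """Change in quantisation noise (dB) when the step shrinks by 2**extra_bits."""
    return -20.0 * math.log10(2.0 ** extra_bits)


def dithered_codes(target_lsb, updates_per_iteration=2):
    """Split a fractional tuning word into integer codes whose mean approximates it.

    Hypothetical scheme: write `base + 1` for part of the iteration and `base`
    for the rest, so the time-averaged code has a finer effective step.
    """
    base = math.floor(target_lsb)
    high_updates = round((target_lsb - base) * updates_per_iteration)
    return [base + 1 if k < high_updates else base for k in range(updates_per_iteration)]


print(round(quantisation_noise_change_db(1), 2))  # -6.02
print(dithered_codes(10.5))                       # [11, 10]
```

With two updates per iteration the time-averaged code can land on half-LSB values, which is the effective extra bit.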

hartytp commented 5 years ago

@WeiDaZhang: @sbourdeauducq is now using DDMTD in ARTIQ for Sayma SYSREF phase calibration. He's observed that the DDMTD jitter is significantly worse when the FPGA is heavily loaded.

IIRC you also spent quite a while testing DDMTD noise/stability on the KC705 under varying FPGA loads, but didn't see a significant effect. Can you remind me what measurements you made on this?

Also relevant here is @sbourdeauducq's comment

The DDMTD stability [I assume this means jitter, not long-term stability] is better when the core is not clocked from the GTH pins but from the GC clock input. Unfortunately, the GTH and GC clocks are driven by different outputs of the Si5324, and they are not synchronized, so using the GC clock makes the DDMTD clock work better but then it is not synchronized to RTIO anymore :(

@sbourdeauducq to check I understand what you mean by this:

@WeiDaZhang how does that compare to your observations on the KC705/design for WR in Sinara?

hartytp commented 5 years ago

Also, @sbourdeauducq am I right in thinking that the deglitcher you've implemented is more of a debouncer?

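```python
from migen import *


class DDMTDEdgeDetector(Module):
    def __init__(self, i):
        # One-cycle pulse (helper domain) on the first rising edge of `i`
        self.rising = Signal()

        # Last four samples of `i`, newest sample shifted in at the MSB end
        history = Signal(4)
        deglitched = Signal()
        self.sync.helper += history.eq(Cat(history[1:], i))
        # High if `i` is high now or was high on any of the previous four
        # helper cycles, so it only falls after five consecutive low samples
        self.comb += deglitched.eq(i | history[0] | history[1] | history[2] | history[3])

        deglitched_r = Signal()
        self.sync.helper += [
            deglitched_r.eq(deglitched),
            # Rising edge of the deglitched signal
            self.rising.eq(deglitched & ~deglitched_r)
        ]
```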

IIRC the CERN papers discuss a few more sophisticated ways of doing this that reduce jitter further (@WeiDaZhang is using one of the ways discussed in that paper).

hartytp commented 5 years ago

NB not trying to suggest you should do anything different here, just trying to keep track of the different work being done on related projects and pool info in one place.

sbourdeauducq commented 5 years ago

@sbourdeauducq to check I understand what you mean by this:

That's correct, except that it's not relevant if the HMC7043 output goes to a GC pin on the FPGA (it's connected to a DFF input).

The "jitter" is the peak-peak jitter measured by the first test of the firmware that operates on the raw (before averaging) results from the DDMTD core.

sbourdeauducq commented 5 years ago

Also, @sbourdeauducq am I right in thinking that the deglitcher you've implemented is more of a debouncer

That's the first deglitcher from the CERN paper, "First Edge selects the first positive edge as a good edge for the time differences counter".
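
A plain-Python behavioural model of that first-edge deglitcher, written as a sketch to show what it does to a glitchy beat signal. The function name and the test vector are illustrative. The model mirrors the Migen code quoted earlier in the thread but has not been checked against the gateware.

```python
def first_edge_detector(samples):
    """Model of the edge detector, one input sample per helper clock cycle.

    Returns the value the registered `rising` output takes after each
    helper clock edge.
    """
    history = [0, 0, 0, 0]  # previous four samples, oldest first
    deglitched_r = 0
    rising_out = []
    for i in samples:
        # Combinational: high if the input is high now or was high recently
        deglitched = i | history[0] | history[1] | history[2] | history[3]
        # Registered: pulse on the 0 -> 1 transition of the deglitched signal
        rising_out.append(deglitched & (1 - deglitched_r))
        deglitched_r = deglitched
        history = history[1:] + [i]
    return rising_out


# Glitchy rising transition, stable high, then a glitchy falling transition
beat = [0] * 6 + [1, 0, 1, 1, 1, 1] + [0, 1, 0, 0, 0, 0, 0, 0]
pulses = first_edge_detector(beat)
print(pulses.index(1), sum(pulses))  # 6 1
```

Only the first positive edge of the transition produces a pulse; the later glitches are masked because `deglitched` stays high until five consecutive low samples have been seen.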

WeiDaZhang commented 5 years ago

@hartytp @sbourdeauducq Attached are the stability test diagram and result.

[image: stability test block diagram 20190130]

[image: measured modified Allan deviation between 2 DDMTDs w/ MGTREFCLK input or normal IOs or FPGA loaded, and on reference channel instrument]

WeiDaZhang commented 5 years ago

We tried:

I'm afraid I couldn't find what we concluded at the time by searching through the email chain. @hartytp But looking at the curves, and as you said, the difference seems <= 2 x jitter, which we see in other cases.

jordens commented 5 years ago

Was that loaded by static logic or was the logic switching and doing something?

WeiDaZhang commented 5 years ago

LFSRs were feeding DSP48E slices or distributed slices to do multiplications, and the outputs are XORed to feed an LED. https://github.com/WeiDaZhang/FPGA_LOAD
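
A rough Python model of that kind of load generator, for illustration only. The 16-bit width, the tap positions and the function names are assumptions and are not taken from the FPGA_LOAD repository. The point is that the logic toggles continuously and cannot be optimised away, because it drives an output.

```python
def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def load_step(a, b):
    """Advance two LFSRs, multiply their states, XOR-reduce the product to one bit."""
    a, b = lfsr16_step(a), lfsr16_step(b)
    product = a * b                       # stands in for a DSP48E multiplication
    led = bin(product).count("1") & 1     # parity of the product bits
    return a, b, led


a, b = 0xACE1, 0x1234
led_trace = []
for _ in range(16):
    a, b, led = load_step(a, b)
    led_trace.append(led)
```

Because the LFSR states are pseudo-random, the multiplier inputs change on every cycle, which keeps the switching activity high.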