sinara-hw / sinara

Sayma AMC/RTM issue tracker

Sayma: add clocking path from AMC Si5324 output to RTM clock input #576

Closed sbourdeauducq closed 6 years ago

sbourdeauducq commented 6 years ago

Right now, using DRTIO with the DACs requires a coaxial cable between the AMC and RTM. This is a bit unwieldy.

hartytp commented 6 years ago

That sounds like a good idea. But, we should think carefully about where on the AMC <-> RTM connector this signal goes to keep it separated from any noise sources, such as unscrambled LVDS lines.

Given that the reference clock will generally have 1/f**3 close-in phase noise, the narrower our lock bandwidth, the less of an issue this is (although, not sure if LVDS clocks operating at exactly the rtio frequency will still cause nasty DC phase offsets). So, it will pay to have a good clock recovery circuit on the RTM (narrower LF than the 100kHz on the 830).

@gkasprow do you know of any cross-talk measurements for these connectors? Do you have a test setup where you could measure it?

gkasprow commented 6 years ago

@hartytp I have S-models for them. I tried crosstalk simulation with ANSYS tools but for some reason I'm not able to use some S-models. They work fine with Hyperlynx, but it only does time-domain simulations. I found some measurements for these connectors (Tyco HM-Zd). Crosstalk plot: [image] Here is more complete data

hartytp commented 6 years ago

Thanks Greg!

Here is more complete data

I didn't see any S-parameters there, only time-domain plots. But the graph you posted above is good.

So, assuming our signals have rising edges to 1GHz or beyond, the cross-talk for diagonal neighbours is something like -50dB, which is quite bad. It's probably quite a bit better as the pairs get further separated, but without more data, we should assume that the cross-talk in this connector is significant so this won't be a particularly high-quality clock.
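As a rough sanity check on what -50dB means for a clock, here is a minimal sketch. It assumes a single coherent aggressor that adds in quadrature with the clock (worst case), a small interferer, and a 150MHz clock on this line; the function name and the 150MHz figure are assumptions for illustration, not measurements.

```python
import math

def crosstalk_time_error(crosstalk_db, clock_hz):
    """Worst-case peak timing error from one interferer at crosstalk_db relative to the clock.

    Assumes the interferer adds in quadrature with the clock, so the peak
    phase deviation is atan(a) radians for an amplitude ratio a.
    """
    amplitude_ratio = 10 ** (crosstalk_db / 20)      # dB -> voltage ratio
    peak_phase_rad = math.atan(amplitude_ratio)      # ~a for small a
    peak_time_s = peak_phase_rad / (2 * math.pi * clock_hz)
    return amplitude_ratio, peak_phase_rad, peak_time_s

ratio, phase, dt = crosstalk_time_error(-50, 150e6)
print(f"ratio {ratio:.2e}, peak phase {math.degrees(phase):.2f} deg, {dt * 1e12:.1f} ps")
```

Under these assumptions the disturbance is a few ps peak, which is why the path looks marginal for a low-noise clock even before adding more aggressors.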

@sbourdeauducq why use this instead of just recovering the clock on the RTM once we have DRTIO up and running? By the time we've been through the AMC Si5324, we have already given up on phase stability to better than 100ps, so this can't be used for most experiments anyway (unless we move to a DDMTD-based solution using a VCXO/DCXO).

Given that RTM<->AMC connector lines are quite a scarce resource, I'm not sure that this is a good use of them.

sbourdeauducq commented 6 years ago

@sbourdeauducq why use this instead of just recovering the clock on the RTM once we have DRTIO up and running?

Because Sayma is enough of a house of cards and a PITA to work on already.

Given that RTM<->AMC connector lines are quite a scarce resource, I'm not sure that this is a good use of them.

Consider killing one of the GT_CLKx instead? Development time is also a scarce resource.

Anyway, if this path isn't useful, then we should not have it, and we should remove the Si5324 to HMC830 connection/mux on the RTM side as well. If we have an external AMC/RTM connection with a coax it is also possible to insert an external PLL there to reduce jitter.

hartytp commented 6 years ago

Consider killing one of the GT_CLKx instead?

If that's not needed then let's definitely kill it.

Development time is also a scarce resource.

Agreed. But, I don't think that having to run a short piece of coax between the AMC and RTM costs us any development time.

In addition to the extra AMC<->RTM BP line this new clock path would require, we'd also have to add an extra mux on the RTM to route this to the clock network. One of my ambitions for Sayma v2.0 is to remove as many clock muxes from the RTM as possible.

IMHO, this new RTM clock source wouldn't be sufficiently useful to justify the added complexity, due to connector cross-talk and Si5324 phase drifts, so I'd argue that it's not worth it.

and we should remove the Si5324 to HMC830 connection/mux on the RTM side as well.

Yes, I think that we should remove it, as we now know that it's not actually that useful. Ideally, in some future version of Sayma we'll have a proper DDMTD-based PLL to replace the Si5324 and provide a high-quality CDR clock. But, until that lands, I don't see much value in routing the Si5324 to the RTM clock network.

sbourdeauducq commented 6 years ago

IMHO, this new RTM clock source wouldn't be sufficiently useful to justify the added complexity, due to connector cross-talk and Si5324 phase drifts, so I'd argue that it's not worth it.

Fair enough. But once we have a better PLL than the Si5324, how will the AMC clock the RTM?

hartytp commented 6 years ago

Fair enough. But once we have a better PLL than the Si5324, how will the AMC clock the RTM?

I was assuming that we'd provide two options:

  1. The RTM is clocked from an external low-noise oscillator via the front panel SMA
  2. The RTM is clocked from the recovered DRTIO clock

(2) is a longer-term approach, which I don't expect to be an option any time soon. Correct me if I'm wrong here, but we do need to get DRTIO working robustly on the RTM eventually, right? If we do that, AFAICT, using that clock to clock the HMC830 shouldn't be any extra work, right? Or, is there something I'm missing?

sbourdeauducq commented 6 years ago

Correct me if I'm wrong here, but we do need to get DRTIO working robustly on the RTM eventually, right?

Is there anything that ever works robustly on all Sayma boards? It is much better to isolate faults when possible, instead of creating long and fragile dependency chains. That way, a board with some partial failure can still be used for something.

It is extra work; we have to think about the dependencies between clocks/transceivers and how to make multipliers-followed-by-dividers have fixed latency. By the way - what is the plan to compensate for the non-deterministic skew of the 830+7043? I suspect that a good amount of the skew variation seen from siphaser is due to the FPGA. Unless we improve the phase comparison gateware, there will be the same problem when syncing the 830+7043, even if the 830 is clocked by a lower-noise, deterministic-skew PLL.

sbourdeauducq commented 6 years ago

Still on the topic of that PLL and somewhat off-topic here: We may want to clock the high-quality PLL without leaving the FPGA transceiver, i.e. using OBUFDS_GTE3.

hartytp commented 6 years ago

Is there anything that ever works robustly on all Sayma boards?

Sure, but eventually it will all have to work robustly or it's no good. I'd put off worrying about fancy CDR PLLs until everything's been stable for a while.

In the meantime, running a coax cable to the RTM is fine IMHO. If you want an AMC<->RTM connector clock line as well, I'm fine with that; I'm just not sure how much use it will be due to cross-talk in that connector.

By the way - what is the plan to compensate for the non-deterministic skew of the 830+7043?

The HMC830 input-output phase (I assume that's what you mean by "skew") is fixed so long as the output clock is an integer multiple of the input clock, so I don't think there is any issue there. The 7043 latency needs to be measured by the FPGA and corrected. But, that can be done exactly.

I suspect that a good amount of the skew variation seen from siphaser is due to the FPGA.

Maybe, but I haven't seen data to suggest that.

@cjbe's drift measurements were mainly taken without the FPGA IIRC (just looking at the Si5324 locked to an external reference). The data Jeff took shows that WR can achieve 100fs stability in an environment with decent thermal control. It won't be that good on Sayma due to the dynamic heat loads, but I don't expect it to be anything like the 100ps we see on Kasli si-phaser.

hartytp commented 6 years ago

Unless we improve the phase comparison gateware, there will be the same problem when syncing the 830+7043, even if the 830 is clocked by a lower-noise, deterministic-skew PLL.

Can you expand on this? The non-determinism for the HMC7043 is an integer number of cycles of the HF clock, this is something we can compensate for exactly, so it seems different to the Si-phaser issues. What am I missing?

hartytp commented 6 years ago

Still on the topic of that PLL and somewhat off-topic here: We may want to clock the high-quality PLL without leaving the FPGA transceiver, i.e. using OBUFDS_GTE3.

Yes, I think that's what Weida is doing. Let's discuss this in the other issue once he posts his results.

sbourdeauducq commented 6 years ago

Taking advantage of the quantization of the skew variation is one possible gateware(/firmware) improvement (which can also be applied to the Si5324 AFAICT). There is no special issue with it, it's just a bit of extra complexity (and there are details such as resolving ambiguities e.g. by making sure that setup/hold constraints are respected) and someone has to do the work.

gkasprow commented 6 years ago

We can improve clock quality by adding CMCs on both sides of the line. It is a differential line, so real crosstalk should be far better than specified.

hartytp commented 6 years ago

There is no special issue with it, it's just a bit of extra complexity (and there are details such as resolving ambiguities e.g. by making sure that setup/hold constraints are respected) and someone has to do the work.

Doesn't the current code do that already for the HMC7043? Since we don't change the analog delay between power cycles, only the digital delay, the input-output phase relationship should be exactly the same each boot. Or, am I missing something?

Does the Si5324 provide an equivalent "digital" delay that could be used for this purpose?

Edit: in any case, the phase stability of the Si5324 is so poor that there isn't any point worrying about quantizing the delay -- the 100ps drifts are already large compared with the 25ps HMC7043 analog phase resolution.

hartytp commented 6 years ago

We can improve clock quality by adding CMCs on both sides of the line. It is a differential line, so real crosstalk should be far better than specified.

True, but it still seems unlikely to ever be as good as we need.

In any case, as I said, if @sbourdeauducq still feels this would be really useful -- and worth the extra mux/AMC<->RTM line -- then I don't object (after all, maybe some users really don't care much about noise). But, it does go against the aim of trying to minimize the number of clocking options/complexity of the clocking tree for the next revision.

sbourdeauducq commented 6 years ago

Since we don't change the analog delay between power cycles, only the digital delay, the input-output phase relationship should be exactly the same each boot. Or, am I missing something?

You are taking a 150MHz recovered DRTIO clock A, putting it through a user-supplied coaxial cable, multiplying to 1.2GHz with the HMC830 and then dividing it back to 150MHz to produce clock B. A and B have a random phase relationship across coax cable changes and board reboots. Right now those two problems are solved by putting the 830+7043 into the siphaser loop.
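The divider part of that ambiguity can be sketched as follows. This is a minimal illustration that assumes a plain divide-by-N stage which may start counting on any edge of the multiplied clock; the function name is hypothetical, and the continuous delay contributed by the coax is not modelled.

```python
from fractions import Fraction

def divider_phase_candidates(f_in_hz, f_vco_hz):
    """Possible output phase offsets (in ps) of a divide-by-N stage.

    With f_vco = N * f_in, a divider that starts on an arbitrary VCO edge
    can land on any of N phases, spaced by one VCO period.
    """
    ratio = Fraction(f_vco_hz, f_in_hz)
    if ratio.denominator != 1:
        raise ValueError("VCO frequency is not an integer multiple of the input")
    n = ratio.numerator
    vco_period_ps = Fraction(10**12, f_vco_hz)
    return [float(k * vco_period_ps) for k in range(n)]

# 150MHz in, 1.2GHz after the HMC830, divided back down to 150MHz
candidates = divider_phase_candidates(150_000_000, 1_200_000_000)
print(len(candidates), "possible phases, spaced by", round(candidates[1], 1), "ps")
```

For the 150MHz/1.2GHz case this gives eight candidate phases, one 1.2GHz period apart, on top of whatever fixed delay the cable adds.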

sbourdeauducq commented 6 years ago

the 100ps drifts are already large compared with the 25ps HMC7043 analog phase resolution.

Where are those drifts coming from? IIRC this was across board reboots, and they might as well come from siphaser or the FPGA, not the Si5324 itself.

hartytp commented 6 years ago

Where are those drifts coming from? IIRC this was across board reboots, and they might as well come from siphaser or the FPGA, not the Si5324 itself.

No, this was over time/temperature using the Si5324 as a buffer. Most likely, it's thermal issues with the XO causing phase drifts. A higher-order loop filter in the Si5324 or a better XO might have fixed it. @cjbe should correct me if I'm wrong, but I don't think that test in any way involved the FPGA.

hartytp commented 6 years ago

You are taking a 150MHz recovered DRTIO clock A, putting it through a user-supplied coaxial cable, multiplying to 1.2GHz with the HMC830 and then dividing it back to 150MHz to produce clock B. A and B have a random phase relationship across coax cable changes and board reboots. Right now those two problems are solved by putting the 830+7043 into the siphaser loop.

OK, I hadn't looked at that part of the code. Obviously, if the coax changes then the analog delay needs to change. I had assumed that we would do the analog delay scan once to measure the coax length. Then fix the analog delay (giving a warning if S/H is not met). After that, it's only the HMC7043 dividers that need synchronizing which is a quantized digital delay, so should be exactly reproducible.
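A minimal sketch of the "scan once, then fix" step, under stated assumptions: the scan result is given as one boolean per analog delay tap (True meaning the sampled value was stable at that tap), the tap step is the 25ps analog resolution mentioned earlier, and the stable window does not wrap around the end of the scan. `pick_analog_delay` is a hypothetical helper, not part of the existing code.

```python
def pick_analog_delay(stable, step_ps=25):
    """Pick the tap at the centre of the longest run of stable samples.

    Returns (tap_index, margin_ps), where margin_ps is the distance from the
    chosen tap to the nearest end of the stable run, or None if no tap is
    stable (i.e. setup/hold cannot be met and a warning is due).
    """
    best_start, best_len = 0, 0
    run_start = None
    # The trailing False is a sentinel that closes a run ending at the last tap.
    for i, ok in enumerate(list(stable) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    if best_len == 0:
        return None
    tap = best_start + (best_len - 1) // 2
    margin_taps = min(tap - best_start, best_start + best_len - 1 - tap)
    return tap, margin_taps * step_ps

scan = [False, False, True, True, True, True, True, False]
print(pick_analog_delay(scan))
```

The chosen tap would then be stored and reused on later boots, leaving only the quantized divider phase to resolve.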

hartytp commented 6 years ago

@sbourdeauducq Given the way you're doing the synchronisation, what level of phase shifts from boot to boot do you expect? Is it likely to be good enough?

e.g. for 400MHz RF, 25ps corresponds to 3.6deg, which would be a big boot-boot phase variation (I would personally not call that synchronized).
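The conversion behind that figure is phase = 360 * f * dt. A short sketch (the helper name is just for illustration), also applied to the 100ps drift figure quoted earlier in the thread:

```python
def time_to_phase_deg(delay_s, rf_hz):
    """Phase shift in degrees that a time offset produces at a given RF frequency."""
    return 360.0 * rf_hz * delay_s

step_deg = time_to_phase_deg(25e-12, 400e6)    # one 25ps analog delay step
drift_deg = time_to_phase_deg(100e-12, 400e6)  # 100ps drift
print(f"25ps -> {step_deg:.1f} deg, 100ps -> {drift_deg:.1f} deg at 400MHz")
```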

hartytp commented 6 years ago

I have to say, the more I think about this, the more I think we should just switch to using a 7044.

Then we could synchronise everything by supplying a single synchronisation pulse that's sampled from a 100MHz clock. No messing about with cycle slips, analog delays, or any of that crap. I get the impression that one of the reasons we're struggling so much atm is that we're not using these chips the way they were intended to be used (you won't see what we're doing in any app note from HMC/ADI). But, supplying a sync pulse to the HMC7044 is a very well supported route and should work.

Would anyone be willing to consider just giving up on synchronisation for Sayma v1.0 and moving to a 7044 in v2.0?

sbourdeauducq commented 6 years ago

"Best effort considering the hardware". Should be comparable to Kasli; I don't see how it can be improved other than by connecting an external clock source to the HMC830.

hartytp commented 6 years ago

"Best effort considering the hardware". Should be comparable to Kasli; I don't see how it can be improved other than by connecting an external clock source to the HMC830.

Sure, but it is important to flag to users that the process you're developing can never synchronise the Sayma RF to the degree-level. Therefore, it's not actually any more use for most of our experiments than completely unsynchronised RF.

sbourdeauducq commented 6 years ago

switch to using a 7044.

Please consider that this option requires hiring someone with extreme patience and a salary unheard of in academia, as HMC chips are brimming with bugs and obnoxious behavior.

hartytp commented 6 years ago

To be clear about the HMC7044 point:

sbourdeauducq commented 6 years ago

Yeah, HMC chips look good on paper.

hartytp commented 6 years ago

Please consider that this option requires hiring someone with extreme patience and a salary unheard of in academia, as HMC chips are brimming with bugs and obnoxious behavior.

If we can agree on a test configuration, I'll be happy to buy an eval board and get it running... My bet is that it will take me less time to do that than to help you debug the current code. And, in any case, I don't think the current code is very useful if it doesn't give sub-degree synchronisation, as I'll have to calibrate all phases to my ion on each boot. If I'm doing that, it doesn't really matter if the starting phases are a few deg off or completely random.

hartytp commented 6 years ago

Yeah, HMC chips look good on paper.

As I said, the advantage of the HMC7044 if used in this way is that we're using hard-coded logic in exactly the way described in all the app notes. So, we're not poking around in the register interface, trying to do things that the designers didn't really test.

sbourdeauducq commented 6 years ago

I'll be happy to buy an eval board and get it running

This is not enough, you'll have to get it running on the actual Sayma hardware.

hartytp commented 6 years ago

Hmmm... let me think about a way to do that. On the plus side, what do we have to lose? AFAICT, the current synchronisation is basically broken by design, so even if the 7044 doesn't work, we're not much worse off...

hartytp commented 6 years ago

anyway, let's move this to a new issue.

sbourdeauducq commented 6 years ago

If clocking from the AMC is a niche use case, then the coax cable is acceptable. Not worth adding another mux.