sinara-hw / sinara

Sayma AMC/RTM issue tracker

DRTIO clock recovery/Si5324 in Sinara #515

Closed hartytp closed 6 years ago

hartytp commented 6 years ago

@cjbe @WeiDaZhang and I have been looking at the timing stability of DRTIO on Kasli (all comments apply equally to Sayma of course).

Expectation:

E1. The latency between TTLs on Kasli DRTIO slaves and TTLs on the master should be deterministic and fixed between power cycles. Equivalently, the phase relationship between the master clock (measured at the SMA input) and the recovered RTIO clock (measured at the MMCX on the slave) should be fixed between power cycles.

E2. The latency between TTLs on Kasli DRTIO slaves and TTLs on the master (or, equivalently, the phase relationship between the two clocks) should be stable to <<1ns over the standard operating temperature range (say, something like 10C-20C).

E3. DRTIO is a basic piece of ARTIQ infrastructure, and the better it is (within reason) the more uses it is likely to find. Having a high-quality way of distributing time (e.g. clocks for data converters/TDCs etc) over fibres with good stability is a high-value feature for ARTIQ, so we should try to make the performance "as good as reasonably possible" (without overcomplicating our designs). i.e. let's not shoot for something that just barely meets our most lax specification if there is a way of doing significantly better without too much additional cost/complexity.

Observations:

O1. In @cjbe's measurements, the latency was not constant between power cycles. As @jordens has independently pointed out, this appears to be expected for the Si5324: the data sheet says "Input to output phase skew after an ICAL is not controlled and can assume any value."

O2. The phase of the clock output on the slave is extremely unstable over temperature. Touching the Xtal makes the slave clock move w.r.t. the master by multiple cycles of the 150MHz clock. The phase also wanders around noticeably even when Kasli is left alone in standard lab conditions and running at constant duty cycle etc.

Discussion: AFAICT, DRTIO uses the Si5324 as follows:

Comments:

C1. Since we are very sensitive to the performance of the Xtal/XO on Kasli, we should be using a high-quality one. Kasli currently has a very cheap Xtal. We replaced this with a high-quality Crystek 125MHz XO (different package, dead-bugged on) and found that the phase stability of the Si5324 clock was significantly improved -- although the phase fluctuations of the 150MHz clocks were still large and easily noticeable by eye on a scope.

C2. Running the PLL at such a low frequency is unlikely to be a good idea from a noise/stability perspective. Using the 125MHz crystal, we were able to run the PFD at a higher frequency (around 2MHz IIRC). This made a significant improvement in the phase stability although, again, not enough to solve the problems. We should aim to run the PFD in our "jitter cleaner" loop at the highest frequency possible to minimize noise/drift.

C3. The Si5324 documentation recommends using a reference which is not related by a rational number to our reference/output clocks. AFAICT, that's to minimize in-band spurs. AFAICT, that recommendation is not good advice for us, since the problems introduced by running the PFD at a really low frequency are much worse than a few spurs (which are generally rejected by subsequent PLLs anyway).

C4. As @sbourdeauducq has pointed out, it may be possible to fix the phase non-determinism of the Si5324 by wrapping it in an additional PLL using the FPGA (although that somewhat defeats the purpose of having the Si5324 in the first place).

Recommendations for DRTIO v2.0:

R1. Replace the Si5324 with a proper PLL! This could either be an analog PLL or a digital one like WR. The optimal choice will depend a bit on the measured noise and stability of the CDR clock from the FPGA. We'll aim to measure that soon.

R2. Use a TCVCXO for the flywheel oscillator. The best TC(VC)XOs are generally at around 10MHz-25MHz, so we should probably follow WR and use a decent 25MHz TCVCXO for the flywheel oscillator.

R3. Run the PLL PFD directly at 25MHz rather than some low frequency.

R4. Use a reasonably high-order loop filter (we used 3rd order for exactly this reason on the clock mezzanine that Weida and I designed).

R5. If the TCVCXO is well-enough specified, we may be able to do our "hitless switching" by just setting the control voltage to mid-range during link initialisation (someone would need to check the numbers). That simplifies things a lot.

hartytp commented 6 years ago

@dtcallcock I remember Jeff saying he'd measured timing stability for WR. Would you mind asking him for any data he has on this, please?

hartytp commented 6 years ago

PS sorry for the essay, wanted to get my thoughts clear while this is fresh in my mind!

sbourdeauducq commented 6 years ago

As @sbourdeauducq has pointed out, it may be possible to fix the phase non-determinism of the Si5324 by wrapping it in an additional PLL using the FPGA (although, that somewhat defeats the purpose of having the Si5324 in the first place).

Not necessarily: the FPGA PLL output would only drive the Si5324 input, and you still get the jitter filtered (the clock that is used for the rest of the design is the Si5324 output only).

I see three ways of doing this:

I'm surprised we didn't see the Si5324 problems before, when we tested this: https://github.com/m-labs/drtio_transceiver_test

sbourdeauducq commented 6 years ago

The phase of clock output on the slave is extremely unstable over temperature. Touching the Xtal makes the slave clock move w.r.t. the master by multiple cycles of the 150MHz clock. The phase also wanders around noticeably even when Kasli is left alone in standard lab conditions and running at constant duty cycle

I wonder if this is normal. Wouldn't that also cause problems in typical situations where the Si5324 is used with a shallow elastic buffer?

This is also quite difficult or impossible to fix with the FPGA.

hartytp commented 6 years ago

Unsurprisingly, WR is really nicely designed and very carefully thought through. They have lots of good application notes on this, which contain a lot of the info we need (GTX noise measurements etc). I started pooling some of the data on the Wiki https://github.com/m-labs/sinara/wiki/SinaraClocking

The good news is that they're able to recover a really pretty good clock over 1 3km fibre, even after FPGA transceivers: -105dBc/Hz at 10Hz (for reference, the SRS rubidium clock is -105dBc/Hz at 1Hz, -135dBc/Hz at 10Hz).

hartytp commented 6 years ago

So, it all comes down to designing a good PLL with a good flywheel oscillator. That's not too hard.

hartytp commented 6 years ago

detect the skew with the FPGA (e.g. similar procedure as DDR3 write leveling) and correct using a MMCM with a fine phase shift inserted between the recovered clock and the Si5324 input. Deterministic skew performance is limited by the resolution of the phase shift (1/56 of the VCO period, the VCO having a maximum of 1200MHz, so about 15ps at best).

I suspect that the stability/noise of this approach will not be great. But, could work as a bit of a hack to get the current HW up and running while we design a better long-term solution.
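To put numbers on the phase-shift resolution quoted above, here is a minimal sketch. It assumes only what the quote states (56 fine steps per VCO period, 1200MHz maximum VCO frequency); the function name and the lower VCO frequencies are illustrative, not taken from any vendor API.

```python
def mmcm_fine_phase_step_ps(f_vco_hz: float, steps_per_period: int = 56) -> float:
    """Time moved by one fine phase-shift step, in picoseconds.

    One step is 1/steps_per_period of a VCO period, so the step size
    shrinks as the VCO frequency rises.
    """
    return 1e12 / (f_vco_hz * steps_per_period)


if __name__ == "__main__":
    # 1200 MHz is the maximum quoted in the thread; the others are examples.
    for f_vco_mhz in (600, 1000, 1200):
        step = mmcm_fine_phase_step_ps(f_vco_mhz * 1e6)
        print(f"VCO at {f_vco_mhz} MHz: {step:.2f} ps per step")
```

At the 1200MHz maximum this gives just under 15ps per step, which sets the floor on how finely the skew can be corrected with this approach.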

sbourdeauducq commented 6 years ago

To get the current hardware running, perhaps dropping in a Si5326 will work. Double-check the datasheet, but it seems: it is pin-compatible, it has mostly the same register interface, it has input-to-output skew control, and its features generally are a superset of the 5324's.

gkasprow commented 6 years ago

There is one possible fix: add a guard shield around the oscillator. It is not mentioned in the Si5324 datasheet, but it is recommended for some other Silabs chips with a similar oscillator. It is simple: a dedicated PCB polygon around the crystal and a dedicated copper area below it, connected to GND in one place, close to the Silabs XI/XO pins, and nowhere else.

hartytp commented 6 years ago

@gkasprow what does that do? Just an EMI shield? These problems are mainly thermal. (A screening can to prevent air currents reaching the crystal would probably help a bit, however.)

dtcallcock commented 6 years ago

Here's Jeff Sherman's poster on some measurements he did at NIST on the SevenSols WR-LEN. sherman_WSTS2017_poster.pdf

He got 100fs because he's got mad skillz.

gkasprow commented 6 years ago

@hartytp Yes, I meant EMI shield.

hartytp commented 6 years ago

Thanks David and Jeff! Mad skills indeed.

So, Jeff thought that the SevenSols WR-LEN might be implemented in a slightly different way to the original WR and might be somewhat better. Anyone know any details about its implementation? AFAICT it's not open source, or am I missing something?

hartytp commented 6 years ago

There was a mistake in the measurements reported above. @cjbe retook the data more carefully and found:

Once we receive more Kasli, I'll repeat this measurement using an interferometer (mixer) to give a sub ps measurement noise floor.

hartytp commented 6 years ago

So, it seems that the current hardware can provide <<ns timing stability with correct Si5324 settings.

Note that the phase of the Si5324 is still random at startup. However, this should be fixable by using a PLL in the FPGA to add a phase shift to the Si5324 input clock (this may degrade the phase stability a little however). The FPGA can then be used to measure the Si5324 startup phase and tune the Si5324 input phase shift appropriately. So, we should be able to get a decent solution with current hardware and only gateware/firmware changes.
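The measure-and-correct idea above can be sketched as a simple search loop. This is a toy simulation only, not the actual ARTIQ gateware/firmware: the `SimulatedSi5324` class, its `sample()` method and the `align()` helper are all hypothetical, and the step size assumes the ~15ps MMCM fine phase step quoted earlier in the thread.

```python
import random

RTIO_PERIOD_PS = 1e12 / 150e6   # 150 MHz RTIO clock, as discussed in the thread
STEP_PS = 1e12 / (1200e6 * 56)  # assumed fine phase step, roughly 14.9 ps


class SimulatedSi5324:
    """Hypothetical stand-in for the hardware: the input-to-output skew
    takes a random value at every start-up."""

    def __init__(self, rng: random.Random):
        self.startup_skew_ps = rng.uniform(0, RTIO_PERIOD_PS)
        self.input_shift_steps = 0

    def output_phase_ps(self) -> float:
        # Output phase relative to the reference, wrapped into one period.
        shift = self.input_shift_steps * STEP_PS
        return (self.startup_skew_ps + shift) % RTIO_PERIOD_PS

    def sample(self) -> int:
        # Models sampling one clock with the other in a flip-flop:
        # 1 while the output lags by at least half a period, else 0.
        return int(self.output_phase_ps() >= RTIO_PERIOD_PS / 2)


def align(pll: SimulatedSi5324, max_steps: int = 1000) -> int:
    """Step the input phase until the sampled value falls from 1 to 0,
    i.e. the output edge has just wrapped past the reference edge."""
    previous = pll.sample()
    for _ in range(max_steps):
        pll.input_shift_steps += 1
        current = pll.sample()
        if previous == 1 and current == 0:
            return pll.input_shift_steps
        previous = current
    raise RuntimeError("no 1 -> 0 transition found")


if __name__ == "__main__":
    pll = SimulatedSi5324(random.Random(0))
    steps = align(pll)
    print(f"aligned after {steps} steps, residual {pll.output_phase_ps():.1f} ps")
```

In this model the residual skew after alignment is always less than one phase step, regardless of the random start-up phase. Real hardware would also have to cope with metastability and jitter around the transition, which the sketch ignores.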

hartytp commented 6 years ago

Pending the result of the interferometer measurement to look at the drift of the Si5324 phase, I'd still like to suggest that we should consider using a solution based on a proper low-noise PLL IC with a better reference oscillator (VCO).

My thinking here is that if we can get ps-level timing stability for the recovered clock -- and Jeff's data suggests that we should be able to -- then there are a lot of cool things we can do with it.

If ps timing stability can be achieved with the Si5324 + Xtal then great, otherwise, I'd like to prototype a better solution with the intention of applying it to Kasli/Sayma/Metlino once it's been well tested using current hardware.

hartytp commented 6 years ago

To do the design, the one thing we need to finalize is the choice of RTIO frequency(ies).

The RTIO frequency affects our PLL design:

Any objections to fixing the RTIO frequency to 125MHz if that makes the PLL design easier/better (obviously, other frequencies available as population options, potentially with somewhat worse performance)? @jordens

jordens commented 6 years ago

The reasons to choose 150 MHz f_RTIO and my take on the arguments you bring up are described elsewhere. I don't think rehashing them is necessary as they haven't changed. There would probably be quite some development needed to go to 1 GHz SAWG now but that is also a chance to revise the parametrization and datapath design and rethink the specs. The current and potential SAWG users would need to think hard and weigh in.

dtcallcock commented 6 years ago

So, Jeff thought that the SevenSols WR-LEN might be implemented in a slightly different way to the original WR and might be somewhat better. Anyone know any details about its implementation? AFAICT it's not open source, or am I missing something?

I opened one up and the important chips seem to be:

  • Artix-7 XC7A35T
  • Analog Devices AD9516-4BCPZ
  • Micrel KSZ9031RNXCA
  • SiLabs Si570 BBC000121G
  • VCXO 25.000 SRe647A (unknown vendor)

Jeff claims it's a development of what was at some point an open source project. I will ask him for more details.

jordens commented 6 years ago

https://bitbucket.org/account/user/sevensols/projects/LEN

gkasprow commented 6 years ago

@dtcallcock The WR-LEN is not open source hardware, but they keep the gateware open since it is based on the original WR development. However, the chipset is similar to that of the SPEC WR node I developed for CERN a few years ago. The main differences are that they use a Zynq SoC instead of the original Spartan FPGA, and a Silabs I2C-tuned crystal oscillator instead of the original DAC + VCXO + PLL with VCO. However, the DMTD still uses the same 25MHz helper VCXO oscillator. The PN is LF VCXO026156. The original design used a 25MHz VCXO + external x5 PLL because the price and availability of the 125MHz VCXO was a barrier. Later on, 62.5MHz instead of 125MHz was used to clock the WR core. Later still, in order to improve the jitter of the WR node, which was dominated by the jitter of the CDCM61004, the Silabs Si570 chip was used instead of the VCXO + PLL. This required significant modification of the WR node gateware because an I2C controller had to be included in the loop. So 7Sols probably went the same way and thereby improved the jitter performance. @twlostow knows more details about this development.

gkasprow commented 6 years ago

Edit: they say that they use a VCXO and a TCXO controlled by a DAC, so there should be yet another oscillator. The Silabs part is probably used for general-purpose clocking.

gkasprow commented 6 years ago

@dtcallcock in what bandwidth did Sherman get his 100fs?

dtcallcock commented 6 years ago

@gkasprow From the poster it seems he was comparing the master input 10MHz with the slave output 10MHz with a 50Hz effective BW and averaging down for 10^4s to get to 100fs. This was in a temperature and humidity controlled chamber. If you need more specifics I can ask him or just give you his email.

gkasprow commented 6 years ago

For the WR-ZEN design they use the same VCXO as we use for Urukul (CVHD-950) + a low-jitter divider (some chip from NS) to get 10MHz. Then they apply intensive filtering using 2x MCL to get rid of higher harmonics. So they probably do the same for the WR-LEN. So the main differences from the original SPEC design are a better VCXO with lower phase noise and the lack of the PLL and VCO. The control circuit is the same, since I see the two DACs + reference used in the SPEC design.

hartytp commented 6 years ago

The reasons to choose 150 MHz f_RTIO and my take on the arguments you bring up are described elsewhere. I don't think rehashing them is necessary as they haven't changed. There would probably be quite some development needed to go to 1 GHz SAWG now but that is also a chance to revise the parametrization and datapath design and rethink the specs. The current and potential SAWG users would need to think hard and weigh in.

We're comparing two configurations:

@jordens I'm going to try to summarise the arguments here. Please correct me if I'm wrong:

  1. Carrier bandwidth: a 1GHz DAC clock (125MHz RTIO clock) obviously gives a higher carrier bandwidth than 600MHz, by nearly a factor of two. For single-tone operation, this directly determines the bandwidth of the signal one can produce.
  2. Baseband bandwidth: ignoring a small correction due to the anti-aliasing filters, the baseband (IQ) bandwidth one can get out of the SAWG in two-tone operation is +-f_RTIO. So, operating at 600MSPS gives one +-150MHz baseband bandwidth, compared with +-125MHz for the 1GSPS case. This bandwidth effectively determines the maximum separation between the tones (not the maximum RF frequency) in two-tone mode.
  3. Compile time: the compile time is a super-linear function of the CORDIC parallelisation factor, so the 1GSPS case will be quite a bit slower to compile than the 600MSPS case. This will have a knock-on effect on the complexity/cost of development for the two cases.
  4. Yak shaving: in principle, switching from 600MSPS to 1GSPS just needs a few settings changed. In practice, it could take some development time (and further funding?).
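The trade-off in points 1 and 2 can be laid out numerically. A minimal sketch, assuming the sample rate is f_RTIO times the CORDIC parallelisation factor (x4 at 150MHz is inferred from 600MSPS, x8 at 125MHz from 1GSPS) and ignoring filter roll-off; the `SawgConfig` class is purely illustrative and not part of ARTIQ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SawgConfig:
    f_rtio_mhz: float
    parallelism: int  # samples generated per RTIO cycle (assumed)

    @property
    def sample_rate_msps(self) -> float:
        return self.f_rtio_mhz * self.parallelism

    @property
    def nyquist_mhz(self) -> float:
        # Upper limit on the single-tone (carrier) frequency.
        return self.sample_rate_msps / 2

    @property
    def baseband_half_width_mhz(self) -> float:
        # Baseband (IQ) bandwidth is +-f_RTIO, before filter corrections.
        return self.f_rtio_mhz


CONFIGS = {
    "600MSPS": SawgConfig(f_rtio_mhz=150, parallelism=4),
    "1GSPS": SawgConfig(f_rtio_mhz=125, parallelism=8),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(
            f"{name}: Nyquist {cfg.nyquist_mhz:.0f} MHz, "
            f"baseband +-{cfg.baseband_half_width_mhz:.0f} MHz"
        )
```

Under these assumptions the 1GSPS case gains 200MHz of carrier range (500MHz against 300MHz Nyquist) at the cost of 25MHz of baseband half-width.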

IIRC, we've agreed to start with the 600MSPS case as the quickest way to get going. However, I'm keen to switch to 1GSPS operation in the long run, as this maps better to my use-cases (and, AFAICT, to most ion trap use cases).

So, question for other potential Sayma users (@jbqubit @dhslichter @dtcallcock @cjbe): how do you plan to use Sayma? Do you want/need the 600MSPS (150MHz RTIO) use-case? Or, like me, would you prefer the full 1GSPS bandwidth of the DACs?

dtcallcock commented 6 years ago

@hartytp

1GSPs DAC data rate (500 MHZ nyquist) wit the DAC clock up to 2GHz, RTIO frequency at 150Hz (x4 parallelisation in the CORDICS to generate 1GSPS Data).

Should that be '...125MHz (x8 parallelisation...'?

hartytp commented 6 years ago

Apologies, yes. Edited. Thanks!

hartytp commented 6 years ago

@jordens FWIW, if there is anything we can do to aid with SAWG/Sayma timing closure/compile time then let me know. e.g. would it help to scale back the moninj support/the number of RTIO channels?

sbourdeauducq commented 6 years ago

Gateware/firmware Si5324 fix applied, and from cursory testing it appears to work (even on Sayma).

hartytp commented 6 years ago

Do any potential users of Sayma (@cjbe @dhslichter @dtcallcock @jbqubit @klickverbot) have any feedback on the proposal to switch Sayma to 1GSPS operation? Is this helpful for you? Unhelpful? Or, you don't care?

@jordens ping: is there anything we can do to get the resource usage/compile time down for Sayma to make it easier to boost the data rate? Would scaling back moninj help? Do you think there are simple tweaks to the code that would help?

jordens commented 6 years ago

Rethinking and redesigning the parametrization and the design. There isn't even moninj for SAWG (https://github.com/m-labs/artiq/issues/675) or the two additional ways of modulating the datapath (https://github.com/m-labs/artiq/issues/801 and the ePID modulation ports) yet. Limiting those features and e.g. reducing the range of DUC frequencies (the "carrier frequency") to n*125 MHz could be extremely beneficial. It would be a nice design study.

hartytp commented 6 years ago

@jordens thanks for that.

Well, I think that to make progress on this issue, we need to hear from the users about what their requirements actually are.

I definitely agree that limiting the SAWG to only do the things people need, rather than trying to create a single bitstream that can scratch every itch one can imagine, is a good idea.

Edit: FWIW, I could definitely live with a simpler parameterization of the SAWG if that would help.

hartytp commented 6 years ago

Limiting those features and e.g. reducing the range of DUC frequencies (the "carrier frequency") to n*125 MHz could be extremely beneficial

AFAICT, that would be fine for all the use cases I can think of. The upconversion effectively gives one access to n*f_RTIO +-f_RTIO (less a bit from filters), so the different carrier frequency ranges would overlap nicely. If doing that and simplifying the parameterization would make a big impact and allow higher carrier frequencies, I'd definitely be for it.
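A quick check of the overlap claim, as a sketch only: carriers sit at n*125MHz and each one reaches +-f_RTIO, reduced by the filters. The 0.8 usable fraction below is a made-up placeholder for the filter loss, not a measured or specified value.

```python
F_RTIO_MHZ = 125.0
USABLE_FRACTION = 0.8  # hypothetical allowance for filter roll-off


def band(n: int, usable_fraction: float = USABLE_FRACTION) -> tuple[float, float]:
    """Frequency range (MHz) reachable around the carrier at n * f_RTIO."""
    centre = n * F_RTIO_MHZ
    half_width = usable_fraction * F_RTIO_MHZ
    return (centre - half_width, centre + half_width)


def bands_overlap(n: int, usable_fraction: float = USABLE_FRACTION) -> bool:
    """True if the band around carrier n reaches the band around carrier n + 1."""
    return band(n, usable_fraction)[1] >= band(n + 1, usable_fraction)[0]


if __name__ == "__main__":
    for n in range(4):
        low, high = band(n)
        print(f"n={n}: {low:.0f} to {high:.0f} MHz, overlaps next: {bands_overlap(n)}")
```

With carriers spaced by f_RTIO, adjacent ranges overlap as long as the filters leave at least half of +-f_RTIO usable.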

AFAICT, moninj for the SAWG is not particularly useful. But, AM is a must for noise eating.

jordens commented 6 years ago

Even up-conversion to n*k/m*f_RTIO (k=8=f_sample/f_RTIO here) with m=16 and n \in {0,...,m-1} could turn out to be as simple as m=8.
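For concreteness, the two carrier grids being discussed can be enumerated from the formula n*k/m*f_RTIO. This sketch assumes f_RTIO = 125MHz and k = 8 as in the comment above; the function name is illustrative.

```python
from fractions import Fraction


def duc_carriers_mhz(f_rtio_mhz: int = 125, k: int = 8, m: int = 16) -> list[Fraction]:
    """Carrier frequencies n * k / m * f_RTIO for n in 0 .. m-1, in MHz.

    Fractions keep the grid exact instead of accumulating float error.
    """
    return [Fraction(n * k, m) * f_rtio_mhz for n in range(m)]


if __name__ == "__main__":
    for m in (8, 16):
        grid = [float(f) for f in duc_carriers_mhz(m=m)]
        print(f"m={m}: step {grid[1]} MHz, top carrier {grid[-1]} MHz")
```

With m=8 the grid is the n*125MHz case; with m=16 the spacing halves to 62.5MHz and the top carrier moves from 875MHz to 937.5MHz.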

hartytp commented 6 years ago

@jordens good to know. That would be absolutely fine for me then. I struggle to see any cases where it wouldn't be okay.

jordens commented 6 years ago

It would be a couple of weeks probably and it would likely make a major difference.

hartytp commented 6 years ago

ack. Thanks

dhslichter commented 6 years ago

@hartytp @jordens I tend to lean towards the 125 MHz RTIO frequency and 1 GSPS Sayma data path, for the various reasons that have been stated above by @hartytp (carrier bandwidth in particular). Changing the analog bandwidth of a digitally upconverted signal from 150 MHz to 125 MHz doesn't break anything important for us AFAICT. Using the simple n*125 MHz DUC carrier frequencies ought to work for us as well, though if the slightly more complex carrier frequencies suggested by @jordens are just as easy to implement I would vote we go that way.

hartytp commented 6 years ago

Thanks for the feedback @dhslichter!

Well, I can't speak for every possible user of Sayma, but for all the ion trap use cases I can think of 1GSPS/125MHz is the right way to go. So, unless I hear from someone who wants/needs something else, I'm going to design around the 125MHz case.

My plan is the following:

dhslichter commented 6 years ago

@hartytp are you talking about a cheaper/simpler version of the clock mezzanine, effectively, as the long-term solution for Sayma/Kasli?

hartytp commented 6 years ago

You could call it that, I guess. But not a mezzanine (the connectors are too bulky and expensive). I just mean adding a PLL + 125MHz XO to do high-quality clock recovery.

hartytp commented 6 years ago

The aim is to keep it cheap and simple but still get ps stability on the DRTIO clock, as once you have that you can do a lot with it.

dhslichter commented 6 years ago

Design a small PLL board that can be added to Kasli to provide a ps-stable timebase over DRTIO.
For future revisions of Sayma/Kasli, add the new 125MHz PLL, but keep the Si5324 (DNPd?) as a fallback.

I guess I interpreted these as meaning a mezzanine -- I think putting it on the main board is preferable. So are you saying by "add the new 125 MHz PLL" that we would have a PLL on the main board, which we populate differently depending on 125 MHz vs 150 MHz? That would sound good to me. Sorry for the confusion.

hartytp commented 6 years ago

Aah sorry. I meant that we could patch existing Kasli with a separate board that's basically glued on. For the next version we'd move it to the main PCB once it's been tested.

hartytp commented 6 years ago

Quick write up of some of the measurements @WeiDaZhang has taken.

Si5324 using Wenzel 100MHz oscillator as reference.

[Figure: Si5324 eval board, CKIN1 reference from the 100MHz Wenzel, output at 241MHz or 244MHz, with DAC norm]

Conclusion:

hartytp commented 6 years ago

Repeat of the measurements from the WR Low jitter project using a KC705.

KC705 (Kintex - 7) GTP CLK with Narrow Band PLL Simulation vs DDMTD Report.pdf

hartytp commented 6 years ago

Simulating the required loop filter for an analog PLL:

So, the loop filter (LF) needs to be digital. Options are:

  1. Implement WR low jitter

    • Requires: 125MHz XO; SPI DAC; a separate PLL to generate the tone that we beat against the CDR clock.
    • Can basically copy WR with the improvements from the low-jitter project.
    • Seems well tested (AFAICT this is what the WR-LEN boxes are, and they seem to work extremely well)
    • AFAICT, this will hit the transceiver noise floor, so should not add significant noise to the CDR clock
  2. Go for SDR approach:

    • Requires: fast 2-channel ADC to sample XO and CDR clock; DAC; XO.
    • Probably more expensive and more power hungry than other approaches. Needs more FPGA resources.
  3. Baseband ADC:

    • Analog PFD, digitize with ADC. LF in FPGA and then control XO with DAC.
    • Noise is potentially a serious concern. Needs careful design.

From the above, option 1 (implement WR) seems to be the best in terms of cost/complexity and risk (we're copying WR so a lot of the design work and testing has been done for us).
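A rough sketch of the arithmetic behind option 1, assuming the usual DDMTD arrangement in which the helper PLL (the "tone that we beat against the CDR clock") runs at f_clk*N/(N+1). The value N = 2**14 is an assumption for illustration rather than something fixed in this thread, and the function name is made up.

```python
def ddmtd_parameters(f_clk_hz: float, n: int) -> tuple[float, float, float]:
    """Return (helper frequency, beat frequency, time resolution in seconds).

    Sampling f_clk with a helper clock at f_clk * n / (n + 1) produces a
    beat at f_clk / (n + 1), which stretches phase differences by (n + 1).
    Counting the beat edges with the helper clock then resolves real time
    in units of one helper period divided by (n + 1), i.e. 1 / (n * f_clk).
    """
    f_helper = f_clk_hz * n / (n + 1)
    f_beat = f_clk_hz / (n + 1)
    resolution_s = 1.0 / (n * f_clk_hz)
    return f_helper, f_beat, resolution_s


if __name__ == "__main__":
    f_helper, f_beat, res = ddmtd_parameters(125e6, 2**14)
    print(f"helper clock: {f_helper / 1e6:.6f} MHz")
    print(f"beat note:    {f_beat / 1e3:.3f} kHz")
    print(f"resolution:   {res * 1e12:.3f} ps")
```

With a 125MHz clock and the assumed N, the counter resolution comes out below 1ps, consistent with the aim of ps-level stability. The achievable noise floor depends on the transceiver and oscillator noise, which this arithmetic says nothing about.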

gkasprow commented 6 years ago

I have a few AMC boards with WR oscillators, Kintex and Artix FPGAs (AFC and AFCK boards). If somebody wants to play with them, I can lend them for a few weeks.