sinara-hw / Urukul

4 channel 1GS/s DDS (AD9910 or AD9912 variant)

SYNC_IN jitter #16

Closed: jordens closed this issue 4 years ago

jordens commented 5 years ago

The jitter on the SYNC_IN signal from Kasli to the AD9910 (through the LVDS buffers and the fanout) is very high in some cases (the tester setup connected to the buildbot).

At validation delay 1 (hold and setup margin 1 tap) the window is just 2 taps wide (a tap is about 75 ps). http://buildbot.m-labs.hk/builders/artiq/builds/2669/steps/python_unittest_2/logs/stdio

This is the SMP_ERR matrix on tester; rows are increasing validation delay, columns are SYNC_IN delay on the AD9910:

[1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

There also seems to be some bounce around the edges (top row).

On the systems I have here I get about 5-6 tap wide windows at validation delay 1. That's not stellar but OK. When using the SYNC signal on board from the first DDS, the window at validation delay 1 is 8 taps wide on tester, 8-9 taps here. Assuming equal tap delay for the validation delays and the SYNC_IN delays, the theoretically best case is validation delay 4 and a window width of ~4 or a validation delay of 1 and a window width of ~10 (i.e. SYNC_IN delay periodicity minus twice the validation delay).

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
[1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
[1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In both cases the hardware is Kasli/v1.1 and Urukul-AD9910/v1.3, connected via MMCX to Kasli-J1.
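A small sketch of how window widths can be read off scans like the ones above: count the maximal runs of 0 (no SMP_ERR) in a row and compare them with the jitter-free estimate "SYNC_IN delay periodicity minus twice the validation delay". The function names are hypothetical, and the 12-tap periodicity is an assumption read off the top row of the on-board SYNC scan (errors at taps 11 and 23-24). Runs cut off by the edge of the 32-tap scan are reported as-is, so they can come out shorter than the real window.

```python
def zero_runs(row):
    """Lengths of maximal runs of 0 (no SMP_ERR) in one scan row, left to right."""
    runs, n = [], 0
    for bit in row:
        if bit == 0:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    if n:  # a run touching the end of the scan (possibly truncated)
        runs.append(n)
    return runs


def ideal_width(period_taps, validation_delay):
    """Jitter-free window width: SYNC_IN delay periodicity minus twice the validation delay."""
    return max(period_taps - 2 * validation_delay, 0)


# Validation delay 1 rows from the two tester scans above
kasli_v1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
            1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
onboard_v1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0,
              0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print(zero_runs(kasli_v1))    # [2, 3]
print(zero_runs(onboard_v1))  # [8, 8, 6]  (the last run is cut off by the scan edge)
print(ideal_width(12, 1), ideal_width(12, 4))  # 10 4
```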

The jitter seems to come partly from Urukul and partly (the larger part) from Kasli, and it varies between setups. I changed a couple of things (see the ARTIQ changelog) to optimize jitter on the RTIO clock, but there was little effect. According to Vivado, the maximum peak-to-peak jitter on the clock driving the SYNC output buffer in the FPGA is ~90 ps. I tried running the SYNC fanout from the supposedly quieter P1V8A rail, but that doesn't seem to work at all.

@gkasprow @marmeladapk could you have a look at the jitter on SYNC_IN (EEM0:7, before and after the sync fanout, compared to the Kasli MMCX clock)?

@cjbe @klickverbot When you were playing with SYNC, did you look at SMP_ERR? How did you select the SYNC_IN delay and the validation delay? Did you scan them?

cf. m-labs/artiq#1143

jordens commented 5 years ago

I tried with the fully loaded Tester variant and Urukul on EEM0/1 on the hardware here. That also reproduces the narrow window (2 taps wide at validation delay 1) and the bouncing. The bouncing is something that I can't really explain with regular FPGA supply rail crowding.

Then I tried a variant containing only that Urukul EEM and nothing else, with it on EEM0/1 and on EEM5/4. Both are very jittery as well (window size ~4 at validation delay 1).

And I tried clocking the SYNC output register from its own dedicated BUFG. No significant change.

cjbe commented 5 years ago

@jordens we looked at SMP_ERRs and scanned to find the optimum point, and we see similar things to you (i.e. similarly narrowed window widths for the Kasli clock vs the DDS sync_out).

I am also a bit worried about this, but have not had time to look at it properly yet. Empirically it is working without problem across two systems in our lab.

When I tested this thoroughly (https://github.com/sinara-hw/Urukul/issues/3#issuecomment-396951759) I saw several SMP_ERRs per 10^10 samples (300ps validation window), but I did not see any losses of sync. (To see this I logged channels from 2 different Urukuls on a scope on persist, started the system from cold, and ran it for a day to achieve 10^10 resyncs).

dnadlinger commented 5 years ago

@jordens We set the SYNC_IN delay by scanning it and manually choosing the centre of the "eye" (i.e. error-free region). Two examples from a while ago, using two Urukul v1.0 connected to the same Kasli v1.0:

rurukul0: 0   [0, 0, 0, 0]
rurukul0: 1   [1000, 1000, 259, 0]
rurukul0: 2   [1000, 1000, 1000, 1000]
rurukul0: 3   [1000, 1000, 1000, 1000]
rurukul0: 4   [1000, 1000, 1000, 1000]
rurukul0: 5   [1000, 1000, 1000, 995]
rurukul0: 6   [1000, 1000, 1000, 1000]
rurukul0: 7   [1000, 1000, 1000, 1000]
rurukul0: 8   [0, 0, 0, 1000]
rurukul0: 9   [0, 0, 0, 0]
rurukul0: 10  [0, 0, 0, 0]
rurukul0: 11  [0, 0, 0, 0]
rurukul0: 12  [0, 0, 0, 0]
rurukul0: 13  [0, 0, 0, 0]
rurukul0: 14  [1000, 758, 499, 49]
rurukul0: 15  [1000, 1000, 1000, 1000]
rurukul0: 16  [1000, 1000, 1000, 1000]
rurukul0: 17  [1000, 1000, 1000, 1000]
rurukul0: 18  [1000, 1000, 1000, 1000]
rurukul0: 19  [1000, 1000, 1000, 1000]
rurukul0: 20  [1000, 1000, 1000, 1000]
rurukul0: 21  [1000, 1000, 870, 1000]
rurukul0: 22  [0, 0, 0, 0]
rurukul0: 23  [0, 0, 0, 0]
rurukul0: 24  [0, 0, 0, 0]
rurukul0: 25  [0, 0, 0, 0]
rurukul0: 26  [0, 0, 0, 0]
rurukul0: 27  [0, 0, 0, 2]
rurukul0: 28  [1000, 1000, 1000, 1000]
rurukul0: 29  [1000, 1000, 1000, 1000]
rurukul0: 30  [1000, 1000, 1000, 1000]
rurukul0: 31  [1000, 1000, 1000, 1000]

rurukul1: 0   [0, 0, 0, 0]
rurukul1: 1   [0, 0, 0, 0]
rurukul1: 2   [0, 0, 0, 0]
rurukul1: 3   [1000, 0, 156, 0]
rurukul1: 4   [1000, 1000, 1000, 1000]
rurukul1: 5   [1000, 1000, 1000, 1000]
rurukul1: 6   [1000, 1000, 1000, 937]
rurukul1: 7   [1000, 1000, 1000, 1000]
rurukul1: 8   [1000, 1000, 1000, 1000]
rurukul1: 9   [1000, 1000, 1000, 1000]
rurukul1: 10  [0, 61, 0, 0]
rurukul1: 11  [0, 0, 0, 0]
rurukul1: 12  [0, 0, 0, 0]
rurukul1: 13  [0, 0, 0, 0]
rurukul1: 14  [0, 0, 0, 0]
rurukul1: 15  [0, 0, 0, 0]
rurukul1: 16  [1000, 0, 1000, 0]
rurukul1: 17  [1000, 900, 1000, 1000]
rurukul1: 18  [1000, 1000, 1000, 1000]
rurukul1: 19  [77, 1000, 1000, 1000]
rurukul1: 20  [1000, 1000, 1000, 1000]
rurukul1: 21  [1000, 1000, 1000, 1000]
rurukul1: 22  [1000, 1000, 1000, 1000]
rurukul1: 23  [0, 1000, 0, 1000]
rurukul1: 24  [0, 0, 0, 0]
rurukul1: 25  [0, 0, 0, 0]
rurukul1: 26  [0, 0, 0, 0]
rurukul1: 27  [0, 0, 0, 0]
rurukul1: 28  [0, 0, 0, 0]
rurukul1: 29  [362, 0, 1000, 0]
rurukul1: 30  [1000, 766, 1000, 27]
rurukul1: 31  [1000, 1000, 1000, 1000]

Urukul v1.1 connected to a Kasli v1.1:

0   [1000, 1000, 1000, 1000]
1   [1000, 1000, 1000, 763]
2   [1000, 1000, 1000, 1000]
3   [60, 0, 1000, 1000]
4   [0, 0, 0, 245]
5   [0, 0, 0, 0]
6   [0, 0, 0, 0]
7   [0, 0, 0, 0]
8   [0, 0, 0, 0]
9   [0, 0, 0, 0]
10  [0, 38, 0, 1]
11  [946, 1000, 0, 511]
12  [1000, 1000, 3, 1000]
13  [1000, 1000, 930, 1000]
14  [1000, 1000, 1000, 977]
15  [1000, 1000, 1000, 1000]
16  [955, 30, 1000, 1000]
17  [0, 0, 1000, 1000]
18  [0, 0, 0, 0]
19  [0, 0, 0, 0]
20  [0, 0, 0, 0]
21  [0, 0, 0, 0]
22  [0, 0, 0, 0]
23  [0, 0, 0, 0]
24  [999, 1000, 0, 217]
25  [1000, 1000, 0, 1000]
26  [1000, 1000, 0, 1000]
27  [1000, 1000, 838, 1000]
28  [1000, 1000, 1000, 1000]
29  [1000, 995, 1000, 1000]
30  [0, 0, 1000, 1000]
31  [0, 0, 1000, 4]

These are the numbers of SMP_ERRs per 1000 trials for each of the four channels, with the validation delay set to 0. At validation delay 2 the windows are down to 2 taps; at 4 they are completely closed.
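Picking the centre of the error-free region by hand can be automated roughly as follows. This is a hypothetical helper, not the script used for the numbers above: it takes one channel's column of SMP_ERR counts (indexed by SYNC_IN delay tap), finds the longest run of zero counts (the first one on a tie) and returns its centre tap and width. The two example columns are channels 0 and 2 of the Urukul v1.1 scan.

```python
def best_delay(errors):
    """Return (centre_tap, width) of the longest run of zero SMP_ERR counts,
    or None if no tap is error-free."""
    best_start, best_len, start = None, 0, None
    for tap, count in enumerate(errors + [1]):  # sentinel closes a trailing run
        if count == 0:
            if start is None:
                start = tap
        elif start is not None:
            length = tap - start
            if length > best_len:  # strict: keep the first run on a tie
                best_start, best_len = start, length
            start = None
    if best_start is None:
        return None  # eye completely closed
    return best_start + (best_len - 1) // 2, best_len


# Channels 0 and 2 of the Urukul v1.1 / Kasli v1.1 scan above (validation delay 0)
ch0 = [1000, 1000, 1000, 60, 0, 0, 0, 0, 0, 0, 0, 946, 1000, 1000, 1000, 1000,
       955, 0, 0, 0, 0, 0, 0, 0, 999, 1000, 1000, 1000, 1000, 1000, 0, 0]
ch2 = [1000, 1000, 1000, 1000, 0, 0, 0, 0, 0, 0, 0, 0, 3, 930, 1000, 1000,
       1000, 1000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 838, 1000, 1000, 1000, 1000]

print(best_delay(ch0))  # (7, 7)
print(best_delay(ch2))  # (22, 9)
```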

As Chris mentioned, we haven't seen any errors in production yet, but we haven't been looking very hard (i.e. only indirectly through relatively crappy quadrupole laser gates).

jordens commented 5 years ago

At 62.5 MHz SYNC_IN there are already 1e10 resyncs after 3 minutes of running. And since SMP_ERR is latching and checks each one of them, I haven't seen an invalid (re)sync in >1e13. I am also uncertain how "loss of sync" would manifest itself on the outputs if there is no frequency/phase change. My guess is that it would not even be a 16 ns transient, and even if there is a 16 ns transient, you'll capture it only if you mix or diff the channels on a scope. The reasoning is based on conjecture about how the DDS works internally (ADI patents and SAWG): SYNC_CLK will have a pair of glitches, one short and one long cycle, but since the outputs run at 1 GHz the output will be exactly the same.
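The figures in this comment follow from the 62.5 MHz SYNC_IN rate. The sketch below reproduces them and additionally applies the standard "rule of three" (not part of the original comment, and assuming independent resync events) to bound the per-resync error probability implied by zero invalid resyncs in 1e13.

```python
SYNC_IN_HZ = 62.5e6  # SYNC_IN rate quoted above

period_ns = 1e9 / SYNC_IN_HZ          # time between resyncs
t_1e10_s = 1e10 / SYNC_IN_HZ          # time to accumulate 1e10 resyncs
t_1e13_h = 1e13 / SYNC_IN_HZ / 3600   # time to accumulate 1e13 resyncs

# Rule of three: with zero events in N independent trials, the one-sided 95 %
# upper confidence bound on the per-trial probability is about 3/N.
p_upper_1e13 = 3 / 1e13

print(f"{period_ns:.0f} ns, {t_1e10_s:.0f} s, {t_1e13_h:.1f} h, p < {p_upper_1e13:.0e}")
# 16 ns, 160 s, 44.4 h, p < 3e-13
```

So 1e10 resyncs take about 2.7 minutes, and 1e13 take a little under two days of continuous running.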

I.e. I am not worried about using a window that is 6 taps wide at validation delay 1. But I am worried about dealing with a window that is only 2 taps wide at validation delay 1.

jordens commented 5 years ago

You shouldn't need to repeatedly check SMP_ERR. It's latching. Just let it hammer for a couple µs. That width of 5 at validation delay 0 (2 at 2 and 0 at 4) is as bad as the data from Tester and worse than the 5-6 at 1 that I get here (with PTB2).

jordens commented 5 years ago

I forgot your posts on the other issue. Thanks for digging them out.

The IO_UPDATE delay tuning is done. That is now measured without external hardware and can be done at runtime. And it is stable to the ns over all PVT cases I have looked at.

dnadlinger commented 5 years ago

Yep, I saw your (nice) commits – we'll definitely have a look at porting our code over from the quick stopgap fix to your driver soon.

cjbe commented 5 years ago

@jordens in our work we are using the 'clear phase accumulator on IO_UPDATE' mode. This means that a sync error looks like a 1ns phase origin glitch, which is very obvious (i.e. 90 degrees at 250 MHz).
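As a quick check of the 90 degree figure: moving the phase origin by Δt at output frequency f shifts the output phase by 360°·f·Δt. A minimal sketch (the function name is illustrative):

```python
def phase_error_deg(f_out_hz, dt_s):
    """Output phase shift in degrees, wrapped to [0, 360), caused by moving the
    phase origin by dt_s seconds at output frequency f_out_hz."""
    return (360.0 * f_out_hz * dt_s) % 360.0


# One 1 ns SYSCLK cycle of phase-origin error at a few output frequencies
for f in (100e6, 250e6, 400e6):
    print(f"{f / 1e6:.0f} MHz: {phase_error_deg(f, 1e-9):.1f} deg")
```

This gives 36°, 90° and 144° respectively, so a one-cycle slip is easy to spot on a scope at typical output frequencies.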

For my sync tests I checked the alignment of the RF outputs with an RTIO TTL output - I confirmed that sitting outside of the window caused obvious phase alignment errors, and that sitting at the edge of the window (as measured with 0 validation delay) caused a small but measurable phase alignment error rate.

cjbe commented 5 years ago

> You shouldn't need to repeatedly check SMP_ERR. It's latching. Just let it hammer for a couple µs.

Yeah - the repeated checking on the eye scans is just to get an error rate estimate.

jordens commented 5 years ago

@cjbe Could you clarify what you mean by "sync error"? Not "SMP_ERR", probably. And how are you seeing that "sync error" if the next SYNC_IN event (corrective reset of the SYNC_CLK generator) is just 16 ns away? How long are the "glitches"? Pretty sure that SYNC_IN hitting the wrong SYSCLK cycle is invisible as long as there is no phase/frequency change at the same time. And as you say, you haven't seen anything and I haven't seen anything either, even when provoking SMP_ERR. You would see a misalignment between DDS outputs if the SYNC_CLK is misaligned at the same time as the IO_UPDATE event. That's what I can see as well. Is that what you tested?

> Yeah - the repeated checking on the eye scans is just to get an error rate estimate.

But then you iterated over the 1000-iteration loop another 4 times...

@klickverbot ACK.

But we should figure out where that high SYNC_IN jitter comes from. If someone with access to jitter measurement tools could have a look, that would be great. Might also move this to Kasli.

jordens commented 5 years ago

From a quick look with a spectrum analyzer and scope, SYNC_IN after going through the fanout, another IDC cable and an LVDS-to-CMOS converter is pretty clean. Spurs (from sys_clk logic modulating rtio_clk) on the SYNC_IN fundamental are down ~50 dB, on the 7th harmonic down ~30 dB. It is also very clean close in to the carrier (1 kHz to 1 MHz). The jitter occurs on rather fast timescales: a couple dozen µs of sampling already shows the problem.
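To put those spur levels in time units, here is a rough conversion under explicit assumptions that are not from the comment: each spur is one sideband of pure, narrowband sinusoidal phase modulation (so each sideband sits at about β/2 relative to its carrier), and the 7th-harmonic level is relative to that harmonic. A timing shift Δt corresponds to a phase shift of 2π·f·Δt at a component of frequency f, so the harmonic is converted at 7 × 62.5 MHz.

```python
import math


def spur_to_jitter_ps(spur_dbc, carrier_hz):
    """Peak-to-peak timing jitter (ps) implied by PM sidebands at spur_dbc
    relative to a carrier at carrier_hz (narrowband PM assumed)."""
    beta = 2 * 10 ** (spur_dbc / 20)             # peak phase deviation, rad
    dt_peak = beta / (2 * math.pi * carrier_hz)  # peak timing deviation, s
    return 2 * dt_peak * 1e12                    # peak-to-peak, ps


f0 = 62.5e6  # SYNC_IN fundamental
print(f"fundamental, -50 dBc: {spur_to_jitter_ps(-50, f0):.0f} ps pk-pk")
print(f"7th harmonic, -30 dBc: {spur_to_jitter_ps(-30, 7 * f0):.0f} ps pk-pk")
```

Under these assumptions the two spurs correspond to roughly 32 ps and 46 ps peak-to-peak, i.e. tens of picoseconds.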

cjbe commented 5 years ago

@jordens

> Could you clarify what you mean by "sync error"? Not "SMP_ERR", probably. And how are you seeing that "sync error" if the next SYNC_IN event (corrective reset of the SYNC_CLK generator) is just 16 ns away? How long are the "glitches"?

By 'sync error' I mean observing that the relationship between the DDS phase and an RTIO event is incorrect. The DDS is in a mode where the phase accumulator is reset to zero on IO_UPDATE. If the DDS state machine is not properly synced when it registers the IO_UPDATE the DDS phase is incorrect (i.e. the DDS chooses the wrong edge of the 1 GHz clock as the phase origin, leading to ~90 degree phase shifts for 250 MHz output).

I triggered the scope from an RTIO TTL output at a fixed delay from the IO_UPDATE - if everything is working correctly the DDS phase should be fixed relative to the RTIO output event. If this phase is incorrect the DDS was not properly synced at IO_UPDATE.

cjbe commented 5 years ago

@jordens

> But then you iterated over the 1000-iteration loop another 4 times...

Ah - these are the SMP_ERR counts for channels 0..3, so the row 24 [999, 1000, 0, 217] means that at SYNC_IN delay tap 24, channel 0 had 999 errors out of 1000 samples, channel 1 had 1000 errors out of 1000 samples, etc.
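Given that format (a delay tap followed by one SMP_ERR count per channel out of 1000 trials), a small hypothetical parser can transpose the printout into per-channel error rates, which is the shape needed for picking a delay per channel. It also accepts the "rurukul0:"-style prefix used in the earlier scans.

```python
import ast


def parse_scan(lines, trials=1000):
    """Convert lines like '24  [999, 1000, 0, 217]' (optionally prefixed with a
    device name and ':') into one list of SMP_ERR rates per channel, ordered by tap."""
    rows = {}
    for line in lines:
        body = line.split(":", 1)[-1].strip()  # drop an optional 'rurukul0:' prefix
        tap, counts = body.split(None, 1)      # '24', '[999, 1000, 0, 217]'
        rows[int(tap)] = ast.literal_eval(counts)
    n_channels = len(next(iter(rows.values())))
    return [[rows[tap][ch] / trials for tap in sorted(rows)]
            for ch in range(n_channels)]


scan = ["24  [999, 1000, 0, 217]",
        "25  [1000, 1000, 0, 1000]"]
for ch, rates in enumerate(parse_scan(scan)):
    print(ch, rates)
```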

gkasprow commented 5 years ago

It's worth looking at the CPLD IO supply rail. The SMPS may work in discontinuous mode, causing high ripple on the CPLD supply.

jordens commented 5 years ago

That signal doesn't go through the CPLD. It would need to be crosstalk from the control lines of the fanout. The fanout supply seemed clean.

jordens commented 5 years ago

With the current (extremely lenient) algorithm and about two taps of margin, even CFL tubes being switched on will reliably cause SMP_ERR to latch here. This is in a grounded, closed enclosure (albeit not RF shielded). There is something wrong here.

gkasprow commented 5 years ago

You could try changing the FSEN pin state on the LVDS receivers. It may affect the jitter.

AUTProgram commented 5 years ago

I have run sync_scan from the ad9910 test suite several times on two cards of the old (v1.0) and two cards of the new (v1.3) hardware versions of Urukul. Overall I did about 30 runs on each card.

For cards of revision 1.0, the errors resulting from different validation delays were quite variable. Typical results for one card looked like these:

about 70% of all runs:

[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

about 20% of all runs:

[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

about 10% of all runs:

[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

For the other card, about 2/3 of runs:

[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

about 1/3 of runs:

[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
[0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

For two cards of the new revision 1.3, all runs basically gave the same result:

[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

The results for the (Creotec) v1.3 boards are consistent with pk-pk jitter of approx 400ps.
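A rough way to get a number like this from a scan, using the model from the top of the issue: without jitter the window width is the SYNC_IN delay periodicity minus twice the validation delay, so the shortfall of the measured width approximates the peak-to-peak jitter. This is a sketch with stated assumptions, not necessarily how the 400 ps figure was obtained: taps are ~75 ps, validation and SYNC_IN taps are equal, the periodicity is taken as the spacing between the centres of the first two complete windows in the row, and any intrinsic setup/hold aperture is lumped into the jitter estimate. Function names are hypothetical.

```python
TAP_PS = 75.0  # approximate SYNC_IN delay tap, as quoted at the top of the issue


def windows(row):
    """(start, end) taps of error-free runs that are not cut off by the scan edges."""
    spans, start = [], None
    for tap, bit in enumerate(row + [1]):  # trailing sentinel closes the last run
        if bit == 0:
            if start is None:
                start = tap
        elif start is not None:
            if start > 0 and tap < len(row):  # skip runs truncated by the scan range
                spans.append((start, tap - 1))
            start = None
    return spans


def estimate_pkpk_jitter_ps(row, validation_delay, tap_ps=TAP_PS):
    """Shortfall of the mean measured window width vs. the jitter-free width."""
    spans = windows(row)
    if len(spans) < 2:
        raise ValueError("need two complete windows to estimate the periodicity")
    centres = [(a + b) / 2 for a, b in spans]
    period = centres[1] - centres[0]
    width = sum(b - a + 1 for a, b in spans) / len(spans)
    return (period - 2 * validation_delay - width) * tap_ps


# Validation delay 0 row of the v1.3 scan above
v13_row0 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
            0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(windows(v13_row0))                     # [(2, 9), (16, 22)]
print(estimate_pkpk_jitter_ps(v13_row0, 0))  # 450.0
```

With a 13.5-tap periodicity and a mean window of 7.5 taps this gives about 450 ps, in the same ballpark as the ~400 ps quoted; the simple model should only be trusted to that level.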

AUTProgram commented 5 years ago

@gkasprow what level of jitter would you expect for the LVDS SYSREF generated by Kasli after going through cabling, LVDS buffers, etc? Any idea why the newer boards seem to behave better than the old ones?

@jordens which versions of the hardware did you test?

hartytp commented 5 years ago

We were setting up to look into the sync in jitter issue and see if we could locate its origin, however we weren't able to reproduce it in our test setup.

Some other details (probably not material, but for completeness):

gkasprow commented 5 years ago

Jitter is dominated by the LVDS transceivers and could be as much as 0.3 ns. (image)

hartytp commented 5 years ago

@gkasprow that 300ps is almost all data-dependent jitter, right? I'd have to double check what we're doing in the calibration code, but I'm not sure that could account for what we see.

It also doesn't explain the observation that we see no "eye" for some window sizes on the older hardware.

gkasprow commented 5 years ago

Yes. So with a square-wave pattern it should not be visible. The boards could differ in the level of 3V3 rail noise, which could affect jitter significantly.

gkasprow commented 5 years ago

Such deterministic jitter depends on activity on neighbouring channels, so you could check whether anything happens on SPI during calibration.

hartytp commented 5 years ago

> Such deterministic jitter depends on activity on neighbouring channels, so you could check whether anything happens on SPI during calibration.

Anyway, even so, 300ps of deterministic jitter still wouldn't explain the issues @jordens and @cjbe observed.

> The boards could differ in the level of 3V3 rail noise, which could affect jitter significantly.

From a quick skim over the schematics, I didn't see any changes to the power supplies which could explain this, but maybe I missed something.

Could also be something to do with the clocking of the Urukuls from Kasli since I'm using the newer Kasli and @cjbe was using the older Kasli with worse clock distribution/floated MMCXs.

gkasprow commented 5 years ago

That could be a matter of, e.g., the capacitors used. A different vendor means different characteristics.

hartytp commented 5 years ago

@gkasprow I was wondering about that kind of thing. If the decoupling somewhere is a bit marginal, then the quality of the capacitors used could have a large impact on performance. Anyway, even our results with the v1.0 hardware look better than the data @jordens posted at the top of this issue, so I don't think this is just down to the vendor of Urukul.

gkasprow commented 5 years ago

If you have an SSA (signal source analyzer), you can simply pass a known clock signal through the Urukul and back and see how it gets degraded. I've just ordered a 6 GHz SSA for my lab, so I won't have to borrow one any more. It should be delivered in a few days. To do such a test I will need one problematic Urukul.

jordens commented 5 years ago

I had already checked crosstalk from busy SPI lines, and I had looked at the signal after the fanout and another LVDS-to-CMOS converter, with an SA, not an SSA, though. My suspicion is that something is going on between the fanout and the DDS input. The jitter timescales are not slow (<100 µs).

jordens commented 5 years ago

The FS thing or noisy switchers shouldn't do anything either, since that signal doesn't go through the LVDS converters or the CPLD, and the power supplies are the known-good clean ones.

jordens commented 5 years ago

Could you also test the window width using the sync out of the first DDS and compare that to my measurements above?

hartytp commented 5 years ago

> I had already checked crosstalk from busy SPI lines, and I had looked at the signal after the fanout and another LVDS-to-CMOS converter, with an SA, not an SSA, though. My suspicion is that something is going on between the fanout and the DDS input. The jitter timescales are not slow (<100 µs).

Okay, good, I didn't realise you'd done such a thorough investigation already.

Out of curiosity, how did you probe the signal after the fanout? (lift a resistor and solder some coax onto the board)? IIRC there isn't a coax TP that one can use to do this easily.

hartytp commented 5 years ago

@gkasprow checking I understand the design:

Are we sure that the stub added by having both the LVDS->LVCMOS and the clock fanout won't cause SI issues?

gkasprow commented 5 years ago

There should be no stubs. I placed the chips in such way that LVDS line has termination at the end. So from SI point of view there should be no stubs in any configuration.

hartytp commented 5 years ago

ok.

hartytp commented 5 years ago

> Could you also test the window width using the sync out of the first DDS and compare that to my measurements above?

Yes, but probably not for a day or so.

jordens commented 5 years ago

This might also be a ringing/impedance/drive-strength issue. I see double windows in some cases.

gkasprow commented 5 years ago

@marmeladapk we plan to release the next revision of Urukul. I want to make sure we fix all known issues. Can we recreate this issue in our lab?

dnadlinger commented 5 years ago

Here is an updated version of the hacky script to reproduce the above numbers: https://gist.github.com/klickverbot/fb54dc976e18373ed34d4f2fc55f0ffe

Looking at the window widths (and maybe full scans) reported by Robert's code should be just as good, though.

gkasprow commented 5 years ago

I have two Urukul v1.1 in the lab. Can we use this version to recreate the issue?

jordens commented 5 years ago

I am pretty sure that all versions have the issue.

hartytp commented 5 years ago

@marmeladapk @gkasprow did you ever look at this?

gkasprow commented 5 years ago

Not yet, but we will; we had higher priorities.

hartytp commented 5 years ago

No problem. That's what I assumed, but I wanted to be sure.

marmeladapk commented 5 years ago

@jordens where can I find the code that you use for buildbot CI tests which gives these sync patterns?

hartytp commented 5 years ago

Have a look here https://github.com/m-labs/artiq/blob/33b28f6e56a59810b0ee7ea0594a291e9d300e79/artiq/test/coredevice/test_ad9910.py#L341

marmeladapk commented 5 years ago

Finally had some time for it. Some SYNC_IN measurements (though I do not have high confidence in them): (images)

jordens commented 5 years ago

Hmm. 1.2 ns of jitter should not work at all. And that forest from 90 to 110 MHz is not a modulation but additional signals, probably your Warsaw FM radio stations.

jordens commented 5 years ago

Other than that the measurement looks similar in quality and in numbers to what I had reported above. The biggest intrinsic spurs (sys_clk modulation crosstalk) are down 45-50 dB.

gkasprow commented 5 years ago

@jordens True, we have a few-kW FM transmitter a few hundred meters from WUT, so it is quite obvious.