AMC WR options - Githubissues

hartytp commented 5 years ago

If I've understood correctly, @sbourdeauducq believes that we will have issues with the AMC WR implementation, even if both DDMTD inputs are from IOs in the same bank as the helper PLL. This is a problem that's unique to ultrascale FPGAs, so won't affect Kasli or Sayma RTM.

We have four options to get around this:

Don't bother with WR on the AMC. Route the recovered clock to the RTM via the AMC to RTM connector and do all the clock recovery on the RTM. The main risk here is that we'd be routing a clock at f_RTIO via a connector that is not designed to be low cross-talk/low pick-up. Any noise close to the carrier cannot be removed by the WR PLL.
Add a small cheap FPGA to the AMC and implement the WR PLL on that.
Add an external dual DFF to the AMC and use that for the DDMTD. We'd have to make sure the FF outputs meet S/H at the AMC FPGA. This shouldn't be an issue at 150MHz, since the DFF timing is very stable over PVT (https://www.onsemi.com/PowerSolutions/product.do?id=MC100EP29) and we should be able to add proper timing constraints. But @sbourdeauducq would want to add some extra S/H violation measurement logic to make sure it's okay. This is doable, but more work on the gateware/software side of things.
Kick this into the long grass. Add sufficient MMCX connectors to the AMC that if necessary we can move the clock recovery to a separate card that can be hacked onto the AMC. It can then be integrated into Sayma v2.1 after it's been prototyped on Sayma AMC v2.0.

gkasprow commented 5 years ago

Does this small FPGA have to handle DRTIO logic as well? Or do we use the Kintex US CDR clock, routed via IO pins to the second FPGA that implements WR?

hartytp commented 5 years ago

The latter. In detail:

Kintex CDR clock drives a MGTREF output
MGTREF output goes to a diff IO on timing FPGA
helper DCXO goes to a clock pin on timing FPGA
main DCXO goes to a diff input on timing FPGA (all three in same FPGA bank)
timing FPGA drives I2C bus with DCXOs on
timing FPGA runs white rabbit loop clocked at 125MHz (DDMTD, deglitch, loop filter, drive I2C bus)
probably want some LVCMOS signals from the timing FPGA to the main FPGA for locked indicators and other control signals. Not sure what kind of interface @sbourdeauducq would want to use for timing FPGA configuration (SR?).
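
For reference, the DDMTD arithmetic behind the loop listed above can be sketched as follows. This assumes the common White Rabbit style choice of helper frequency, f_helper = f_in * N / (N + 1). The value N = 2**14 and the function name `ddmtd_figures` are illustrative assumptions, not something fixed in this thread.

```python
from fractions import Fraction


def ddmtd_figures(f_in_hz, n):
    """Return (helper frequency, beat frequency, time resolution) for a DDMTD.

    Assumption: the helper oscillator runs at f_in * n / (n + 1).
    Exact rational arithmetic is used to avoid rounding surprises.
    """
    f_in = Fraction(f_in_hz)
    f_helper = f_in * n / (n + 1)
    # Sampling f_in with f_helper produces a beat at the difference frequency.
    f_beat = f_in - f_helper  # equals f_in / (n + 1)
    # Input phase is stretched in time by f_in / f_beat, so one helper period
    # in the beat domain corresponds to this much real input phase (seconds).
    resolution_s = (1 / f_helper) / (f_in / f_beat)
    return f_helper, f_beat, resolution_s


f_helper, f_beat, resolution_s = ddmtd_figures(125_000_000, 2**14)
print(float(f_helper), float(f_beat), float(resolution_s))
```

With these assumed numbers, one count of the helper-clocked phase counter corresponds to 1 / (N * f_in) of input phase, i.e. a bit under half a picosecond.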

gkasprow commented 5 years ago

Would the smallest Artix chip do the job? We can use the XC7A15T-3.

hartytp commented 5 years ago

> Would the smallest Artix chip do the job? We can use the XC7A15T-3.

Would almost certainly be fine. I can get Weida to port his WR code to that and check it meets timing if that helps?

FWIW, I still feel that using an external dual FF is a more elegant approach.

gkasprow commented 5 years ago

True, an additional FPGA means additional code to maintain, an extra communication interface, additional cost and possibly extra supplies. The DFF seems to be the simpler approach.

hartytp commented 5 years ago

> True, an additional FPGA means additional code to maintain, an extra communication interface, additional cost and possibly extra supplies. The DFF seems to be the simpler approach.

That's my feeling as well. I suspect a dual-DFF like MC100EP29 will give better performance.

The only concern I'm aware of with that would be @sbourdeauducq's concerns about the DFF output meeting timing at the FPGA inputs due to Ultrascale-related issues (which I don't fully understand since I haven't played with that FPGA much). However, I suspect that with proper timing constraints this won't be an issue. And, if it is an issue, then I think we're in all kinds of other trouble (e.g. SUServo won't work on Sayma).

@sbourdeauducq for S/H violation detection, what about driving the DFF S & R inputs from the FPGA? We could toggle those pins and check that the FPGA always receives the correct input.
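
A toy model of that S/R toggling check, to make the idea concrete. Everything here is an assumption for illustration: the function name `run_sr_check`, the 8 ns period, and the 0.3 ns setup/hold figures are made up, and a capture inside the setup/hold window is modeled crudely as a coin flip between the old and new value rather than as real metastability.

```python
import random


def run_sr_check(delay_ns, period_ns=8.0, setup_ns=0.3, hold_ns=0.3,
                 cycles=1000, seed=0):
    """Count mismatches between the value forced via S/R and the value captured.

    delay_ns is the (hypothetical) total delay from the FPGA clock edge that
    toggles S/R to the resulting DFF output edge arriving back at the FPGA pin.
    """
    rng = random.Random(seed)
    # Where the data edge lands relative to the FPGA sampling edges.
    phase = delay_ns % period_ns
    in_window = phase < hold_ns or phase > period_ns - setup_ns
    errors = 0
    q = 0
    for _ in range(cycles):
        q ^= 1  # FPGA toggles the flip-flop output via S/R
        if in_window:
            # Edge too close to a sampling edge: old or new value may be seen.
            captured = rng.choice((q, q ^ 1))
        else:
            captured = q
        errors += captured != q
    return errors


print(run_sr_check(4.0), run_sr_check(7.9))
```

In this model a delay that lands mid-period never produces a mismatch, while a delay that lands near a sampling edge produces many, which is the signature the check would look for.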

hartytp commented 5 years ago

Side note: if we go for a DFF on the AMC and Artix-7 IOB FFs on the RTM, it will be interesting to compare the performance. I wonder if we will see better stability with the external FF or not. I'm not aware of anyone having looked at that before...

gkasprow commented 5 years ago

True, let's go for it. We can always add a tiny module with an alternative clock approach to the GPIO pins and U.FL clock input.

hartytp commented 5 years ago

That's what I'd like to do, but we should get @sbourdeauducq to sign off on it.

Give me a day or two to finish writing up the clocking, so we have a clear plan. Then we can get everyone to sign off and freeze the design.

sbourdeauducq commented 5 years ago

For sending the recovered clock to the FFs I think MGTREFCLK1_225 should work (but we should check that Vivado can compile the design, and also maybe test on the existing hardware that OBUFDS_GTE3 works as expected by looking at MGTREFCLK0_225 on the uFL).

hartytp commented 5 years ago

@WeiDaZhang Can you adapt your test design to Sayma v1.0 and see if you can measure the noise/stability of the recovered clock on MGTREFCLK0_225 using the available UFL connectors?

@sbourdeauducq do you want a test compilation with ARTIQ, or just with a simple test design that instantiates the transceivers?

sbourdeauducq commented 5 years ago

Probably easier to use ARTIQ anyway.

gkasprow commented 5 years ago

@hartytp I started implementing the discrete DFF approach to the DDMTD. The question is whether the DFF outputs need to be fed to the same bank as the helper DCXO clock. IMHO not, because we don't care about the delay here so much. IMHO they don't need to be DC-coupled either. In the same bank as the helper DCXO, I have only 2 LVCMOS pins. In other banks, e.g. the DDR3 one, I have many more pins as well as diff pairs.

gkasprow commented 5 years ago

@hartytp please have a look. [image]

I assumed we need to route the helper clock to the FPGA as well because the filter logic will rely on it.

sbourdeauducq commented 5 years ago

Is there good SI when connecting the clocks in parallel like that?

gkasprow commented 5 years ago

I will place the DDMTD very close to the oscillators, so there won't be any stubs.

sbourdeauducq commented 5 years ago

Another thing - can we have an option (possibly involving soldering, e.g. capacitor selection) to use a general-purpose I/O instead of OBUFDS_GTE3? In theory, OBUFDS_GTE3 works better than the general-purpose I/O, but there can be surprises...

gkasprow commented 5 years ago

OK.

gkasprow commented 5 years ago

We can re-use the REC_CLK signal that already goes to the SI5324. Another option is to route a signal from DDR3 bank.

sbourdeauducq commented 5 years ago

REC_CLK is fine. I'd rather keep related things in the same bank as much as possible, Ultrascale I/O and clock routing is a bit weird (and poorly documented). The "global" clock buffers aren't global.

gkasprow commented 5 years ago

So maybe I should deliver another copy of the helper clock to the SDRAM bank, where the other DDMTD signals are routed? Or convert the DDMTD outputs to single-ended?

sbourdeauducq commented 5 years ago

> or convert DDMTD outputs to single ended?

Yes, do that.

hartytp commented 5 years ago

> And the question is if the DFF outputs need to be fed to the same bank as helper DCXO clock?

No, they don't. We only need to meet timing at 125MHz, but we don't care about a few clock cycles of latency so can always add a few FFs to help timing (although, let's try to minimize the amount of that required).

> I assumed we need to route the helper clock to the FPGA as well because the filter logic will rely on it.

Yes.

> We can re-use the REC_CLK signal that already goes to the SI5324. Another option is to route a signal from DDR3 bank.

That was my plan as well.

> Another thing - can we have an option (possibly involving soldering, e.g. capacitor selection) to use a general-purpose I/O instead of OBUFDS_GTE3?

If we end up needing a fanout buffer for this we can make it a mux, otherwise solder jumpers are fine.

hartytp commented 5 years ago

> @hartytp please have a look.

Topologically, that looks fine to me. Please just double check that all signals are correctly biassed and terminated and all logic levels (CML/LVDS) are noted on the schematic.

sbourdeauducq commented 5 years ago

> No, they don't. We only need to meet timing at 125MHz, but we don't care about a few clock cycles of latency so can always add a few FFs to help timing (although, let's try to minimize the amount of that required).

To me it sounds cleaner and easier to just use the same bank and convert to single-ended. The only serious issue with single-ended signals is it may make certain setup/hold validation schemes more difficult or impossible.

hartytp commented 5 years ago

Works for me.

gkasprow commented 5 years ago

True, the LVPECL to LVCMOS converters usually have a large delay variation. So let's keep the DDR3 bank inputs. I will add test points to the single-ended inputs.

sbourdeauducq commented 5 years ago

> True, the LVPECL to LVCMOS converters usually have a large delay variation.

How large? Ultrascale FPGAs aren't exactly stable either, plus come with obscure Vivado errors and frustration...

sbourdeauducq commented 5 years ago

For MAX9171/MAX9172 for example, it's 1.5ns maximum, which isn't crazy... Is it really safe to AC-couple the outputs of those flip-flops? If the "glitchy" behavior isn't symmetric during rising and falling transitions of the DDMTD outputs, the capacitors might introduce some subtle effects. If we use converters, we can perhaps use CML or CML-tolerant ones, and not worry about it...

hartytp commented 5 years ago

Isn't there an RC network we can use to convert LVPECL to LVCMOS/SSTL with decent performance?

sbourdeauducq commented 5 years ago

With capacitor tolerances it might not necessarily be better than the 1.5ns of the MAX part...

hartytp commented 5 years ago

Well, for DC coupled signals, it's actually probably just a R network. Anyway, component tolerances can be simulated quite quickly.

I don't object to an active solution, just felt that a passive resistor ladder is cheaper/simpler than adding a new BOM line. I don't feel strongly and wouldn't want to hold the design up for this so any method that @gkasprow thinks will work is fine by me...
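
As a rough sketch of how quickly resistor tolerances can be checked, here is a small Monte Carlo over a plain two-resistor divider. This is not the circuit from the schematic: the topology, the 100 ohm / 60 ohm values, the 1% tolerance and the LVPECL levels (2.4 V high, 1.6 V low, treated as ideal sources) are all assumptions chosen so that the nominal midpoint lands at 0.75 V, the SSTL15 reference. It only looks at DC levels, not at delay variation.

```python
import random
import statistics


def divider_out(v_in, r_top, r_bottom):
    """Output of an unloaded resistive divider driven by an ideal source."""
    return v_in * r_bottom / (r_top + r_bottom)


def simulate(v_high=2.4, v_low=1.6, r_top=100.0, r_bottom=60.0,
             tol=0.01, runs=10000, seed=1):
    """Return (min, mean, max) of the divided signal midpoint over random parts.

    All default values are hypothetical; tol is the fractional tolerance,
    drawn uniformly for each resistor.
    """
    rng = random.Random(seed)
    mids = []
    for _ in range(runs):
        rt = r_top * (1 + rng.uniform(-tol, tol))
        rb = r_bottom * (1 + rng.uniform(-tol, tol))
        hi = divider_out(v_high, rt, rb)
        lo = divider_out(v_low, rt, rb)
        mids.append((hi + lo) / 2)
    return min(mids), statistics.mean(mids), max(mids)


mid_min, mid_mean, mid_max = simulate()
print(mid_min, mid_mean, mid_max)
```

Under these assumptions the midpoint stays within roughly 10 mV of 0.75 V, so for this idealized divider the spread of the LVPECL output levels themselves would likely matter more than the resistor tolerance.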

gkasprow commented 5 years ago

We can DC-couple to SSTL15 easily using just 4 resistors. It's hard to DC-couple to LVCMOS18 using passive components. Since you don't care about propagation delay variation between 900ps and 3ns, what's the difference whether you receive it on the same bank or a different bank?

hartytp commented 5 years ago

Ouch, that's huge!

It would be nice to have propagation delay that's a bit more stable than that, as it makes some S/H validation schemes easier.

I would have expected that coupling into single-ended SSTL15 with resistors would be really quite stable as long as we don't use overly large resistor values (the rising edges are damn fast).

gkasprow commented 5 years ago

This solution does the job nicely. It also keeps the output impedance low so there won't be any rise time issues. [image]

sbourdeauducq commented 5 years ago

The MAX part I was mentioning is more stable, and also you get two translators in one chip.

Anyway it's quite hard to decide what solution really is the best:

Active solution with MAX chip and same FPGA bank:

less likely to run into obscure Vivado errors, less likely to require mucking around with BUFG chains and timing constraints to fix clock routeability issues, and potentially deal with ensuing in-fabric skew/timing issues.
less likely to run into Ultrascale instability (may be correctable with Vivado constraints, which I have not explored).
clock and data in the same bank is the normal Ultrascale I/O configuration, and it is unlikely to hit corner cases, bugs, and other problems.
one more BoM line
S/H validation schemes that rely on differential signals cannot be used

Passive solution with resistors and different FPGA bank:

potential Vivado/Ultrascale problems
potentially no extra BoM line, easily sourced components
more possibilities for S/H validation

sbourdeauducq commented 5 years ago

Maybe add the MAX circuit as an option on the layout and DNP it, then we can solder it on if the Xilinx stuff acts up?

hartytp commented 5 years ago

@gkasprow I mean using resistors to match the differential LVPECL output to a single-ended SSTL input at the FPGA. If we did that, I think we could keep all signals within the same bank...

gkasprow commented 5 years ago

Maybe a simpler option is to route a copy of the helper DCXO clock to the SDRAM bank.

sbourdeauducq commented 5 years ago

Then why would we need the original clock?

gkasprow commented 5 years ago

We need it to get rid of the S/H violation that would happen when we route the clock between banks.

gkasprow commented 5 years ago

I'm not sure we can instantiate single-ended SSTL18 together with nearby LVDS in the same bank. It needs checking because Xilinx sometimes has strange constraints when placing single-ended SSTL signals close to other ones. SSTL needs a reference voltage. I routed Vbias_0.9V to the bank66 VREF. So if @sbourdeauducq confirms that we can use SSTL18, I will route DDMTD there with modified resistor dividers. It has one drawback: the output voltage from LVPECL has a large variation of the low/high level, so it would be hard to make it work with a fixed SSTL18 reference level. That's why I preferred differential signal routing.

sbourdeauducq commented 5 years ago

That doesn't help, it just displaces the problem. You'd have the issue again when crossing the two clock domains. But AFAICT just moving (completely) the two FF outputs plus the helper clock to the SDRAM bank should be OK.

gkasprow commented 5 years ago

At the moment the differential SSTL outputs are routed to the SDRAM bank. I will route the clock as well and this should solve the issue.

sbourdeauducq commented 5 years ago

OK. @hartytp please confirm that we only need to run the DDMTD edge detector on that clock and then it's just an asynchronous CDC of the counter value to another domain (just like the implementation in jesd204_tools).
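
To illustrate what the edge detector and counter amount to, here is a small software model. It is a sketch, not the jesd204_tools or WR implementation: it assumes an ideal, jitter-free helper clock at f_in * n / (n + 1), and the names `ddmtd_samples` and `first_rising_edge`, the value n = 1024 and the 4-sample deglitch depth are all hypothetical.

```python
def ddmtd_samples(n, offset, count):
    """Beat signal seen by a flip-flop clocked at f_in * n / (n + 1).

    Time is measured in units of T_in / n. The helper period is then n + 1
    units, so sample k lands at input phase (k + offset) mod n. The input is
    modeled as a square wave that is high for the first half of its period.
    """
    return [1 if (k + offset) % n < n // 2 else 0 for k in range(count)]


def first_rising_edge(samples, stable=4):
    """Index of the first 0->1 edge with `stable` clean samples on each side.

    This is a crude deglitcher: the edge only counts if it is preceded by
    `stable` zeros and followed by `stable` ones.
    """
    for i in range(stable, len(samples) - stable + 1):
        if not any(samples[i - stable:i]) and all(samples[i:i + stable]):
            return i
    return None


n = 1024
edge_a = first_rising_edge(ddmtd_samples(n, 0, 3 * n))
edge_b = first_rising_edge(ddmtd_samples(n, 100, 3 * n))
# Difference in helper cycles equals the input phase offset in units of T_in/n.
print(edge_a, edge_b, (edge_a - edge_b) % n)
```

In this model only the edge detection and the counter need to run on the helper clock; the resulting count is a slowly changing number, which is why handing it to another domain is comparatively easy.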

hartytp commented 5 years ago

@sbourdeauducq I'm not sure I follow your question. Normally, one would run the entire WR core from the helper PLL. That includes the helper DCXO frequency lock, the main loop DDMTD, the loop filter, the I2C cores etc.

What's your reason for wanting a CDC? Are you worried about meeting timing in the overall system if it has inputs from the DFF in one FPGA bank and outputs to the I2C core in another bank? If so, wouldn't it be easier to add a few pipe-lining regs than to have a CDC?

sbourdeauducq commented 5 years ago

> Normally, one would run the entire WR core from the helper PLL. That includes the helper DCXO frequency lock, the main loop DDMTD, the loop filter, the I2C cores etc.

Then any I/Os used by those cores should be in the same bank as the helper PLL clock.

sbourdeauducq commented 5 years ago

> wouldn't it be easier to add a few pipe-lining regs than to have a CDC?

No.

sbourdeauducq commented 5 years ago

@gkasprow What are your particular objections to the MAX9172?

WeiDaZhang commented 5 years ago

If running everything on the helper clock is not realistic, a FIFO could be placed right before the I2C interface, i.e. after the control signal has been calculated, provided the I2C core reacts to the FIFO non-empty flag promptly and, in some sense, deterministically.

sinara-hw / Sayma_AMC

AMC WR options #36