jordens commented 8 years ago

Summarize number and types of signals on RTM. <<< THIS IS A DRAFT, @jordens, @gkasprow, @sbourdeauducq please review >>>

Xilinx Artix7 (A7)

provides low-speed IO for ICs on RTM
p/n Xilinx XC7A15T
RTM connect
- A7 transceivers to KU IOSERDES (how many transceivers?)
- few extra A7 LVDS pairs to KU IOSERDES (how many?)
- A7 into Slave Serial mode controlled by KU (how many pins?)
- clock

AD9154

2 DACs total (4 output channels each)
RTM connect
- 16 differential pairs for Tx transceivers – JESD204B data lines
- 2 differential pairs – SYNCOUT line for JESD204B synchronization
- 1 differential pair – DAC clock to FPGA

AD9656

2 ADCs total (4 input channels each)
RTM connect
- 8 differential pairs for Rx transceivers – JESD204B data lines
- 2 differential pairs – SYNCb line for JESD204B synchronization
- 1 differential pair – ADC clock to FPGA
- 1 differential pair – SYSREF to FPGA (allows absolute time synchronization)

low-noise power

noisy power generation on AMC, analog power delivered to RTM
+/- 7 V
- 8 A (2 A per analog mezzanine)
- [ ] 1 mV pp noise (what bandwidth?)
+3.3 V
- 8 A (2 A per analog mezzanine)
- [ ] 1 mV pp noise (what bandwidth?)
+/- 12 V
- [ ] what current?

TODO

- [ ] pin assignments on J30 and J31
- [ ] what is the maximum current for J30 and J31?

dhslichter commented 8 years ago

As mentioned in #12, slow parallel digital buses to daughtercards to be sent as high-speed differential over these connectors using standalone SerDes chip pairs e.g. http://www.ti.com/product/DS92LV3241. In this instance might follow the deserializer with a buffer to increase drive power, potentially with an output enable controlled by deserializer LOCK pin. All signals through connector to be high-speed differential.

dhslichter commented 8 years ago

Here are my notes from April, with some modifications (check for errors):

2x AD9154 (8 output channels):

- 16x (8 each) differential pair – JESD204B data lines
- 2x (1 each) differential pair – SYNCOUT line for JESD204B synchronization
- 3x single-ended lines – SPI CLK/MOSI/MISO
- 2x single-ended lines – SPI chip select
- 1x differential pair – DAC clock to FPGA (the SYSREF signal for deterministic latency will be generated on the RTM, but the DAC clock also needs to be fed back into the FPGA to ensure that it clocks data out at the correct rate). This may not be necessary given the current Sinara clocking scheme, TBA.

Total: 19 differential pairs + 5 single-ended lines

2x AD9656 (8 input channels):

- 8x (4 each) differential pair – JESD204B data lines
- 2x (1 each) differential pair – SYNCb line for JESD204B synchronization
- 3x single-ended lines – SPI CLK/MOSI/MISO (can be shared with DAC SPI in a pinch)
- 2x single-ended lines – SPI chip select
- 1x differential pair – ADC clock to FPGA (to give input clock for JESD204B at FPGA). Again, this may not be necessary with the current Sinara clocking plan.
- 1x differential pair – SYSREF to FPGA (allows absolute time synchronization)

Total: 12 differential pairs + 5 single-ended lines

So we are looking at a total of:

31 differential pairs, plus another 5 pairs worth of single ended lines, so 36 differential pairs in the RTM connectors are occupied.

Assume 4 analog daughtercards, each with 16 digital IO lines, for a total of 64 digital IO lines (taking up 32 differential pairs’ worth for direct transmission).

This gives a total payload of 68 differential pairs’ worth for a fully loaded RTM. If you use the spec’ed connectors from the standard, which have 40 differential pairs each (pp. 12-13 of https://www.slac.stanford.edu/grp/lcls/controls/docs/hw/uTCA/Standards/PICMG_MTCA_4_R1_with%20errata_SLAC.pdf), you get 74 differential pairs available (6 pairs are reserved for other purposes in the spec). The connector is here http://www.te.com/usa-en/product-6469002-1.html, and is rated for signaling up to 12 Gbps, which is suitable for our purposes. It's designed for differential signals.

So now you have 6 differential pairs left over for additional signaling to the RTM. Another option would be to use a SERDES chip on the RTM, allowing you to send commands that are not as time-critical using a single high speed differential pair (IOSERDES is still OK for this, no need for a gigabit transceiver), which would then be fanned out as a slow parallel bus on the RTM card using the appropriate receiver. It might be simpler from a gateware standpoint to just use a chipset Ser/Des pair, such as detailed here: http://www.ti.com/product/DS92LV3241, which turns 32 LVCMOS channels at 40-85 MHz bus speed into 4 differential pairs with embedded clock (with scrambling and DC balancing for EMI reduction). This chipset does all clock recovery/alignment/calibration internally, so the FPGA could just output as normal LVCMOS RTIO outputs, provide a suitable clock (ideally a submultiple of the RTIO clock) as reference for the SERDES chips, and the pulses would be output transparently at the daughtercards on the RTM with deterministic latency.
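As a rough check on the link speed involved, the payload rate per differential pair follows directly from the figures quoted above (32 LVCMOS bits, 40-85 MHz bus clock, 4 pairs). This is only a sketch: the function name is made up for illustration, and the embedded-clock and DC-balancing overhead is deliberately not modelled, so the real line rate is higher than what is printed here.

```python
def payload_rate_per_pair_mbps(parallel_bits, bus_mhz, pairs):
    """Payload bit rate carried by each differential pair, in Mb/s.

    Framing overhead (embedded clock, DC balancing) is NOT included.
    """
    return parallel_bits * bus_mhz / pairs


# Figures quoted in the comment above: 32 bits, 40-85 MHz, 4 pairs.
for bus_mhz in (40.0, 85.0):
    rate = payload_rate_per_pair_mbps(32, bus_mhz, 4)
    print(f"{bus_mhz:.0f} MHz bus: {rate:.0f} Mb/s payload per pair")
```

Even at the top of the range the payload stays well below 1 Gb/s per pair, which is in line with the remark that IOSERDES rather than a gigabit transceiver should be enough.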

This would let us send 64 digital IO lines to the daughtercards using 8 differential pairs only, which would free us up to either offer more digital IO on the daughtercards and/or to have more lines free for other control/communications stuff, or future applications. The serialization/deserialization latency on this chipset is deterministic, which is necessary for our application. Also, sending balanced differential signals over the RTM connector is probably going to be better for crosstalk than large-swing single-ended LVCMOS.
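The pin budget discussed above can be tallied with a short script. All numbers are taken from this comment; the helper names are made up for illustration, and counting two single-ended lines as one pair's worth of pins is the same simplification used in the text.

```python
# Differential pairs from the tallies above.
DAC_PAIRS = 16 + 2 + 1       # JESD204B data, SYNCOUT, DAC clock
ADC_PAIRS = 8 + 2 + 1 + 1    # JESD204B data, SYNCb, ADC clock, SYSREF
SINGLE_ENDED = 5 + 5         # SPI and chip selects for DACs and ADCs

# Two 40-pair connectors minus the 6 pairs reserved by the spec.
AVAILABLE_PAIRS = 2 * 40 - 6


def pairs_equivalent(single_ended_lines):
    """Two single-ended lines occupy one pair's worth of pins (rounded up)."""
    return (single_ended_lines + 1) // 2


def budget(digital_io_pairs):
    """Return (pairs used, pairs left) for a given digital IO allocation."""
    used = (DAC_PAIRS + ADC_PAIRS
            + pairs_equivalent(SINGLE_ENDED) + digital_io_pairs)
    return used, AVAILABLE_PAIRS - used


direct = budget(pairs_equivalent(4 * 16))  # 64 LVCMOS lines sent directly
serdes = budget(2 * 4)                     # two 32-bit SERDES links, 4 pairs each
print("direct:", direct)
print("serdes:", serdes)
```

With direct transmission this reproduces the 68 used and 6 spare pairs quoted above; with the SERDES option the same accounting leaves 30 pairs free.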

We can debate whether to send the single-ended signals for the DACs/ADCs (e.g. SPI) over a SERDES as well, or to use LVDS transceivers to make them differential for reduced crosstalk, or to just send them through as LVCMOS.

gkasprow commented 8 years ago

Why would you need fast IO for mezzanines? Wouldn't SPI extenders be enough?

gkasprow commented 8 years ago

I’d route via RTM connector general purpose SPI + mux address lines for chip selects to serve all possible SPI chips directly

Then add I2C with muxes routed to all mezzanines

And only for general purpose IOs that may need MHz signaling rate, use SERDES chips.

On each mezzanine we would need to define SPI and I2C lines + some general purpose IO.

SPI or I2C over SERDES could be overkill.

gkasprow commented 8 years ago

We can also use DS92LV3221 / DS92LV3222, which require only 2 differential lines per chip. We can connect them directly to the FPGA and do the serialization in logic. Otherwise we will lose plenty of IO pins.

dhslichter commented 8 years ago

The analog daughtercards/mezzanines need to have "fast" digital IO, ~50 MHz bandwidth. We will want to program digital attenuators over SPI on the fly for fast range switching, and we also need to be able to turn on/off the RF switches at precise times. I think we should have all lines going to the mezzanines capable of this kind of bandwidth (with the exception of I2C lines) for maximum flexibility. Speeds only need to be as fast as typical switching times/clock speeds for the devices. RF switches switch in about 10-20 ns. The HMC542BLP4E attenuator has a maximum serial clock speed of 30 MHz.

I would not distinguish between SPI and GPIO on the mezzanine connectors; this can be reconfigured in the AMC FPGA bitstream to match the needs of the particular mezzanine. For example, some mezzanines might want to use a serial-programmed digital attenuator but others a parallel-programmed one (because their other specs are more appropriate for a given application, say). If we just make all lines (except I2C, see below) bidirectional with min 50 MHz bandwidth, it should cover all use cases I can envision.

Because there will be a number of potential SPI devices on the RTM (DACs, ADCs, PLLs, plus chips on the mezzanines), I think it would be wise to route multiple separate SPI buses from the AMC so that programming can happen in parallel. In particular, digital upconversion in the DACs and digital attenuator values on the mezzanines will likely need to be changed within a short period of time in a typical experiment. It would be a shame to build this fancy hardware only to have it choke on needing to wait for an over-shared SPI bus. I think a dedicated SPI bus for each of the two DACs makes sense at a minimum, with a shared SPI bus between each ~pair of mezzanine cards (instead of one for all four). Then one more SPI bus can be shared between ADCs, PLLs, and other devices, where there will not be a need for rapid timing-critical reprogramming.
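To put a rough number on the bus-sharing concern, here is a small sketch of the proposed partitioning and the time needed to write every device on one bus back to back. The bus names, the 16-bit word length, and the assumption that devices on a bus are written strictly one after another are illustrative guesses, not anything fixed in this thread; only the 30 MHz clock comes from the attenuator figure mentioned earlier.

```python
# Hypothetical bus plan following the proposal above; names are illustrative.
SPI_BUSES = {
    "dac0": ["AD9154 #0"],
    "dac1": ["AD9154 #1"],
    "mezz_a": ["mezzanine 0", "mezzanine 1"],
    "mezz_b": ["mezzanine 2", "mezzanine 3"],
    "shared": ["AD9656 #0", "AD9656 #1", "PLLs"],
}

BITS_PER_WRITE = 16   # assumed word length, not from any datasheet
SCLK_MHZ = 30.0       # attenuator serial clock limit quoted in this thread


def update_time_us(n_devices, bits_per_device, sclk_mhz):
    """Time to write every device on one bus sequentially, in microseconds."""
    return n_devices * bits_per_device / sclk_mhz


# All four mezzanines on one bus versus one bus per pair of mezzanines.
one_bus = update_time_us(4, BITS_PER_WRITE, SCLK_MHZ)
paired = update_time_us(len(SPI_BUSES["mezz_a"]), BITS_PER_WRITE, SCLK_MHZ)
print(f"single shared bus: {one_bus:.2f} us, bus per pair: {paired:.2f} us")
```

Under these assumptions, splitting the mezzanines across two buses halves the worst-case update time, since the two buses can be driven in parallel.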

For I2C, which is slow, I think we can get away with a single I2C bus that goes through the RTM connector and is split out using an I2C bus switch (http://www.ti.com/product/PCA9548A) to talk to the different mezzanine cards. The driver for this I2C bus switch is already written as a part of ARTIQ. We would pick two pins on the mezzanine connectors to dedicate to SDA and SCL.
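For illustration, selecting a mezzanine through the PCA9548A comes down to writing one control byte in which bit n enables downstream channel n. The sketch below only computes that byte; it does not use the ARTIQ driver, and the mezzanine-to-channel mapping is a hypothetical wiring choice.

```python
def pca9548a_control_byte(channels):
    """Control-register value enabling the given downstream channels (0-7)."""
    value = 0
    for ch in channels:
        if not 0 <= ch <= 7:
            raise ValueError(f"PCA9548A has channels 0-7, got {ch}")
        value |= 1 << ch
    return value


# Hypothetical wiring: mezzanine n sits on switch channel n.
MEZZANINE_CHANNEL = {0: 0, 1: 1, 2: 2, 3: 3}

# Byte to write to the switch before talking to mezzanine 2.
select_mezz2 = pca9548a_control_byte([MEZZANINE_CHANNEL[2]])
print(f"0x{select_mezz2:02x}")
```

Writing a byte with all bits cleared disconnects every channel, which is a convenient idle state when mezzanines carry devices with identical I2C addresses.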

sbourdeauducq commented 8 years ago

@dhslichter The clocks you mention are necessary, though we can look into generating the "ADC/DAC clocks" (in fact the reference clocks for the transceivers) on the AMC if we are really out of RTM pins. In addition to those, we may want another SYSREF for the DACs (it is not clear whether we always want the same SYSREF for DACs and ADCs, so better keep options open) plus at least one backup clock line.

gkasprow commented 8 years ago

OK, so we route the I2C tree with addressable muxes. Can we simplify things and give the mezzanine IO pins a defined direction? The 32-bit SERDES chips, both 2-lane and 4-lane, have a defined direction. Even if you combine the two, the direction is still fixed. One would need to add 3-state buffers and another register for direction control, plus another SERDES chip for readout of the IO, and on the FPGA side yet another SERDES with inputs. This complicates things a lot. If we define 8 input pins and 8 output pins per mezzanine, this would simplify a lot. From the FPGA side we can use IOSERDES for transmission and IOSERDES with an external clock for reception, so CDR won't be needed. If we use pairs of SERDES chips on both sides, we'll lose plenty of FPGA pins, and as a result we will have to give up either the additional SDRAM or the FMC. Since the SERDES mechanism is transparent from the user's point of view and has a fixed IO direction, we can implement SPI on top of it. I2C must be done separately, but we have dedicated pins on the RTM for that.

sbourdeauducq commented 8 years ago

I suppose that those SERDES chips will introduce latency non-determinism in SPI transfers. Are they really necessary?

gkasprow commented 8 years ago

Why would they add non-determinism? They are very simple devices, without buffering, FIFOs and so on.

sbourdeauducq commented 8 years ago

How does the receiver perform comma alignment? Does it program the divider at the output of the CDR that produces the character clock, or does it keep whatever phase the divider generates and reshuffle the bits? The latter mechanism produces a latency non-determinism (across reboots) of the duration of one character.
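A toy model of the second mechanism shows where the non-determinism comes from. It assumes a 10-bit character (the thread does not state the width) and treats the divider phase after each reboot as arbitrary; the function name is made up for this sketch and does not describe any particular chip.

```python
CHAR_BITS = 10  # assumed character width; not specified in this thread


def reshuffle_latency(char_end_bit, divider_phase, char_bits=CHAR_BITS):
    """Bit periods between the last bit of a character arriving and the
    next character-clock edge, when the divider phase is kept as-is and
    the bits are realigned afterwards (e.g. with a barrel shifter)."""
    return (divider_phase - char_end_bit) % char_bits


# The divider can come up in any phase after a reboot.
latencies = sorted({reshuffle_latency(0, p) for p in range(CHAR_BITS)})
spread = max(latencies) - min(latencies)
print("possible latencies (bit periods):", latencies)
print("spread across reboots:", spread)
```

In this model the latency can take any value within one character period depending on the boot-time phase. A receiver that instead programs the divider to a fixed phase relative to the comma would always see the same value.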

gkasprow commented 8 years ago

One more thing - for ADC and DAC SPI config, one can think of I2C - SPI bridges. I use them in some designs to simplify the startup configuration scheme http://www.nxp.com/products/interface-and-connectivity/interface-and-system-management/bridges/spi-slave-to-i2c-master-gpio-bridges/i2c-bus-to-spi-bridge:SC18IS603IPW

sbourdeauducq commented 8 years ago

We can also add a small FPGA (e.g. 6SLX4 or Lattice) on the RTM side to handle SPI-over-SERDES.

gkasprow commented 8 years ago

Yes, we can do that easily. But it is another chip to program and to maintain. From my point of view it is simpler. If you think it is simpler for you, let's go for it. In the case of the SERDES, they don't mention any non-deterministic latency: http://www.ti.com/lit/ds/symlink/ds92lv3241.pdf

gkasprow commented 8 years ago

For this FPGA we would also deliver the clock. In this way we'll get rid of any non-determinism. How would you want to program this FPGA? Would it have its own flash config, or would it be loaded over JTAG from the AMC FPGA?

sbourdeauducq commented 8 years ago

We can design deterministic receivers in FPGAs that do not require a clock, but having the clock makes a CDR unnecessary (so we don't need the more expensive FPGAs). Loading via JTAG or some other serial protocol from the AMC sounds good.

sbourdeauducq commented 8 years ago

I would actually prefer another protocol (e.g. Xilinx "slave serial", I don't know if Lattice has an equivalent) since then we can connect a debug probe to JTAG.

gkasprow commented 8 years ago

In our designs we use Xilinx engine that converts bit file to the JTAG commands. We lack pins in the RTM connector and JTAG is there by default.

gkasprow commented 8 years ago

For inter-chip communication we used the Aurora IP core from Xilinx. I needed to transfer 512 signals with a 4 MHz update rate between 2 chips. We can imagine a similar situation here, but the update rate would be higher. How many LVDS links do you need between the two FPGAs?

sbourdeauducq commented 8 years ago

If we use the TI chips, is there even a way to connect the serial sides to the Sayma FPGA directly that does not involve reverse engineering or unfriendly paperwork? I cannot find documentation for the protocol.

jbqubit commented 8 years ago

For reference...

sbourdeauducq commented 8 years ago

As discussed on the phone today:

- XC7A15T on RTM
- Connect A7 transceivers to KU IOSERDES
- Connect a few extra A7 LVDS pairs to KU IOSERDES
- Put A7 into Slave Serial mode controlled by KU (perhaps with LVDS->LVCMOS receivers on RTM side to keep all RTM signals differential, but this is less important as they should only toggle during startup)

Initial A7 gateware version will probably use a dumb protocol similar to drtio_transceiver_demo.

gkasprow commented 8 years ago

We can connect the RTM FPGA to the Kintex US config memory and configure them together in one process. Once the Kintex finishes configuration, it starts the Artix configuration from the same memory. The number of pins used is similar, but we keep the same memory and the same update scheme. http://www.xilinx.com/support/documentation/user_guides/ug570-ultrascale-configuration.pdf says on page 193: "UltraScale™ FPGAs can be daisy-chained with earlier 7 series families." So in theory this should work.

dhslichter commented 8 years ago

> One more thing - for ADC and DAC SPI config, one can think of I2C - SPI bridges. I use them in some designs to simplify the startup configuration scheme

I would prefer to keep all SPI buses with the ability to run at the maximum speeds allowed by the various chip specs, at least for the DAC configuration. PLL/ADC is less time critical.

gkasprow commented 8 years ago

It's already decided - we connect all the stuff to the Artix FPGA on the RTM.

dhslichter commented 8 years ago

Great, sounds good.

jordens commented 8 years ago

I would like to log my doubts about the usefulness of the "dumb protocol from drtio_transceiver_demo". Its speed scaling with the number of pins, the implications for time alignment, and the dead-ended-ness of said "dumb protocol" were brushed away quickly.

sbourdeauducq commented 8 years ago

@gkasprow Please put another Si5324 on the 7A15T, connected as usual with one output to the IBUFDS_GTE2 and the other output to a clock-capable general-purpose input.

sbourdeauducq commented 8 years ago

I see three problems with daisy-chaining:

Fallback Multiboot is not supported. If we ever support flashing over DRTIO, this means we cannot recover from a flashing attempt that left the user bitstream corrupted.
SPI with more than one data pin is not supported, which can make configuration slow: the KU040 bitstream is almost 16 megabytes.
The common DONE pin makes debugging (reconfiguring only one device) difficult. It may be possible to make a FPGA complete configuration anyway with DONE pulled low by tweaking LCK_cycle, but I have never tried it (Xilinx recommends individually disconnectable DONE pins instead).

Daisy chaining is simply putting the downstream FPGA in slave serial mode and using hardwired logic in the upstream FPGA to send its configuration data. I would simply ignore that hardwired logic and implement the functionality in the fabric instead, which avoids the above problems. Please connect the second FPGA accordingly. You may want to add jumpers or 0-ohm resistors that connect DONE/INIT_B/PROGRAM_B for daisy-chaining so that the hardware can still support that option. CCLK is accessible via STARTUPE3 and DOUT is a general-purpose I/O.
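To give a feel for the configuration-speed point above, the load time scales as bitstream size divided by clock rate times data width. The 50 MHz configuration clock below is purely an assumed figure for illustration, and the bitstream size is taken as a round 16 MiB upper bound on the "almost 16 megabytes" mentioned.

```python
KU040_BITSTREAM_BYTES = 16 * 2**20  # upper bound on "almost 16 megabytes"
ASSUMED_CCLK_MHZ = 50.0             # hypothetical configuration clock


def config_time_s(bitstream_bytes, cclk_mhz, data_pins=1):
    """Seconds to shift a bitstream in at cclk_mhz over data_pins lines."""
    return bitstream_bytes * 8 / (cclk_mhz * 1e6 * data_pins)


x1 = config_time_s(KU040_BITSTREAM_BYTES, ASSUMED_CCLK_MHZ, data_pins=1)
x4 = config_time_s(KU040_BITSTREAM_BYTES, ASSUMED_CCLK_MHZ, data_pins=4)
print(f"x1: {x1:.2f} s, x4: {x4:.2f} s")
```

At the same clock, a single data pin takes four times as long as a four-pin interface, which is the slowdown the x1-only restriction implies.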

jordens commented 8 years ago

And 32-bit addressing (required for flashing more than the KU040 bitstream) is also neither supported nor planned in either openocd or misoc/runtime.

gkasprow commented 8 years ago

OK, so I will add jumpers that will enable either direct RTM Artix config from Kintex IO pins or daisy-chain config with the Kintex as master and the Artix as slave.

jbqubit commented 8 years ago

Here are the questions that I jotted down during Friday's discussion related to putting an FPGA on the RTM.

which FPGA? -- resolved
how to configure FPGA (shared flash on AMC, JTAG chain, other?) -- resolved just now
[x] FPGA clock source
[x] synchronization of signals emitted by RTM FPGA
[x] signal format for communication between AMC FPGA and RTM FPGA eg 8 copies of 8:1 serdes for mezzanine, 1 copy of 32:1 for other IO
transport layer (transceiver or else) -- resolved

gkasprow commented 8 years ago

OK, so I placed the Si5324 and connected one clock output to MGTREFCLK1 and another output to MRCC_14. Where do you want to have the CLK1 and CLK2 of the Si chip connected?

sbourdeauducq commented 8 years ago

I need only one clock input, connected to a general-purpose I/O of the FPGA.

jbqubit commented 8 years ago

This is handled by m-labs. Resolved. Doesn't impact hardware.

sbourdeauducq commented 8 years ago

That clock input must be CLKIN1; the current schematics do the right thing.

sinara-hw / sinara

amc-rtm interface #9
