sinara-hw / RFSOC-AMC

AMC module with Xilinx RF-SoC and two analog front-end mezzanines for SDR and quantum applications

[RFC] RFSoCs from Xilinx #1

Open gkasprow opened 5 years ago

gkasprow commented 5 years ago

I got involved in a new project at WUT. We will build a wideband MIMO SDR, probably on the new RFSoC that:

You won't find any info with Mr Google. I got this news from my colleague. For the WUT project the existing 6G DAC with DUC and 4G ADC with DDC would probably be enough, but I want to play with the newest SoCs. For what applications would you use such a beast? Would such an upgrade of the Sayma ecosystem be useful for ion trappers? My colleagues would love to use it for superconducting qubit readout. I still don't know what the ADC-DAC latency is. They need <1 µs including the demodulation and modulation algorithm. Both ADC and DAC use AXI Stream. I plan to buy a ZCU111 and simply measure it. The configuration for the WUT project is minimal. I want to use the 1517 package and hook up as many DDR4 channels as possible for all the AWG applications. Is there any benefit of using a Sayma AMC in such a configuration? The initial idea is to make it an RTM and place the SFP/QSFP, clock recovery, MMC and supply on the AMC side.

sbourdeauducq commented 5 years ago

You won't find any info with Mr Google. I got this news from my colleague. I want to play with the newest SoCs.

Any information about the level of bugs and misfeatures in this new shiny device?

dhslichter commented 5 years ago

I think this particular device is more suited to superconducting qubit applications than trapped-ion applications, so I think those users should drive any development. In sc qubits, the name of the game is latency, latency, and latency. Once you have some tests done to see how bad the various pipeline delays are, it will inform how useful this may or may not be. Measurement is carried out in a few hundred ns, and then you choose whether or not to apply an error correction pulse. With the round-trip propagation delays between the generator and the mixing-chamber plate of the dilution refrigerator where the qubits live, you will need ~a few hundred ns between the end of the measurement time at the ADC input and the start of a subsequent pulse conditioned on the measurement data.
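
To make that budget concrete, here is a minimal latency-budget sketch in Python; every number (cable length, pipeline delays, velocity factor) is an illustrative placeholder rather than a value from this thread:

```python
# Rough feedback-latency budget for a conditional pulse in an sc-qubit setup.
# Every number below is an illustrative placeholder, not a measured value.

C = 3e8               # speed of light, m/s
VELOCITY_FACTOR = 0.7 # signal velocity in coax as a fraction of c

def propagation_ns(length_m):
    """One-way propagation delay in ns for a given cable length."""
    return length_m / (VELOCITY_FACTOR * C) * 1e9

budget_ns = {
    "measurement window":    300.0,               # 'a few hundred ns'
    "cable down to fridge":  propagation_ns(3.0), # generator -> mixing-chamber plate
    "cable back up":         propagation_ns(3.0),
    "ADC + DDC pipeline":    200.0,               # unknown until measured on the ZCU111
    "decision logic":        50.0,
    "DAC + DUC pipeline":    200.0,               # unknown until measured
}

total = sum(budget_ns.values())
for item, t in budget_ns.items():
    print(f"{item:22s} {t:6.1f} ns")
print(f"{'total':22s} {total:6.1f} ns   (target: < ~1000 ns including demod/mod)")
```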

For ion traps, the win would be if the tight integration means simpler data transmission protocols, but if the amplitude resolution and temperature coefficients are not that good on the ADCs or DACs relative to the current Sayma, I am not sure it would be worth it.

sbourdeauducq commented 5 years ago

In sc qubits, the name of the game is latency, latency, and latency.

https://www.e2v.com/products/semiconductors/dac/ Simple digital interface, latency as low as 1ns, no part of the chip that may burn if you don't use 350kLOC of spaghetti code from vendor.

dhslichter commented 5 years ago

Sounds good, but 10-bit and 12-bit DACs may not be fine enough resolution for ion trap applications. For sc qubit applications these might be good.

dhslichter commented 5 years ago

Also, 48 LVDS pairs per DAC channel :) Parallel wins for latency, but it comes at a cost in channel count. https://www.e2v.com/shared/content/resources/File/documents/broadband-data-converters/EV12DS460/EV12DS460_CI_DS.pdf

gkasprow commented 5 years ago

Yep, this is exactly where the problem is. I need plenty of channels plus plenty of SDRAM channels, so I have to choose: either ADCs/DACs or SDRAM. Take into account that to talk to such an ADC and DAC you have to use IOSERDES with delay-tuning logic, which is also not free in terms of logic. Colleagues from the SAR satellite business have been using these RFSoCs for some time and the level of bugs is surprisingly low. That's why I want to buy the eval board and test it with my own AXI-Stream piece of code.

gkasprow commented 5 years ago

The EV12DS460A DAC is 1.4k EUR (@10 pcs). I saw an offer for the EV10AQ190A (10 bit/5 GS/s) at 703 EUR (@5 pcs). So if we combine 8 ADCs with 8 DACs we get 16.8k EUR just for the ADCs and DACs. Add another 3..4k for a 678 kLUT FPGA. The pricing of the XCZU25DR is 11k (@1 pc). So it is anyway much cheaper than the conventional approach and you save a lot of power, board area and NRE, and mitigate the risk. At least in theory :) @sbourdeauducq could you point me to the place where the issue with burning SoCs was described ("part of the chip that may burn if you don't use 350kLOC of spaghetti code from vendor")?
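
The arithmetic above, as a small Python sketch (prices as quoted in this comment, treating the mixed $/EUR figures as roughly comparable; the FPGA price is taken as the midpoint of the 3..4k estimate):

```python
# Back-of-the-envelope comparison of discrete converters + FPGA vs. one RFSoC,
# using the prices quoted above (all figures approximate).

N_CH       = 8        # ADC/DAC channel pairs
DAC_EUR    = 1400.0   # EV12DS460A @ 10 pcs
ADC_EUR    = 703.0    # EV10AQ190A @ 5 pcs
FPGA_EUR   = 3500.0   # ~678 kLUT FPGA, midpoint of the 3..4k estimate
RFSOC_EUR  = 11000.0  # XCZU25DR @ 1 pc

discrete = N_CH * (DAC_EUR + ADC_EUR) + FPGA_EUR
print(f"discrete converters:  {N_CH * (DAC_EUR + ADC_EUR):8.0f}")  # ~16.8k
print(f"discrete total:       {discrete:8.0f}")
print(f"RFSoC:                {RFSOC_EUR:8.0f}")
print(f"difference:           {discrete - RFSOC_EUR:8.0f}  (before power, area and NRE savings)")
```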

jordens commented 5 years ago

Ignoring the repetitive vague FUD and the whining, I am pretty sure that RFSoCs are the right and ultimately only way to do scalable large quantum systems with ion traps as well. Get more of the entire chain from smart and local control logic, DSP, ADC/DAC, AFE, drivers, all the way to the interface closer to the ions. I don't see an alternative approach to space, power and communication complexity challenges. Hilarious to suggest 48 LVDS pairs per DAC channel are a good idea. And once you can do fast gates, you want low latency to do fast feedback/EC algorithms. That applies to ions as well. If some ion trapper wants to get such an integrated and scalable system going they better start yesterday.

gkasprow commented 5 years ago

I simply want to do the same thing I did with the AFCZ board and the AMC box. They were created for the WUT project with other applications in mind. One of the CERN groups will use AFCZ for low-level RF. It is compatible with the Sayma RTM, so one day, if someone wants to use a Zynq to drive the DACs, there is an easy migration path. For the WUT project we needed only 3 DDR banks and FMC. The same applies to the AMC box. We need 1 Gbit SFP support for the WUT project, but I designed it to run at 12 Gbit (measured a few days ago). Metlino is a similar story. We will probably use it as a data concentrator for a detector experiment in Dubna, so it needs to run at 10 Gbit. I produced a simple 2-layer version of Metlino to check if it is mechanically OK. Then we will launch production.

The same goes for the SDR for the WUT project. We need ADC and DAC channels and some SDRAM. I want to use the same AFE boards as for Sayma and make it compatible with the AFCZ/Sayma AMC, with a future migration option to a simple AMC without FPGA. The guys want to build a 96-channel SC qubit readout system. They need centralised distribution of RF and sync signals, so the RF backplane is still in the game. I recently got two complete uTCA crates with RF backplane and was surprised by the price.

jordens commented 5 years ago

@dhslichter I'd try really hard to answer stability requirements using active, better, faster, more frequent, and much more local calibration cycles supported by local and decentralized control. And I'd avoid converting them into passive bit depth and tempco requirements if at all possible. It should be possible to close that gap using the 14 bit RFSoCs that are available. @gkasprow I think an RFSoC on an RTM driven by Sayma is the right step now, for the exact reasons that you mention. Maybe in the long term people will figure out a different form factor that makes that block more stand-alone and decentralized from the upstream controls and better integrated with the respective downstream analog electronics (laser modulators, microwave for fridges, microwave for ions). From looking at the available docs, my guess is that you'll see < 1 µs latency analog-to-analog. Doesn't that eval tool (rfdc) spit out a latency estimate? (It refers to latency in the change log.)

gkasprow commented 5 years ago

@jordens That's true, I can synthesize a basic loop-back design and see what the Xilinx tool says about the latency.

gkasprow commented 5 years ago

I synthesized the simplest possible configuration with 8 ADCs running at 4 GS/s with 2x decimation. It resulted in a 128-bit AXI bus running at 250 MHz. I fed that bus to the DACs with 2x interpolation, running at 4 GS/s. I also instantiated the Zynq SoC with DDR, Ethernet and UART. What surprises me is the total on-chip power: [image]
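
As a sanity check of that bus sizing (assuming each decimated sample is carried as a 16-bit word on the AXI-Stream interface, which is how the quoted numbers work out):

```python
# Sanity check of the synthesized configuration: 4 GS/s ADC, 2x decimation,
# samples assumed packed into 16-bit words on the AXI-Stream interface.

FS_GSPS     = 4.0
DECIMATION  = 2
SAMPLE_BITS = 16
AXI_WIDTH   = 128     # bits
AXI_CLK_MHZ = 250.0

data_rate_gbps = FS_GSPS / DECIMATION * SAMPLE_BITS       # 32 Gbit/s per converter
axi_rate_gbps  = AXI_WIDTH * AXI_CLK_MHZ / 1000.0         # 32 Gbit/s

assert data_rate_gbps == axi_rate_gbps
print(f"{data_rate_gbps:.0f} Gbit/s per converter == {AXI_WIDTH}-bit AXI @ {AXI_CLK_MHZ:.0f} MHz")
```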

gkasprow commented 5 years ago

[image]

gkasprow commented 5 years ago

[images]

gkasprow commented 5 years ago

This gives a nice perspective on how much power can be saved when we get rid of the JESD or LVDS interfaces.

hartytp commented 5 years ago

I am pretty sure that RFSoCs are the right and ultimately only way to do scalable large quantum systems with ion traps as well

100% agree. If you really want to talk about scaling up then these kinds of RFSoCs are going to be a must in terms of cost, space, power consumption, latency, etc. Designs like Shuttler and Sayma are ideal for complex physics experiments or medium (~100-ion) QC demos, but don't seem that well suited to going to thousands of qubits.

I'd try really hard to answer stability requirements using active, better, faster, more frequent, and much more local calibration cycles supported by local and decentralized control. And I'd avoid converting them into passive bit depth and tempco requirements if at all possible. It should be possible to close that gap using the 14 bit RFSoCs that are available.

Yes, specifying everything to be 16-bit and low tempco is the lazy approach. It works okay for physicist/research-type systems where it's hard to pin down an exact specification since the challenges aren't well understood, but in the long run we need to think more carefully about what our actual minimal specifications are, and what the simplest way of achieving them is (e.g. local feedforwards based on integrated temperature sensors).
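
A minimal sketch of what such a local feedforward could look like; the gain tempco, calibration temperature and 14-bit range below are hypothetical values, not measured ones:

```python
# Sketch of a local gain feedforward: pre-distort the DAC code using an on-board
# temperature reading and a previously measured gain tempco. The tempco value,
# calibration temperature and 14-bit range are hypothetical.

T_CAL_C     = 25.0    # temperature at which the gain was calibrated
GAIN_TEMPCO = -80e-6  # fractional gain change per kelvin (assumed measured offline)

def corrected_code(requested_code, board_temp_c, full_scale=2**14 - 1):
    """Adjust a 14-bit DAC code to cancel the known gain drift with temperature."""
    gain = 1.0 + GAIN_TEMPCO * (board_temp_c - T_CAL_C)
    code = round(requested_code / gain)
    return max(0, min(full_scale, code))

# At 35 C the gain has dropped by 0.08%, so the requested code is nudged up a little:
print(corrected_code(8000, 35.0))  # -> 8006
```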

gkasprow commented 5 years ago

Resource utilisation of the example is minimal: [image]

jordens commented 5 years ago

@gkasprow the example design I mentioned above (rfdc) seems to do just that (DAC-to-ADC loopback) and has a lot of latency calculation and calibration code on board, if I am not mistaken. Maybe just play with that.

gkasprow commented 5 years ago

I saw it, but it has so many variables, like inter-tile alignment and various synchronisation features. I will try to simulate the design with the Vivado simulator, where I can enter analog values and observe analog outputs.

dhslichter commented 5 years ago

I am pretty sure that RFSoCs are the right and ultimately only way to do scalable large quantum systems with ion traps as well

100% agree. If you really want to talk about scaling up then these kinds of RFSoCs are going to be a must in terms of cost, space, power consumption, latency, etc. Designs like Shuttler and Sayma are ideal for complex physics experiments or medium (~100-ion) QC demos, but don't seem that well suited to going to thousands of qubits.

I agree with this as well; it's just an enormously challenging problem, and frankly the career incentives for deep multi-year dives into infrastructure development are nil unless that is your specific job on an industrial QC team. I also think that, while clearly frustrating, the kind of incremental design steps represented by Sayma and Shuttler (better than a PDQ or a DDS backplane on a KC705, but not the final version for a full scalable QC) are important and dangerous to skip entirely; you have to try this hardware out with actual ion traps to be able to catch subtle bugs. You also need to demonstrate progress on the physics side on a regular basis, which mandates not having to wait for the platonic ideal of QC hardware to happen -- you need better hardware to do fancier experiments, but you can't wait for notional perfection.

Frankly, for ion traps, I think the endgame would ideally have ADCs and DACs and logic in the same silicon, which also uses upper metal layers to define trap electrodes, because this would solve the interconnect problem too. Your AOMs are integrated nanophotonics and only need low power drive, etc. Put the whole damn lab on the chip with the ions trapped above it. Of course, there are a few steps between here and there ;)

I'd try really hard to answer stability requirements using active, better, faster, more frequent, and much more local calibration cycles supported by local and decentralized control. And I'd avoid converting them into passive bit depth and tempco requirements if at all possible. It should be possible to close that gap using the 14 bit RFSoCs that are available.

Yes, specifying everything to be 16-bit and low tempco is the lazy approach. It works okay for physicist/research-type systems where it's hard to pin down an exact specification since the challenges aren't well understood, but in the long run we need to think more carefully about what our actual minimal specifications are, and what the simplest way of achieving them is (e.g. local feedforwards based on integrated temperature sensors).

Again, you are preaching to the choir, but the amount of time and bandwidth required to design and implement this kind of system is substantial, and you will always end up spending ages on all the unexpected corner cases and whatnot as in any project (deterministic latencies? clock distribution quality? phase noise? all of these things need to be studied and the answers determined). I'm all in favor of rapidly interleaved self-calibrations of all sorts to make tempco and bit depth specs more relaxed, but it's kind of hilarious to be lectured about this when we are currently cutting loose all of the power detectors on Sayma AFE, for example, because of a feeling that they won't be used or that nobody will write the code to implement the desired automated calibrations.

Anyway, I am glad that @gkasprow is working on these things, and I think they are important and will pay dividends down the road. Live calibration loops and tricks like using the (for ions, unnecessarily high) sample rate to do some sigma-delta modulation to increase the effective bit depth would be useful pieces of the puzzle to develop. The silicon cost of ~$1.5k per DAC/ADC pair is not prohibitively high, but putting hundreds of channels together is going to be a spendy proposition as well -- another reason why physicists shy away, and why @hartytp has spent so much time niggling over $2 price differentials for odds and ends on these boards.
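
A toy example of the sigma-delta idea mentioned above - requantizing a high-resolution setpoint to a narrower DAC word with first-order error feedback, so the averaged output resolves below one DAC LSB; bit widths are illustrative:

```python
# Toy first-order sigma-delta requantizer: truncate a high-resolution setpoint to
# the physical DAC width with error feedback, so the truncation error is pushed to
# high frequencies and the in-band average resolves finer than one DAC LSB.
# Bit widths are illustrative.

def sigma_delta_requantize(samples, in_bits=20, dac_bits=14):
    shift = in_bits - dac_bits
    acc = 0
    out = []
    for x in samples:
        acc += x                # high-resolution value plus accumulated past error
        code = acc >> shift     # truncate to the DAC width
        acc -= code << shift    # carry the truncation error to the next sample
        out.append(code)
    return out

# A constant 20-bit setpoint that falls between two 14-bit codes averages out exactly:
codes = sigma_delta_requantize([123457] * 4096)
print(sum(codes) / len(codes) * 2**6)  # -> 123457.0
```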

sbourdeauducq commented 5 years ago

@jordens Please stop shutting down all discussions about hardware bugs. The sorry state of modern hardware is documented by other people, e.g. https://www.embeddedrelated.com/showarticle/988.php - it's not me being "irrational", "whining", or spreading "vague FUD" as you say. It is a real issue, causing pain, frustration, delays and costs. Since the development time on such projects can be completely dominated by debugging, it needs to be addressed appropriately, e.g. by having a number of employees whose full-time job is to figure out hardware misbehavior, by having closer collaborations with the designers of the chip, and/or by preferring silicon vendors who care about those issues (e.g. SiLabs and not HMC).

You are the ones being irrational here and overly excited about this new Xilinx gadget. In my opinion, a more reasonable stance would be: "for political reasons, we need to make incremental improvements to the control hardware, and a particularly interesting one would be to increase the channel density dramatically on an AWG-type device. This requires integrating the ADCs/DACs with the logic on the same chip, and the only product on the market today is the Xilinx RFSoC. We think that the advantages of this approach outweigh the problems associated with this chip, such as having to dedicate large amounts of resources to debugging and sorting out quirks."

The source for the risk of physical damage to the chip when not programmed in the official manner is the same as the source for Greg's specs of the next RFSoC - someone told me and you won't find it with Mr Google. Different people have different concerns, it would seem.

jordens commented 5 years ago

But you are not even discussing hardware bugs, let alone proposing workable solutions to anything or evaluating a specific strategy. You just say that they (bugs and solutions) exist. Nobody doubts that, and it is certainly of crucial importance to allocate sufficient resources to working around potential bugs. But reiterating that fact is not contributing to this discussion. What hardware bug relevant here do you want to discuss? How does that bug relate specifically to Greg's proposal? How does knowledge about the bug allow for better alternative proposals? Those would be good to hear as part of a discussion of hardware bugs. Can you at least take the device-independent general claim of your personal preference for one vendor over the other, and the pattern of airing the fact that there are bugs in hardware, somewhere else? I don't see how that is getting us anywhere. Isn't that quote very much the summary of what was said so far (minus the "political reasons", which I don't see and you don't substantiate)? If you read it again side by side, I don't see anything new or different. You can call it a "new Xilinx gadget" if you want, but then you'll have to live with us calling your contributions "whining".

dhslichter commented 5 years ago

Here's my synthesis and take on the whole situation:

jordens commented 5 years ago

Getting back to specific technical and strategic questions, @gkasprow what's the plan with AFEs and panel connectors on that RTM? Is there anything coming from the WUT MIMO SDR or the NBI SC qubit people in terms of alternatives to analog over SMP/FMC and SMA/MMCX on panels? Given the device-internal crosstalk, I guess FMC might be just fine. You say they need high channel density. Would they be using both DACs and ADCs at baseband?

dhslichter commented 5 years ago

Agreed that FMC sounds totally fine given -70 dB typical internal crosstalk. Honestly at these crosstalk levels I'd like to see us investigating COTS impedance-matched multi-channel differential cables (e.g. HDMI) to ship signals around the lab, rather than coax with SMA/MMCX. That also pushes the balun problem out to a user-defined endpoint board, takes them out of the crate entirely and lets people make choices depending on their frequency band of interest.

gkasprow commented 5 years ago

We plan to use the same FMC-like boards as on the Sayma RTM. The SC people I know don't care what connectors are used.

gkasprow commented 5 years ago

I got an offer for the XCZU25DR-2FFVG1517E. It is $8.6k at 2 pcs, so it looks quite good.

hartytp commented 5 years ago

Do you need an AFE mezzanine for this, or can you put the AFE directly on the RTM?

gkasprow commented 5 years ago

For the WUT project I need to make at least 3 front-ends for different radio bands, so an AFE could be a good idea. One of them would be an up/down converter for X-band.

hartytp commented 5 years ago

makes sense.

gkasprow commented 5 years ago

Such AFE would look like this: https://github.com/gkasprow/FMC_SDR/wiki

jordens commented 5 years ago

HDMI cables are probably not a good idea for analog. I have seen crosstalk specified as < -20 dB and measured typically at -35 to -45 dB at GHz frequencies, which really isn't all that surprising since they are digital cables.

gkasprow commented 5 years ago

HDMI cables don't have individual shields. But SAS cables from 3M could be an excellent choice. They have twinax pairs enclosed in aluminium foil, eight pairs plus a few sideband signals. [image]

One can have internal connectors or bigger external ones. The same assemblies are used for external PCIe, but with a higher pin count and also a higher price.

gkasprow commented 5 years ago

more info about the cables

dhslichter commented 5 years ago

These look really nice @gkasprow, and if we can buy them already connectorized for SFP then this would be a nice way to route things around. I can't find a crosstalk spec anywhere but based on the cable design I find it hard to imagine it wouldn't just be limited by crosstalk on the boards at either end of the cable.

@jordens ack about HDMI; my impression was that some cables do offer individually shielded pairs, but I agree that it's not going to work with specs like you quote.

dhslichter commented 5 years ago

I got an offer for the XCZU25DR-2FFVG1517E. It is $8.6k at 2 pcs, so it looks quite good.

So this is basically $1100 per channel of DAC + ADC, right? This certainly seems tractable, especially since given the much reduced power consumption and the fact that everything is on the same silicon, we would just skip the RTM card and do the whole thing on an AMC, right?

gkasprow commented 5 years ago

QSFP and SAS offer much higher density than SFP. I can measure the crosstalk some day - I have everything needed in my lab, but I can only do it easily up to 3 GHz; to get access to a better VNA I need to go to another lab. I use SAS connectors intensively in my designs, and even the internal connections are good enough to be used in a lab environment. The SoC price is at 2 pieces. Once we buy more than 10, the step pricing policy kicks in, further reducing the price. Then, after buying 50 or 100 pcs, we get to half this price or even less.

gkasprow commented 5 years ago

For the WUT design we would probably need to use the RF backplane to keep all SDR channels in tight synchronisation. So the AMC would be very simple, with just the MMC and some SFPs.

gkasprow commented 5 years ago

@dhslichter one can buy such cables with QSFP connectors.

hartytp commented 5 years ago

For the WUT design we would probably need to use the RF backplane to keep all SDR channels in tight synchronisation.

Out of curiosity, why not use WR to distribute high-quality clocks and synchronisation signals via the AMC backplane? It's much cheaper than the RF BP, doesn't tie you to specific racks, and the performance can be excellent.

gkasprow commented 5 years ago

Since the specification is still in development, the final requirements are not yet available. We developed a WR node that sits on the NAT MCH, so this is one of the options. Here is a paper that describes the idea.

hartytp commented 5 years ago

Cool! Well, do whatever works out best for you, but I think it would be interesting to see if we can replace the LLRFBP with WR in at least some systems.

gkasprow commented 5 years ago

We can try to fit it on the AMC and design a supplementary RTM with:

  • RFSOC
  • 2x AFE with ADC and DAC channels
  • clock distribution
  • CDR / White Rabbit oscillators
  • optional RF backplane connector
  • clock input SMA

sbourdeauducq commented 5 years ago

It's not compatible with the Sayma AFEs, but VPX modules can be very compact on the other hand: http://mish.co.jp/model5950_01.html

gkasprow commented 5 years ago

I know VPX and Space VPX - I am using them for another project: https://github.com/gkasprow/SpaceVPX_FMC_Carrier_3U/wiki https://github.com/gkasprow/SpaceVPX_FMC_Carrier_6U/wiki Space VPX is very similar to VPX - only the protocols used differ: instead of PCIe/SRIO we use SpaceWire. For our use case, where we build a multi-channel MIMO SDR, VPX is a similar choice to MTCA; we would need to use 6U anyway. Both platforms are very similar, the main difference being the connector. In our case the Space VPX SDR is designed to be very low power - at most a few W during transmission. I built it for a serious open-source satellite platform (Hyper-sat.com). In this case I want to make the RFSOC board attractive for at least 3 projects: SC qubit readout, very wideband MIMO SDR, a ground station high-speed X-band satellite link, and possibly ion trappers. And since we already use MTCA for many projects, we will use this platform.

gkasprow commented 5 years ago

I sketched schematics of the RFSOC board over the long weekend. I combined the schematics of the Xilinx devkit, AFCZ and Sayma, and checked both AMC and RTM scenarios. It looks like the AMC is the better choice, since I can use the already designed RTM_SFP_QSFP board with optional support for the RF backplane, and the AMC has more real estate than the RTM. We don't need a JTAG switch. UltraScale FPGAs support 1000base-X over ordinary LVDS pins, so an Ethernet PHY is not necessary since the Xilinx PCS/PMA already does the translation between EMIO GMII and 1000base-X LVDS in the PL. I can place an Ethernet switch as an option to have the MMC connected to Ethernet. We have a working solution where we can remotely update the FPGA FLASH over Ethernet, plus remote Zynq + FPGA terminals over IP. I implemented the clocking as on the Xilinx devkit; it will be upgraded to a WR-like solution. The board supports the Sayma analog mezzanines. I placed 3 banks of x64 DDR4 SDRAM. The goal is to stream 2 GS/s of 16-bit data or 4 GS/s of 8-bit data per channel, plus have one spare x64 SDRAM bank for the Zynq.

AMC scenario: [images]

RTM scenario: [images]
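
A rough check that the proposed x64 DDR4 banks can absorb those streams, assuming 2.5 Gbit/s per pin (as in the DDR controllers mentioned later in the thread) and an assumed ~70% usable efficiency for long sequential bursts:

```python
# Rough bandwidth check for the x64 DDR4 banks against the streaming goal above.
# 2.5 Gbit/s per pin matches the DDR controllers mentioned later; the ~70% usable
# efficiency for long sequential bursts is an assumption.

PIN_RATE_GBPS = 2.5
BUS_WIDTH     = 64
EFFICIENCY    = 0.70

bank_gbps   = PIN_RATE_GBPS * BUS_WIDTH * EFFICIENCY   # usable Gbit/s per bank
stream_gbps = 2.0 * 16                                 # 2 GS/s x 16 bit (== 4 GS/s x 8 bit)

print(f"usable bandwidth per x64 bank: {bank_gbps:.0f} Gbit/s")
print(f"one streamed channel:          {stream_gbps:.0f} Gbit/s")
print(f"streamed channels per bank:    {int(bank_gbps // stream_gbps)}")
```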

sbourdeauducq commented 5 years ago

UltraScale FPGAs support 1000base-X over ordinary LVDS pins, so an Ethernet PHY is not necessary since the Xilinx PCS/PMA already does the translation between EMIO GMII and 1000base-X LVDS in the PL.

How do we take care of the code for this? How will it be maintained? The "native" UltraScale I/O block that has to be used for this feature is poorly designed - even worse than the GTH - and the code of the corresponding Xilinx wizard should raise eyebrows, to say the least. (Among many other examples, they have a 2-BIT COUNTER written as a TRUTH TABLE, containing plenty of undergrad-level Verilog mistakes, and spanning dozens upon dozens of lines of code.)

Isn't there a PS core that can be used from the Zynq?

If we want Ethernet on pure fabric, maybe use one of the GTH + SFPs instead.

sbourdeauducq commented 5 years ago

I can place an Ethernet switch as an option to have the MMC connected to Ethernet. We have a working solution where we can remotely update the FPGA FLASH over Ethernet

Aren't Zynq devices able to reprogram themselves completely from the built-in ARM core already, both to flash and to SRAM?

gkasprow commented 5 years ago

How do we take care of the code for this? How will it be maintained? The "native" UltraScale I/O block that has to be used for this feature is poorly designed - even worse than the GTH - and the code of the corresponding Xilinx wizard should raise eyebrows, to say the least. (Among many other examples, they have a 2-BIT COUNTER written as a TRUTH TABLE, containing plenty of undergrad-level Verilog mistakes, and spanning dozens upon dozens of lines of code.)

This is one of the options, for those who want a direct link to Ethernet. I will use a Marvell switch that acts like a regular RGMII PHY from the SoC PS side; however, you don't have access to it from the PL side and have to rely on the hardened GEM (EMAC). The 1000base-X over LVDS solution is usually maintained by Xilinx as an IP core. It uses the IO SERDES in oversampling mode (2x 1.25 GHz) to realise a simple CDR, and once the right clock phase is detected it works for the rest of the packet as an ordinary SERDES. Starting from the 7-series family, the IO SERDES can be used for synchronous SGMII operation; that needs an external PHY that converts it to 1000base-X. It could be far easier to pack GMII into SGMII than to do 1000base-X. I can also try to connect RGMII to the PL side, but I'm not sure I have enough pins. The top priority is the 3x 2.5 Gbit 64-bit DDR controllers.

We have 8 or 16 GTY transceivers that could be used for this purpose, but it is another job to port the design. I want all of them routed to the RTM. I can route one of them to PORT0 as an option as well.
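
For illustration, a conceptual toy model of the 2x-oversampling CDR described above - pick whichever of the two sampling phases sees the fewest transitions, then decimate with that phase for the rest of the packet. This is only a sketch, not the actual Xilinx PCS/PMA logic:

```python
# Conceptual model of 2x-oversampling clock recovery: sample the 1.25 Gbit/s line
# at 2.5 GS/s, attribute detected transitions to one of the two sampling phases,
# and keep decimating with the phase that sees fewer transitions (i.e. samples
# away from the data edges). Not the actual Xilinx PCS/PMA implementation.

def pick_phase(rx):
    """Return 0 or 1: the sampling phase whose samples avoid the bit transitions."""
    edges = [0, 0]
    for i in range(1, len(rx)):
        if rx[i] != rx[i - 1]:
            edges[i % 2] += 1   # a transition arriving at index i implicates phase i % 2
    return 0 if edges[0] < edges[1] else 1

def recover(rx):
    phase = pick_phase(rx)
    return rx[phase::2]         # from here on, behave like an ordinary SERDES

# Ideal eye: every line bit is seen twice in the oversampled stream.
data = [1, 0, 1, 1, 0, 1, 0, 0]
rx = [b for b in data for _ in range(2)]
print(recover(rx) == data)      # -> True
```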

Isn't there a PS core that can be used from the Zynq?

There are also GTR transceivers that can be used from the Zynq PS, but they support only SGMII mode and need an external PHY, so there is no real benefit in using them. I prefer RGMII because I want to use all the GTRs for PCI Express; I will need PCIe for communication in one application. As I wrote, this design will serve at least 3 different applications.

If we want Ethernet on pure fabric, maybe use one of the GTH + SFPs instead.

This chip has GTY transceivers. I designed an RTM board with SFP and QSFP cages as well as access to the RF backplane.

Aren't Zynq devices able to reprogram themselves completely from the built-in ARM core already, both to flash and to SRAM?

They can, but only from U-Boot, and we found it difficult to provide remote maintenance this way. We have two FLASH chips connected through a crossbar that lets us swap the chips and choose which FLASH to boot from; it is maintained by the MMC. So one FLASH has a basic recovery Linux system and the other FLASH has the production software that provides the h.265 codec, video streaming, remote configuration, etc. We had a long discussion with the company that provides the centralised control system for all the boxes and we decided to implement additional Ethernet access to the MMC. It probably won't be really useful in the lab, but that system consists of over 500 Zynq boxes distributed around the country, so remote maintenance was a must and for some reasons the dual-FLASH approach was not sufficient.

gkasprow commented 5 years ago

I ran out of SoC IO pins for AFE control. I want to use an identical approach as on the Sayma RTM - all the clock chip control, AFE IO lines, etc. will be connected to a small Artix chip. @sbourdeauducq would it make you happy if I used only 4 LVDS links to connect it with the main SoC? Two of them would be routed to clock inputs of the SoC and the Artix chip. If you need 6, I can try to provide them as well.