marmeladapk closed this issue 4 years ago
From @dhslichter on 2018-04-27 16:31
@gkasprow what are your feelings about using 3.3V LVCMOS from the FPGA, and then using SN65LVDS387 chips to translate to LVDS to drive the DACs? This is a 16-bit-wide chip, so you can fit an entire DAC on a single chip (channel to channel skew ~150 ps). If we did this, then we could put 3 DACs on a single FPGA bank, and then we could use e.g. the XC7K325 Kintex-7 to run up to 18 channels of DAC off 6 HR banks. This leaves one HR bank open, plus all the HP banks for connecting to SDRAM. The LVDS translator chips are fairly large (about the same size as the DAC chips), but even if we only end up with ~12 channels of DAC I think it's reasonable. We could do this with an Artix as well to save substantial money, if desired, assuming we will meet timing and have enough FPGA resources.
Remember we would be aiming for an absolute maximum data rate of 150 MSPS (basically, we want this to run at the RTIO clock frequency, for simplicity). For our use cases, we'd probably pick 125 MHz or 100 MHz RTIO clocks.
From @gkasprow on 2018-04-27 17:22
@tprzywoz did Hyperlynx simulations with various voltage standards. It seems that with some IO standards and passive splitters he managed to connect 3 or 4 DACs to a single IO pin of the FPGA. The standards that were successfully verified are: HSTL_I_18_S, HSTL_II_18_S, SSTL18_II_S, BLVDS, with a 600Mbps data rate.
(Attached images: the schematics, and example plots for HSTL_I_18_S and SSTL18_II_S.)
So it seems that with 6 HR banks we can run 24 DAC channels + we have 2 HP banks for 32bit DDR + one free HP bank
From @gkasprow on 2018-04-27 17:31
With current eye diagrams we can go much higher than 150MS/s per DAC...
From @gkasprow on 2018-04-27 17:45
I'm afraid of using this TI chip close to max spec and HR LVCMOS at 600Mbit. This may not work reliably.
From @gkasprow on 2018-04-27 20:18
Why do we want to use Kintex 7 and not the Kintex US used in Saymas? I pay $900 for them. Another thing is PORT0 Ethernet. We can use the same PHY as in Sayma to reuse most of the code developed for Sayma.
From @sbourdeauducq on 2018-04-28 01:36
Can the DAC go to 250Ms/s? There is a group at WIPM who's interested in that. As for reusing problematic Sayma hardware, it should be first debugged and well-proven on Sayma, which is not the case yet. The Ultrascale architecture also has the BUFGCE_DIV design flaw that makes it very difficult to implement things like SERDES TTL with deterministic latency, while doing the same thing in 7-series is straightforward. And its I/O logic is generally an awful design, where they crammed a lot of unwarranted complexity into one monolithic cell (like the transceivers) and slapped one of their typical idiotic wizards on top of it, to hide the mess from the user. There is a "backwards compatibility" mode, but still: the delays are difficult to use and the SERDES only supports ratios of 4 and 8. Again this is not a problem with 7-series, where there is little to criticize about the I/Os. Before using Ultrascale, we should get a FMC development board for the DAC, put it on a KCU105 or Sayma, and check carefully that everything works, including DAC synchronization. Otherwise it can be a mess like the DDR3.
From @gkasprow on 2018-04-28 08:04
The DAC can go, but the HR IO in Kintex 7 may have troubles running at 1GHz reliably with reasonable timing
From @gkasprow on 2018-04-28 08:07
If we go for 2 Artix 7 then:
From @sbourdeauducq on 2018-04-28 08:32
The DAC can go, but the HR IO in Kintex 7 may have troubles running at 1GHz reliably with reasonable timing
No, they are pretty much the same, except that Ultrascale has poor design and problems. In fact, to go back to the Kintex-7 performance level, you need to use the horrible "native" mode and its associated Xilinx wizard.
Ultrascale:
(Go figure why there is a minimum data rate specification. This thing is just insane.)
Kintex-7:
No insane debugging and ugly code to get the I/Os running.
From @gkasprow on 2018-04-28 08:49
@sbourdeauducq running interleaved DACs requires generating a sequence of latches that need to be more precise than 1/2 of the basic data rate period. That's why I'm slightly afraid of running it at the 1.2GHz toggle rate the SERDES can offer. We can use IO Delay, but again we have to play with individual clock and latch phases. And the problem with DACs is that one doesn't really know whether the data were correctly latched and what the window is, because we don't have any form of eye scan. That's why I wrote it would work with 150MHz, because we can run the data at 600MHz safely. If we want to go much faster, then we either add another Kintex chip or leave the RTM and use 2 Artix 7. The cost penalty of the AMC board would be negligible because we would compensate it with higher channel count. Take into account that it is precise DC design. I'm not sure if power hungry Kintex chip on board makes it easier. If we add the SFP cage for DRTIO, we essentially block the air flow for the DACs, which will make temperature management much worse. Concerning JESD - I'm not sure if we would reuse the complete Sayma HDL design. It would need to be modified anyway. So what about using raw GTP transmission at the basic protocol level? Just use 16 channels running at 6Gbit/s with basic data integrity checking. I did it in many designs and it works just fine.
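The timing budget behind the 150MHz / 600MHz figures can be sketched numerically. This is only an illustration, assuming that each shared FPGA pin carries one bit per DAC per sample period (so 4 DACs per pin at 150 MS/s gives 600 Mbit/s) and that latch edges must be placed to better than half a bit period. The function and key names are hypothetical.

```python
def multiplexed_pin_timing(sample_rate_hz: float, dacs_per_pin: int) -> dict:
    """Per-pin bit rate and latch timing budget when several DACs share one FPGA pin."""
    # Assumption: each pin carries one bit per DAC per sample period.
    bit_rate_hz = sample_rate_hz * dacs_per_pin
    bit_period_s = 1.0 / bit_rate_hz
    return {
        "bit_rate_hz": bit_rate_hz,
        "bit_period_ns": bit_period_s * 1e9,
        # Latch edges need to be more precise than half of the bit period.
        "latch_precision_ns": 0.5 * bit_period_s * 1e9,
    }


timing = multiplexed_pin_timing(sample_rate_hz=150e6, dacs_per_pin=4)
for name, value in timing.items():
    print(f"{name}: {value:.4g}")
```

With these inputs the latch placement budget comes out below 1 ns, which is why the absence of any eye scan on the DAC side is a concern at higher rates.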
From @sbourdeauducq on 2018-04-28 08:57
Contrary to the Ultrascale one, the Kintex-7 IODELAY works correctly and without making a mess. Note that Artix-7 doesn't have output delays, it has input delays only, but it still has a MMCM that supports fine phase-shifting of the clock with good precision (1/56th of the VCO, which can be as high as 1.2GHz IIRC). If the PCB trace lengths are well matched, this should be sufficient. The Ultrascale MMCM is very similar and also usable. I used high-speed LVDS DACs before and some actually support some test pattern checks for the I/Os. But yes, even when using a sane FPGA this is not trivial and needs testing. Synchronization also, as those DACs contain an elastic buffer with non-deterministic latency.
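For a sense of scale, the MMCM phase-shift resolution mentioned above can be worked out directly. This sketch takes the numbers quoted in the comment (1/56th of the VCO period, VCO at 1.2GHz) as given and compares the step with the 600 Mbit/s bit period discussed earlier. The function name is hypothetical.

```python
def mmcm_phase_step_ps(vco_hz: float, steps_per_vco_period: int = 56) -> float:
    """Fine phase-shift step in picoseconds, assuming 1/56 of the VCO period per step."""
    vco_period_ps = 1e12 / vco_hz
    return vco_period_ps / steps_per_vco_period


step_ps = mmcm_phase_step_ps(1.2e9)

# Number of phase steps available across one bit period at 600 Mbit/s.
bit_period_ps = 1e12 / 600e6
steps_per_bit = bit_period_ps / step_ps

print(f"phase step: {step_ps:.2f} ps")
print(f"steps per 600 Mbit/s bit period: {steps_per_bit:.0f}")
```

A step of roughly 15 ps leaves on the order of a hundred clock phase positions per bit, which is the sense in which clock phase-shifting can stand in for per-pin output delays when the traces are matched.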
From @sbourdeauducq on 2018-04-28 09:02
Take into account that it is precise DC design. I'm not sure if power hungry Kintex chip on board makes it easier.
Won't there be separate digital and analog supplies anyway?
From @gkasprow on 2018-04-28 09:11
That's why to limit possible issues with FPGA-DAC connectivity, I'd simplify it as much as possible and use 1:1 interface. We cannot use LVDS to drive multiple 100Ohm loads, that's why only SSTL and similar interfaces work, which @tprzywoz proved with simulations. This DAC does not have any form of data integrity checking. It has only internal test mux that can be used to verify timing alignment by comparing the timing of LVDS inputs one pair at a time through the TSTP/N pins. We can connect these test pins to the FPGA and do some sort of eye scan, but I'm not sure if this pays off. I prefer to add low cost Artix chip as an extender and implement trivial protocol over GTP.
From @gkasprow on 2018-04-28 09:14
There will be separated power supplies. I worry about temperature management. With simple board that contains only DACs and low power Artix chips working as stupid GTP-LVDS converters, we can provide uniform cooling to all DAC channels and analog buffers. If we integrate SFP, power-hungry Kintex and DDR, I'm not sure if we manage to achieve DC accuracy. I had issues long time ago with VME board where SFPs were used. The board area above SFP was practically not cooled because of blocked air flow.
From @sbourdeauducq on 2018-04-28 09:23
That's why to limit possible issues with FPGA-DAC connectivity, I'd simplify it as much as possible and use 1:1 interface.
That's definitely a good idea. How many DACs can be connected with the largest Kintex-7 package?
This DAC does not have any form of data integrity checking. It has only internal test mux that can be used to verify timing alignment by comparing the timing of LVDS inputs one pair at a time through the TSTP/N pins. We can connect these test pins to the FPGA and do some sort of eye scan, but I'm not sure if this pays.
I don't know what DAC datasheet you are reading, but I just picked DAC3171 at random and it can check that a looped sequence of 8 samples matches the contents of some user-programmable registers. That can be used to measure the error rate of the interface at startup. And we need to do an eye scan anyway, it won't "just work".
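The kind of startup check described above can be modelled in software. The sketch below is not the DAC3171 register interface (in the real part the comparison happens inside the DAC); it only illustrates the idea of comparing a looped 8-sample sequence against reference values and turning mismatches into an error rate. The pattern values and names are hypothetical.

```python
from typing import Sequence


def pattern_error_rate(received: Sequence[int], pattern: Sequence[int]) -> float:
    """Fraction of received samples that differ from the looped reference pattern."""
    if not received:
        raise ValueError("no samples received")
    errors = sum(
        1
        for i, sample in enumerate(received)
        if sample != pattern[i % len(pattern)]
    )
    return errors / len(received)


# Hypothetical 8-sample, 16-bit test pattern that exercises every data line.
PATTERN = [0x0000, 0xFFFF, 0x5555, 0xAAAA, 0x00FF, 0xFF00, 0x0F0F, 0xF0F0]

clean = PATTERN * 1000            # 8000 samples, no corruption
corrupted = list(clean)
corrupted[10] ^= 0x0001           # flip the LSB of one sample
corrupted[4000] ^= 0x8000         # flip the MSB of another

print(pattern_error_rate(clean, PATTERN))
print(pattern_error_rate(corrupted, PATTERN))
```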
I prefer to add low cost Artix chip as an extender and implement trivial protocol over GTP.
No GTP use case is trivial. Did you think of e.g. deterministic latency?
Or just limit the number of channels per card. Trying to cram too much on one card is what caused many of the Sayma issues.
From @gkasprow on 2018-04-28 09:34
@sbourdeauducq we decided that only LTC2000 is applicable here due to DC accuracy. We need <LSB at 16bits. If we go at the speed far below the skew, why we need to make an eye-scan? GTP can have deterministic latency. It is used by WR which needs to be deterministic and there are tricks how to achieve it. A large Kintex7 would dominate the price of the board. With a standard Kintex7 we can achieve 6 channels with a 1:1 interface or 12 channels with 1:2 multiplexing, but we are stuck with HR outputs, which are quite limiting here. Do HR IOs in Kintex have output delay? Moreover, I have good prices only for the FPGAs used in Sayma, AFC (A 200T) and AFCK (K 325T) :)
From @gkasprow on 2018-04-28 09:36
Each DAC consumes roughly 300mW at 250MHz + output stage power so there will be issues with cooling and maintaining their DC accuracy
From @sbourdeauducq on 2018-04-28 09:50
GTP can have deterministic latency. It is used by WR which needs to be deterministic and there are trick how to achieve it.
Right, there are a few approaches; and that's in DRTIO as well, but I would not call it trivial.
If we go at the speed far below the skew, why we need to make an eye-scan?
Quite simply, to make sure we're not hitting the setup/hold windows of the receiving FFs, which can happen at any data rate.
If the data delays are matched, phase-shifting the data clock sent to the DAC is equivalent to the ODELAYS.
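An eye scan done by phase-shifting the data clock could look roughly like the following. This is a sketch under stated assumptions: the error count for each phase step has already been measured by some means (for example a test-pattern check), the scan covers exactly one clock period so the steps wrap around, and the operating point is taken as the middle of the longest error-free run. All names are hypothetical.

```python
from typing import List, Optional, Tuple


def find_eye_center(errors: List[int]) -> Optional[Tuple[int, int]]:
    """Return (center_step, width) of the longest error-free run in a cyclic phase scan.

    errors[i] is the number of bit errors observed at phase step i.
    Returns None if no step is error-free.
    """
    n = len(errors)
    if n == 0 or all(e > 0 for e in errors):
        return None
    if all(e == 0 for e in errors):
        return (0, n)  # every phase works; any step is acceptable
    # Start just after a failing step so a run that wraps past the end stays contiguous.
    start = next(i for i in range(n) if errors[i] > 0) + 1
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for k in range(n):
        i = (start + k) % n
        if errors[i] == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    center = (best_start + (best_len - 1) // 2) % n
    return (center, best_len)


# Hypothetical scan: errors appear only while the clock edge crosses the data transition.
scan = [0, 0, 5, 9, 3, 0, 0, 0, 0, 0, 0, 0]
print(find_eye_center(scan))
```

Choosing the centre of the window, rather than the first working step, keeps the sampling edge as far as possible from the setup/hold region on both sides.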
From @gkasprow on 2018-04-28 09:52
@sbourdeauducq what are the requirements of the group at WIPM? For shuttling, @hartytp specified some here
From @gkasprow on 2018-04-28 09:54
Okay, so we do not really need ODELAYs to connect DACs 1:1 to Artix 7 chips and run them even at 500MS/s. We can equalise traces easily with 1:1 connectivity. With multi-drop it is tricky but also possible. Can't we run 16 copies of DRTIO over GTP? :)
From @gkasprow on 2018-04-28 09:54
I mean to hook every DAC on dedicated DRTIO channel :)
From @sbourdeauducq on 2018-04-28 09:56
There are many things we can do, but that doesn't mean we should; having one large FPGA per card connected to DACs directly without any multiplexing/multidrop sounds like the most trouble-free solution.
From @sbourdeauducq on 2018-04-28 10:00
Some people are also interested in traditional AWG operation (i.e. with a raw sample buffer); then the memory bandwidth is also limiting how many channels you can use on one card. And no, the solution is not HMC, the solution is to put fewer channels per card and have more cards. Much easier.
From @gkasprow on 2018-04-28 10:04
Yes, I fully agree. One big FPGA saves a lot of pain. A long time ago we built a 256 channel DAQ system with 125MS/s sampling in every channel. We used over 20 of the biggest Artix chips, and nearly 50% of the resources were devoted just to communication between them :D It was a custom system that fit in 2U, but we had plenty of issues with thermal management and supply. Since then I use MTCA and sleep better :)
From @gkasprow on 2018-04-28 10:16
The question is if we need a communication between such AWG or shuttler channels ? If we implement:
What level of computing do we need? Take into account that resources we use in Kasli are not satisfactory and sooner than later we will have similar limitations here. That's why my initial idea was to leave UltraScale on Sayma for computing and communication (SFPs) and use Artix for DAC fanout on RTM. With this approach we get:
The only issue is communication channel between AMC and RTM.
From @gkasprow on 2018-04-28 10:20
I asked you about requirements because then we can decide whether we go with analogue mezzanines or without them. We can deliver standardized 1Vpp output levels and move filtering to another board, close to the experimentation area, as already agreed for shuttling. In this way we get rid of problematic mechanical and thermal issues.
From @sbourdeauducq on 2018-04-28 10:24
Definitely no RTM please. High channel counts per card are generally a headache, cf. Sayma. If you really insist, one KU040 as the "big" FPGA on the card is less bad than multi-FPGA/RTMs, pending careful validation on Sayma/KCU105 with the DAC on FMC, and the definitive end of the Sayma DDR3 saga.
From @gkasprow on 2018-04-28 10:26
what about the SFP?
From @sbourdeauducq on 2018-04-28 10:32
Assuming µTCA with integrated Ethernet and a backplane: One is enough IMHO, 2-3 nice to have. If it is causing thermal problems, scrap it.
From @sbourdeauducq on 2018-04-28 10:49
we can decide whether we go with analogue mezzanines or without them.
As demonstrated by Sayma (expensive, fragile, unreliable SMP connectors, mechanical mounting difficulties, excessive density): analog mezzanines are a hassle that is best avoided.
From @gkasprow on 2018-04-28 10:51
How would you synchronize the boards without sfps? Via Metlino?
From @gkasprow on 2018-04-28 10:54
We can always use a simple RTM with SFPs when needed. The board already exists and is really simple: 4 layers, supply and cages. No MMC.
From @sbourdeauducq on 2018-04-28 10:56
Yes. We really just need one high-speed serial pair, and DAC clocks unless @hartytp et al.'s fancy PLL lands (we probably need the DAC clock anyway for standalone non-DRTIO operation). SFPs are nice for connecting more boards and for operating outside µTCA, though for the latter we can always break out the AMC connector. Why is an RTM connector less problematic than an SFP cage? In addition to what I said about RTMs, I don't like the connector, which is expensive and fragile.
From @gkasprow on 2018-04-28 11:02
The RTM connector does not block the air flow because there is no air flow there at all. The SFP must be installed close to the panel, and since it is long, it blocks the flow. The RTM connector is press-fit, so it can be assembled at any time and be left unmounted by default.
From @sbourdeauducq on 2018-04-30 10:41
µTCA power is still problematic: https://github.com/sinara-hw/sinara/issues/475#issuecomment-385358136
From @dhslichter on 2018-04-30 18:24
@gkasprow @sbourdeauducq to respond to your various points above:
I am very concerned about mission creep here. Our preferred design for Shuttler is one where the sample rate doesn't need to be above 125-150 MSPS. The application is for providing "fast" voltages to an ion trap for shuttling ions around, meaning that we will care about producing signals of up to ~few 10s of MHz at the most. We want to choose the sample rate to be high enough to be able to readily eliminate Nyquist images as desired. We also require the best DC accuracy that one can readily achieve, and the lowest noise that one can readily achieve. We also require deterministic latency, ~ns or better timing accuracy for DAC outputs.
These goals are often in direct conflict with designs that other groups would be interested in -- moving to higher sample frequencies to be able to generate RF tones, having very fat memory bus to run in traditional AWG mode (streaming samples), and so on. Part of what got us in trouble with Sayma is that we tried to be too many things to too many people, and so lots of additional features and complexity were added in to make something "general purpose" that ended up being painful.
I want to propose a hard divide between what I would call a "fast DC DAC" -- which Shuttler as originally envisioned is intended to be -- and an "inexpensive multichannel RF DAC", which seems to be the desire from some other interested parties (and is motivating the increased sample rates being discussed). I think it's very difficult to make a design that works well for both applications, given the stringent requirements for the "fast DC DAC" performance.
In effect, I think we should consider splitting off a separate design process for the "inexpensive multichannel RF DAC", which seems like effectively a cheaper, simpler, single-card, parallel-DAC Sayma. Give this its own code name and work on issues relevant to it separately. I fear that otherwise we will be trying to do too many different things with this current design under discussion.
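The point about choosing the sample rate so that Nyquist images are easy to remove can be made concrete with a short calculation. This is a generic illustration: for a tone at frequency f sampled at rate fs, images appear at k·fs ± f. The 30 MHz tone is a hypothetical example of a "few 10s of MHz" signal, and the function name is an assumption.

```python
from typing import List


def nyquist_images(f_signal_hz: float, f_sample_hz: float, n_multiples: int = 2) -> List[float]:
    """Image frequencies k*fs - f and k*fs + f for k = 1..n_multiples."""
    images = []
    for k in range(1, n_multiples + 1):
        images.append(k * f_sample_hz - f_signal_hz)
        images.append(k * f_sample_hz + f_signal_hz)
    return images


f_signal = 30e6  # hypothetical shuttling waveform component
for f_sample in (125e6, 150e6):
    first_image = nyquist_images(f_signal, f_sample)[0]
    print(
        f"fs = {f_sample / 1e6:.0f} MHz: first image at {first_image / 1e6:.0f} MHz, "
        f"{first_image / f_signal:.1f}x the signal frequency"
    )
```

The further the first image sits above the highest signal frequency, the gentler the reconstruction filter can be, which is the trade-off behind the 125-150 MSPS target.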
From @dhslichter on 2018-04-30 19:27
Regarding the various points made above, but referencing the Shuttler design as originally envisioned, in other words a "fast DC DAC" design:
From @dhslichter on 2018-04-30 20:24
tl;dr - @gkasprow my suggestion with LVCMOS-LVDS transceivers is with NO multiplexing, so the data rates/frequencies involved are slow (150 Mbit max) and well within the spec of the chips. This method then would allow us to use the "standard" kintex-7 FPGA because we'll have enough output banks. What is your volume price for those FPGAs?
From @sbourdeauducq on 2018-05-01 02:33
Fair enough. I'm only surprised that the DAC is a 2.5GSPS part that will be used at 1/25 its rated speed, and that a sample rate limit much lower than 2.5GSPS will be enforced in hardware by the slow CMOS translators.
rather you are sending DAC data which is just single levels for each period (one transition every 6-10 ns, and generally the data are unchanging)
It doesn't matter if the data are generally unchanging; you don't want any corruption and glitches when they do change.
From @dhslichter on 2018-05-01 05:43
@sbourdeauducq yes, this would certainly be using this DAC at less than its rated capabilities! However, just because the DAC is capable of running much faster doesn't mean we have to run it that fast. Without interpolation, to run a lot faster we would also need a lot more FPGA resources to generate the waveforms, for example. If we want to make an AMC card which is basically a parallel-DAC version of Sayma (i.e. intended for generating RF outputs), it probably makes more sense to use fast DACs which have upconversion and/or fine-tuning NCO options onboard, to reduce the data rate that needs to be fed by the FPGA. If someone really wants a full-bandwidth 1+ GSPS DAC for AWG purposes (e.g. superconducting qubit folks), then you just have to run with fewer channels.
For fast ion transport, the issue is that most DACs which have the required DC precision and low drift/tempco only go up to a few MSPS, which is not fast enough for our needs. There is a big gap between <=1 MSPS and 500+ MSPS with basically no chips in the market.
It doesn't matter if the data are generally unchanging; you don't want any corruption and glitches when they do change.
You're right.
From @gkasprow on 2018-05-01 09:52
To convert 1.8V LVCMOS to LVDS you don't need LVDS translators. At low speed, a resistor network works just as well as a dedicated chip. And you don't have issues with propagation delay differences between chips.
From @dhslichter on 2018-05-02 16:40
Resistor network seems fine to me. There is an internal termination across the DAC input ports (90-145 ohm, 120 ohm nominal) which complicates the resistive translator slightly. But it seems like the FPGA output current of 16 mA will be sufficient to drive it. One could even use 1.5V LVCMOS to reduce power consumption (unless this means we need another supply rail, in which case it's probably not worth it).
The power draw is fairly high for this kind of resistive termination, though: 10s of mW per channel, times 17 channels per DAC. One can cut it in half by using a capacitor on the undriven LVDS input port to ground, which gives more differential voltage and then the bias resistors for the undriven LVDS port can be larger. Then you can end up with something more like ~10 mW per channel:
I don't think I can go much lower than this in power, though, and it requires using narrower traces (higher impedance) to keep the match. So in this sense one would have about ~200 mW of dissipation per DAC just from the LVDS conversion, even running at 1.5V LVCMOS. Just something to keep in mind for the power budget.
From @gkasprow on 2018-05-02 17:08
@dhslichter you don't need to make a Thevenin termination. You can use a source-sink regulator to generate 1/2 Vcc and connect both the termination resistor and the negative LVDS input of the DAC to it. One would need one resistor in series with the FPGA IO and a second in parallel with the DAC input to form a 50R load. Then power consumption would be very low. With 1.8V supply, 0.9V reference and +/- 200mV LVDS input at 50Ohm, the series resistor would need to be roughly 220Ohm and power would be roughly 4mW per pin, which gives roughly 60mW per DAC.
From @dhslichter on 2018-05-02 18:05
@gkasprow good thought. I think ideally one would have more like +/- 300 mV at the DAC inputs, to have some extra margin, so we need a smaller series resistance. Simulating this to give a good match at the output, the current draw is now 6.2 mA, power dissipation is now 5.5 mW per channel (93 mW per DAC). This assumes the 12 mA range for the 1.8 V LVCMOS output, which has an output impedance of ~20 ohms.
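The power figures in the two comments above can be roughly reproduced with a back-of-the-envelope estimate. This is a simplified sketch, not the simulation mentioned in the thread: it assumes the driver output sits at one rail, the far end of the 50 ohm load is held at Vcc/2, and the whole Vcc/2 drop (driver impedance, series resistor and load) dissipates the load current. The 17 pins per DAC come from the earlier power estimate in the thread; the function name is hypothetical, and the series resistor value is deliberately not derived here.

```python
from typing import Tuple


def termination_power(
    v_supply: float, swing_v: float, load_ohm: float, pins_per_dac: int
) -> Tuple[float, float, float]:
    """Return (load current in A, power per pin in W, power per DAC in W)."""
    v_ref = v_supply / 2.0          # mid-rail reference from the source-sink regulator
    i_load = swing_v / load_ohm     # current needed for the desired swing across the load
    p_pin = v_ref * i_load          # dissipation over the full rail-to-reference drop
    return i_load, p_pin, p_pin * pins_per_dac


for swing in (0.2, 0.3):
    i_load, p_pin, p_dac = termination_power(1.8, swing, 50.0, 17)
    print(
        f"+/-{swing * 1e3:.0f} mV: {i_load * 1e3:.1f} mA, "
        f"{p_pin * 1e3:.1f} mW per pin, {p_dac * 1e3:.0f} mW per DAC"
    )
```

The +/- 200 mV case lands near the quoted 4 mW per pin and 60 mW per DAC, and the +/- 300 mV case lands close to the simulated 6.2 mA and 93 mW per DAC, so the simple model seems adequate for power budgeting.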
From @gkasprow on 2018-05-02 18:25
One can slightly reduce the power consumption by utilising higher than 50Ohm track impedance. If the DAC LVDS common mode can go down to 0.75V, we can use LVCMOS15 standard to spare a few mW.
From @gkasprow on 2018-05-02 18:29
Another question touches FPGA selection. Artix200T, Kintex 325T or Kintex US 040? What computational and SDRAM resources do you need?
From @dhslichter on 2018-05-02 18:40
From the spec sheet, we should definitely be able to run at 0.75V common mode -- but then we need another regulator for that, so it might not be worth the power savings if they are marginal.
Computational and SDRAM resource question I don't know the answer to off the top of my head; @jordens can you comment on the requirements for one channel of phaser? My sense is that a 32-bit SDRAM will be sufficient. However, we have to consider whether or not this FPGA is going to be running a mor1kx processor, or if it can be a "dumb" DRTIO peripheral that just receives register updates and triggering commands but doesn't have its own soft processor onboard. @sbourdeauducq can you comment? If there is a processor, we would probably want two separate SDRAM banks, one for the processor and one for storing waveform data, so that we don't have to deal with bus contention issues with the processor.
I tend to prefer the Kintex 325T, but I realize it is a lot more expensive than the Artix. However, I think the ability to meet timing with our builds (especially if we want to be able to go up to 150 MHz RTIO clock) seems better on Kintex than Artix. I don't see any reason to prefer the Kintex Ultrascale over the Kintex-7.
From @hartytp on 2018-05-02 18:45
What max data rate do you think you can get with that resistor scheme? I'm not pushing for anything, purely curious here.
From @gkasprow on 2018-05-02 20:00
The IO toggling rate is the main limitation here. Since the traces are matched, SI issues will not be the limiting factor.
From @dhslichter on 2018-05-02 20:02
I think you end up being limited by the LVCMOS rise/fall times, which are on the order of 300-400 ps 10-90% for the Kintex 7 for 1.5V LVCMOS on the highest current range (16 mA), based on the IBIS data. Depending on the jitter performance, it seems like one ought to be able to set things up to run up to ~500 MSPS, perhaps? Certainly running at 250 MSPS should be achievable. Of course, you will need more FPGA resources to spit out the data at these higher sample rates.
One thing that could be considered, that could be implemented after the fact in gateware, would be to roll your own "mix mode" interpolation. You would run the DAC at a given clock (say 500 MHz), generate data at half that rate (here it would be 250 MSPS) inside the FPGA, and then double the frequency of the generated data for the output by double-outputting each calculated data point, switching polarity for the second copy. Thus if your calculated data values are a series of values {v0, v1, v2, v3} at 250 MSPS, you output {v0, -v0, v1, -v1, v2, -v2, v3, -v3,...} at 500 MSPS. Doing this enhances the power in the second Nyquist zone and you can get tones up close to 500 MHz. With two's complement outputs to the DAC, this negation is achieved by just toggling all output bits, so it would be very simple to implement this in gateware. In this way, one could get AOM-frequency RF out of this system even if you are running at, say 400 or 500 MSPS, and only needing to generate data samples at half that rate.
Obviously you can also implement proper digital interpolation/upconversion blocks in the FPGA as well, but this is simple and less resource-intensive.
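The "mix mode" trick described above can be sketched in a few lines. This is only a software model of what the gateware would do, assuming 16-bit two's complement words; the helper names are hypothetical. One detail worth noting: inverting all bits of a two's complement word gives -v - 1 rather than exactly -v, so the second copy is off by one LSB, which is presumably negligible for this purpose.

```python
from typing import List

BITS = 16
MASK = (1 << BITS) - 1


def decode_twos_complement(word: int, bits: int = BITS) -> int:
    """Interpret an unsigned bit pattern as a signed two's complement value."""
    return word - (1 << bits) if word & (1 << (bits - 1)) else word


def mix_mode_words(samples: List[int]) -> List[int]:
    """Emit each sample twice at double rate, with all bits toggled on the second copy."""
    out = []
    for v in samples:
        word = v & MASK           # two's complement encoding of the signed sample
        out.append(word)
        out.append(word ^ MASK)   # bitwise inversion, i.e. -v - 1 in two's complement
    return out


samples_250 = [0, 1000, -1000, 32767]      # values computed at the lower rate
words_500 = mix_mode_words(samples_250)    # words sent to the DAC at twice that rate
decoded = [decode_twos_complement(w) for w in words_500]
print(decoded)
```

The decoded stream alternates between each value and its near-negation, which is the sign-flipping pattern that moves output power into the second Nyquist zone.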
From @hartytp on 2017-08-16 05:24
I've started sketching out requirements for a fast DAC to be used for ion shuttling/splitting. It's on the Wiki.
All comments welcome...