[RFC] New AMC: Shuttler (high-speed multi-channel DAC)

marmeladapk commented 6 years ago

From @hartytp on 2017-08-16 05:24

I've started sketching out requirements for a fast DAC to be used for ion shuttling/splitting. It's on the Wiki.

All comments welcome...

marmeladapk commented 6 years ago

From @dhslichter on 2018-02-01 19:16

I would much prefer to have an SDRAM on the board to enable the playback of arbitrary precomputed waveforms, or at the very least to extend the length/number of branches of spline-generated waveforms (relative to the PDQ system, for example), or to store some set of funky basis functions which can be scaled and summed on the FPGA to make the output waveforms.

This board will definitely be expensive (this is not news), and will be aimed at people trying to do specific tasks where they need all the bandwidth; slow voltages should still come from Zotino or equivalent if one is budget-sensitive.

@gkasprow I don't really want to do hacks where we try to drive multiple DACs from a single FPGA pin; let's just do it the boring way and we will eat the cost of the second FPGA. My recollection is that we can have two FPGAs to feed 24 DACs on an RTM card; if we want SDRAM on the RTM card as well we will have to have a think about whether there will be enough pins. @hartytp we are also back at the discussion of where the waveforms should be computed, as in https://github.com/m-labs/sinara/issues/253#issuecomment-353404428, and I still prefer to have the actual computations done on the RTM FPGAs (which means we can use a thin pipe AMC/RTM) rather than trying to do the roll-your-own JESD with calculations done on the AMC. We basically implement some kind of DRTIO passthrough for the AMC so that the FPGAs on the RTM card have an understanding of time and receive commands etc like any other DRTIO peripheral. This is basically exactly the functionality that a Metlino would be providing for a crate full of Saymas via the AMC backplane, for example.

marmeladapk commented 6 years ago

From @gkasprow on 2018-02-01 19:49

One FPGA bank has 24 diff pairs. For each DAC we need at least 18 or 19 pairs. And it's not the best idea to split a DAC between 2 banks.

So we can fit one DAC per FPGA bank. With the biggest Artix FPGA we would be able to connect up to 10 DAC channels, provided that no SDRAM is present. To support 32-bit DDR3 we need 2 banks; for 64-bit DDR we need 3 banks. So with 64-bit DDR we have only 7 banks. The only way to increase the DAC count is to connect DACs in parallel. Luckily they are rated at 2 Gbit/s, so it's relatively easy to run them in interleaving mode. Using IOSERDES, we can easily run 2 or 4 of them in parallel per FPGA bank. The only challenge is signal integrity, but HyperLynx simulations have shown that it is feasible. The AFC design which I mentioned has 32-bit DDR and 8 banks available. So we could easily run 16 DAC channels with x2 interleaving.
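As a rough sanity check of the bank arithmetic above, here is a minimal Python sketch. The bank and pair counts are the figures quoted in this comment; the function and constant names are made up for illustration, and the model ignores clock, control and configuration pins.

```python
# Figures quoted in the comment above; all names are illustrative only.
DIFF_PAIRS_PER_BANK = 24
PAIRS_PER_DAC = 19                    # "at least 18 or 19 pairs"
BANKS_FOR_DDR = {0: 0, 32: 2, 64: 3}  # DDR bus width (bits) -> banks used


def dacs_per_bank_direct():
    """DACs that fit in one bank without splitting a DAC across banks."""
    return DIFF_PAIRS_PER_BANK // PAIRS_PER_DAC


def dac_channels(total_banks, ddr_width_bits=0, interleave=1):
    """DAC channels for a given bank count, DDR width and interleave factor."""
    free_banks = total_banks - BANKS_FOR_DDR[ddr_width_bits]
    return free_banks * dacs_per_bank_direct() * interleave


print(dac_channels(10))                                   # no SDRAM
print(dac_channels(10, ddr_width_bits=64))                # 64-bit DDR
print(dac_channels(10, ddr_width_bits=32, interleave=2))  # AFC-like layout
```

Under these assumptions the three cases give 10, 7 and 16 channels, which matches the numbers in this comment.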

Such a configuration should fit on an AMC board - this is the AFC design template where I removed the complex management and programmable clock distribution (White Rabbit node) and placed 16 DAC channels. We can leave the rest of the AFC intact and replace the FMC LVDS links with DACs. The only drawback is the 32-bit DDR3. We can use FMC connectors and FMC mechanics to implement 8-channel AFEs.

The maximum we could squeeze out of this concept is 32 channels and 16-channel AFE mezzanines. But that could be difficult from a thermal point of view, and it's better to split it between AMC and RTM because it would be difficult to fit 32 channels of processing into a relatively small 200T Artix.


marmeladapk commented 6 years ago

From @hartytp on 2018-02-01 19:56

@dhslichter

Remind me why you don't want a roll-your-own JESD bus?
If you want to put the FPGAs with the CORDICs + RAM, Si5324, etc etc on the RTM then what's the point of the AMC? AFAICT, Greg's point about things not fitting on the AMC is due to the fact that the AMC has RAM etc on it. If that's duplicated wholesale on the RTM then what is the AMC for? i.e. AFAICT, the two options are really: either go for the roll your own JESD approach; or, put everything on the AMC.

marmeladapk commented 6 years ago

From @gkasprow on 2018-02-01 19:58

In the RTM scenario we have several options. Either treat the Artix FPGA as a deserializer and run 10 banks with x2 interleaving to feed 20 DACs. Another option is to use 2 Artix chips and connect 20 DACs directly. We can also use 2 Artix chips with 2x interleaving, with 32 DAC channels and 2x 32-bit SDRAM banks or control signals for AFE boards. We don't have to implement JESD - just push raw serial data or utilise the AURORA protocol, which works out of the box (tested).

marmeladapk commented 6 years ago

From @hartytp on 2018-02-01 20:00

NB My guess is that the AMC FPGA has enough room for the CORDICs etc to drive 32 DAC channels, although this would need verifying. The thinking behind this is that Sayma has 16 baseband CORDICs + 8 parallelized CORDICs (much more resource intensive), so I expect that there shouldn't be an FPGA resource issue with driving everything from the AMC FPGA...

marmeladapk commented 6 years ago

From @gkasprow on 2018-02-01 20:01

Let's first estimate how many resources we need to implement a CORDIC, and how many of them we would fit on the Sayma AMC and how many on an Artix FPGA. Take into account that DAC connectivity will consume a substantial part of the Artix chip. I once connected 64 channels of ADCs running at 125MHz, and the properly done data reception part (fine delay tuning) consumed half of the chip. I used the same Artix chip.
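For anyone estimating resources: a CORDIC is a chain of shift-and-add stages, so its logic cost scales roughly with the number of iterations times the word width. The sketch below is a floating-point Python model of the rotation-mode algorithm, written only to show the structure. It is not the Sayma gateware, and `cordic_rotate` and its parameters are names chosen for this example. Real gateware would use fixed-point words and bit shifts instead of the multiplications by `2.0 ** -i`.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians with rotation-mode CORDIC.

    Converges for |angle| up to roughly 1.74 rad (the sum of all
    atan(2**-i) micro-rotation angles).
    """
    # Each micro-rotation stretches the vector; precompute the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i  # in hardware this is a right shift by i bits
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)  # in hardware: a small lookup table

    return x / gain, y / gain


# Rotating (1, 0) by an angle yields approximately (cos, sin) of that angle.
c, s = cordic_rotate(1.0, 0.0, math.pi / 4)
print(c, s)
```

Each loop iteration corresponds to one pipeline stage with three adders, which is why a per-sample CORDIC at the RTIO clock is much cheaper than the parallelized ones needed at GSPS rates.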

marmeladapk commented 6 years ago

From @hartytp on 2018-02-01 20:02

@gkasprow good points! The more I think about this, the more I like the approach of having a couple of cheap dumb Artix 7 FPGAs on the RTM to just deserialise the data from the AMC. That way, we leverage all the work done on Sayma AMC as well as the bulk-buying discounts.

marmeladapk commented 6 years ago

From @gkasprow on 2018-02-01 20:03

XCKU has 530k logic cells and 1920 DSP blocks.
Artix 200T has 215k logic cells and 740 DSP blocks.

marmeladapk commented 6 years ago

From @gkasprow on 2018-02-01 20:06

So the simplest possible option is one Artix 7 chip working as a deserializer + 20 DAC channels. 2:1 interleaving would be trivial. Or 40 channels and 2 Artix chips. We could reuse Sayma RTM mezzanine formats, panels and mechanics.

marmeladapk commented 6 years ago

From @gkasprow on 2018-02-01 20:16

When needed, we can easily increase FPGA resources on the AMC by mounting a bigger FPGA - XCKU060 or XCKU095. The power supply has enough margin to manage it.

marmeladapk commented 6 years ago

From @dhslichter on 2018-02-01 21:52

@hartytp I don't think there is any argument from me about using Sayma AMC in front -- I thought we settled on that already. The question of streaming lots of data from the AMC card and just using the RTM FPGAs as deserializers is a bit more open. Several comments then:

What resources are available on the current Sayma AMC FPGA relative to what one would need to run the DAC channels? I'd like the DAC sample clock to be the same as the RTIO clock (100-150 MHz), so we shouldn't need to parallelize CORDICs or anything. For each channel, I would like to duplicate the Sayma/"phaser" spec for two tones with spline amplitude and frequency/phase modulation plus a "dc" spline; really I would like to have more tones per channel if the resources are available, with four tones being a good goal.
I don't want to have a "bigger FPGA" variant of the Sayma AMC, let's just keep this the same and not make more board variants.
I want to be able to use the FMC connector on the AMC cards for monitor ADCs or additional slow DAC channels, and leave enough resources on the AMC FPGA that I can drive this hardware.
I don't like running gigabit data lines if I can help it; they are nastier to route, nastier to debug, require use of gigabit transceivers and attendant costs for gateware development, etc. Implementing a single DRTIO link to the RTM card seems much simpler and the work can be reused for Metlino (making a DRTIO repeater node).
I think the timing is more subtle/tricky to figure out if we are sending data and also synchronization pulses from the AMC card to get everything to trigger at the right time on the RTM. If those FPGAs are DRTIO devices, they just have their timing (this is a solved problem) and then output when they want to.
I think 20 DACs per RTM is probably about as much as one can reasonably fit in terms of power budget, I certainly wouldn't do 40 DACs. I would rather keep the power dissipation a bit lower and improve stability than try to cram as many channels as humanly possible. I think having two AFE cards is reasonable, but we could just stick with one as well. We need to be able to route all of the analog outputs to the AFE connectors from the DACs in a low-crosstalk way, so that also puts some constraints on the physical placement of the DACs and thus the DAC count.
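To make the "spline" part of the per-channel spec in the first point concrete: a cubic spline segment can be evaluated once per sample clock using only additions, via forward differences. The Python sketch below shows that general technique. It is an illustration, not the PDQ or Sayma gateware, and `spline_samples` is a name invented for this example.

```python
def spline_samples(coeffs, n):
    """Evaluate p(t) = c0 + c1*t + c2*t**2 + c3*t**3 at t = 0..n-1.

    Uses forward differences, so each new sample costs three additions
    and no multiplications (one accumulator per polynomial order).
    """
    c0, c1, c2, c3 = coeffs
    value = c0               # p(0)
    d1 = c1 + c2 + c3        # first difference:  p(1) - p(0)
    d2 = 2 * c2 + 6 * c3     # second difference at t = 0
    d3 = 6 * c3              # third difference (constant for a cubic)

    out = []
    for _ in range(n):
        out.append(value)
        value += d1
        d1 += d2
        d2 += d3
    return out


print(spline_samples((1, 2, 3, 4), 5))
```

In hardware the three accumulators run every clock cycle, so the cost per spline is a few wide adders plus storage for the segment coefficients. That is why adding more tones per channel is mostly a question of FPGA fabric rather than DSP blocks.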

marmeladapk commented 6 years ago

From @dhslichter on 2018-02-01 21:54

@gkasprow does this AURORA protocol have gateware written already that could be integrated into ARTIQ? Or would we be writing new gateware?

marmeladapk commented 6 years ago

From @gkasprow on 2018-02-01 22:19

This is Xilinx IP. I'm not sure if it can be easily integrated into ARTIQ. It is a frame-based protocol, so it adds latency of one frame length.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-02-02 02:38

Can we PLEASE postpone this discussion until Sayma works correctly? If you have that much bandwidth, there are still plenty of Sayma bugs that can use your help.

marmeladapk commented 6 years ago

From @jbqubit on 2018-02-02 03:19

I echo @sbourdeauducq's comment here.

marmeladapk commented 6 years ago

From @dhslichter on 2018-02-05 17:51

Oh my god yes @sbourdeauducq . There will be plenty of time to discuss Shuttler once other things are shaken out.

marmeladapk commented 6 years ago

From @tprzywoz on 2018-03-05 09:48

Referring to the last comments: we can use 2 of the biggest Artix-7 FPGAs. That gives 20 IO banks. 4 banks will drive 32-bit DDR3 and the rest will drive 16 channels of DACs (without interleaving, as we suggested earlier). Then, as gkasprow mentioned, we can use FMC connectors to implement 8-channel AFEs with FMC mechanics, or make the PCB longer. One FPGA has 16 gigabit links. To ensure compatibility with Sayma AMC, each will consume 8 links to receive data. The other 8 stay free, so they can be used, for example, to enable fast data transfer between the FPGAs.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-21 09:41

Thinking out loud here, but I was wondering if the correct approach for shuttler would be to implement a JESD204B Rx stack on the RTM FPGA(s). Basically, this is just the previously discussed proposal, but noting that we don't need to roll our own JESD204B when we can just complete the MiSoC JESD204B implementation and stick an Rx stack on the RTM FPGAs, which allows the AMC FPGA to interface with the LVDS DACs on the RTM using the current AMC gateware. This has side benefits too, like making it easy to add JESD204B ADCs to Sinara in the future.

The plan would then be:

Don't make any changes to the Sayma AMC hardware
Don't make any major changes to the Sayma AMC gateware, just some minor bits of reconfiguring so that it runs with more channels (something like 16) of JESD204B DACs. Give each channel 1-2 baseband CORDICS with no DUC -- that will use fewer FPGA resources than the current Sayma design and so should easily fit.
On the RTM, route the gigabit links to 1-2 Artix-7 FPGAs. Each FPGA has a JESD Rx stack and LVDS links to the DACs. The RTM FPGA then effectively makes the LVDS DACs operate as JESD DACs as far as the AMC FPGA is concerned.
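A quick back-of-the-envelope check of the lane rates this plan implies can be sketched as below. The 16-bit sample width is an assumption (no DAC resolution has been fixed in this thread), the 125 MSPS and 8-links-per-FPGA figures are taken from other comments here, and the only protocol overhead modelled is JESD204B's 8b/10b line coding. `lane_rate_gbps` is a name invented for this example.

```python
def lane_rate_gbps(channels, sample_rate_hz, bits_per_sample, lanes):
    """Approximate serial rate per lane, counting 8b/10b coding only."""
    payload_bps = channels * sample_rate_hz * bits_per_sample
    line_bps = payload_bps * 10 / 8  # 8b/10b: 10 line bits per 8 data bits
    return line_bps / lanes / 1e9


# Assumed: 16-bit samples at 125 MSPS, 8 lanes per RTM FPGA.
one_fpga = lane_rate_gbps(channels=16, sample_rate_hz=125e6,
                          bits_per_sample=16, lanes=8)
two_fpgas = lane_rate_gbps(channels=8, sample_rate_hz=125e6,
                           bits_per_sample=16, lanes=8)
print(one_fpga, two_fpgas)
```

Under these assumptions, 16 channels into a single FPGA need about 5 Gbit/s per lane, and splitting them across two FPGAs halves that. Framing and control-bit overhead would push the real numbers up slightly.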

Anyway, it's obviously premature to talk about this too much before Sayma works.

marmeladapk commented 6 years ago

From @dhslichter on 2018-04-23 18:55

@hartytp this seems like a reasonable way to do things. One would still need to do the work for frame alignment and synchronization so that the Artix FPGA is sending data out to the DACs at the appropriate time, with consistency from boot-up to boot-up. Also, has the JESD Rx stack been implemented in MiSoC yet? There is a lot of yak shaving here.

My thought would be to clock the RTM FPGAs at the RTIO frequency (gotta make sure they meet timing here!) and do the data generation onboard, which then saves all the serializing/deserializing/handshaking/synchronization business, which can be quite complex and buggy. One could consider putting an SFP on the Shuttler RTM card to enable it to receive DRTIO fiber and run standalone, as risk mitigation. Then you are basically treating the two FPGAs on the RTM as their own DRTIO peripherals.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-23 19:27

Also, has the JESD Rx stack been implemented in MiSoC yet? There is a lot of yak shaving here.

Yes, the Rx stack needs to be implemented. However, all the new development would be on the RTM FPGAs, rather than the AMC. That makes compile times/debugging hugely easier, and avoids the quirks of Ultrascale FPGAs.

One could consider putting an SFP on the Shuttler RTM card to enable it to receive DRTIO fiber and run standalone, as risk mitigation. Then you are basically treating the two FPGAs on the RTM as their own DRTIO peripherals.

So, going around in circles a bit, if you do that then what is the point of having an RTM? If the RTM has the FPGAs that do all data generation, DRTIO, etc etc as well as all needed supplies etc, then what is the AMC needed for? Why not just make the whole thing an AMC?

AFAICT, the main justification for an AMC is that you put the big FPGA that does all the heavy lifting/data generation on it, and have only simple FPGAs (in our case, just fancy SERDESs) on the RTM.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-23 19:42

Putting it another way, the two paths that seem most sensible to me are:

Make shuttler a relatively simple AMC (no RTM) with an Artix FPGA and some LVDS DACs running at 125MSPS or so.
Make shuttler an RTM, which uses Sayma AMC. Do all the data generation on the AMC, using a reconfigured version of Sayma's gateware (no major modifications to gw). The RTM FPGAs act as some form of Serdes (e.g. using a JESD204B Rx stack).

What I don't get is putting SFPs and DRTIO FPGAs that generate huge amounts of data on the RTM. If you're going to do that then what is the justification for having all the extra complexity of two boards?

marmeladapk commented 6 years ago

From @dhslichter on 2018-04-23 23:48

What I don't get is putting SFPs and DRTIO FPGAs that generate huge amounts of data on the RTM. If you're going to do that then what is the justification for having all the extra complexity of two boards?

If you recall, my goal with Shuttler initially was not to have an AMC card at all, just a dummy that routes the backplane connections through to the RTM. I still don't like having an AMC card. I'd like it to be a standalone RTM card and avoid a lot of the hassles of AMC/RTM communication via gigabit transceivers. We also previously discussed the alternative of having a Sayma AMC card, which is free to use for other purposes (e.g. via its own FMC), just act as a DRTIO passthrough from the front.

Based on everything that has transpired with Sayma, it's pretty clear to me that there are a ton of bugs and complexities associated with:

uTCA and its plumbing for getting AMC cards to run
gigabit transceivers on various FPGA families
rolling your own JESD

I basically want to have Shuttler be a better-engineered version of the PDQ card that can live in a nice crate, scalably. Perhaps Shuttler would be better off as an EEM module rather than in a uTCA crate, if we can run fiber to it and use it as a DRTIO peripheral. There is less physical space on an EEM card so you can't have as many channels, thus RTM appeals. But basically my goal is to get as far away from the various layers of protocols as possible, for risk mitigation. To me, sending data from the AMC to the RTM over gigabit transceivers and having the receiving FPGAs act as SERDES seems to be substantially higher risk than just having the waveforms generated and output in parallel buses right here on the RTM card.

If I am wrong about these things, please go ahead and let me know. The alternative of making Shuttler be an AMC card based off the current Sayma AMC, for example (replacing the FMC lines etc with lines to DACs) could work, but I don't think we'll be able to get as many channels of DAC on the board due to pin count restrictions etc. @gkasprow has a hack for doubling up DACs on an LVDS line running at twice the DAC speed but I haven't really been convinced about how robust that is.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-24 00:58

If the RTM has the FPGAs that do all data generation, DRTIO, etc etc as well as all needed supplies etc, then what is the AMC needed for? Why not just make the whole thing an AMC?

Perhaps Shuttler would be better off as an EEM module rather than in a uTCA crate, if we can run fiber to it and use it as a DRTIO peripheral.

Yes, what about a Eurocard? Then we can do away with MMC/IPMI garbage and expensive µTCA components as well, and make it more easily connected to Kasli. This is also a request I get when people coming from Kasli hear about Sayma. They might even contribute to funding.

As for transceiver problems, 1000BASE-X Ethernet (Kasli) and DRTIO (Kasli, Sayma) are quite reliable even though the transceiver design itself is unwieldy and an insult to engineering; the JESD problems are likely somewhere else.

Yes, Ultrascale FPGAs ought to be avoided (long compilation times, poor and bug-prone BUFGCE_DIV, IOSERDES and IODELAY design).

marmeladapk commented 6 years ago

From @dhslichter on 2018-04-24 01:38

I think it's well worth considering whether we could do a Eurocard format, especially if we can use a daisy-chain topology with fibers. At least from the NIST standpoint, we'd like to be able to keep running with our current KC705 hardware and add in Shuttler boards as a peripheral via DRTIO over fiber.

Potential concerns with a Eurocard format (things we would have to think about, but none are show-stoppers AFAICT):

power dissipation: the DACs will probably dissipate about 400-500 mW per channel, plus one would have a high pin count FPGA on the board to contend with as well, similar to Kasli in power dissipation most likely. We could use the Kasli FPGA perhaps but would probably want a better speed grade, need to be sure we can meet timing. Need to have good thermal engineering in a eurocard crate to do this. Also would need a beefy power supply for the crate. Not sure what the power dissipation rating for a RatioPAC Air is, for example.
space efficiency: depending on the size of the card, we might not have enough room to put a whole lot of channels. Might want/have to make it deeper than the standard 160 mm for example. It would be nice for price and space reasons to have more channels on a board if possible.
daughtercard space: need to have a large enough daughtercard to put analog stages on the output, either high-gain amplifiers (a further strain on the power budget), or low-noise differential buffers. I'd like to make the mechanics easy and make this a reverse-FMC with a straightforward front panel connector. This might force the FMC to be pushed farther back on the main board and thus crowd the FPGA? All things that can/should be worked out, but just a consideration.
synchronization: I understand work is ongoing to improve this, but I think that we would plan to run these at approx 125-150 MSPS (whatever the RTIO clock is, for simplicity), so we'd want the timing jitter from card to card to be well below this, and to have deterministic latencies in the system from power-up to power-up. This is really more of a DRTIO issue than anything else, and I know that people are working to solve it as best as possible.
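The DAC part of the power budget in the first point is simple enough to tabulate. The sketch below uses only the 400-500 mW per channel estimate from this comment and the channel counts mentioned in the thread. FPGA, regulator and analog front-end dissipation are deliberately left out because no figures for them have been given, and `dac_power_range_w` is a name invented for this example.

```python
def dac_power_range_w(channels, per_channel_mw=(400, 500)):
    """Return (low, high) DAC dissipation in watts for a channel count."""
    low_mw, high_mw = per_channel_mw
    return channels * low_mw / 1000, channels * high_mw / 1000


# Channel counts discussed in this thread: Eurocard (8-10) and RTM (20).
for n in (8, 10, 20):
    low, high = dac_power_range_w(n)
    print(f"{n} channels: {low:.1f}-{high:.1f} W (DACs only)")
```

So an 8-10 channel Eurocard would dissipate roughly 3-5 W in the DACs alone, before adding the FPGA and any output amplifiers.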

marmeladapk commented 6 years ago

From @dhslichter on 2018-04-24 01:51

Certainly for me the cost of uTCA crates is a turn-off, and the level of yak-shaving with MMC etc is clearly a hassle, although it's not clear to me if once we solve that it will stay solved or will keep rearing its head.

For the Eurocard, I would hope to get 8-10 DACs plus one FPGA onto a card. This should be achievable but will be a tight squeeze I think unless we make the board longer than 160 mm.

I would prefer not to have any EEM on a design like this, just make it a DRTIO device at the same level as a Kasli, for example.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-24 02:15

especially if we can use a daisy-chain topology with fibers.

Or use this device (from a cursory look it seems straightforward to port ARTIQ to it): wrswitch

Daisy chains have high latency.

We can also do DRTIO over EEM cables, and support both options on the Shuttler device. Not using the transceivers also results in lower latency.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-24 02:28

For the Eurocard, I would hope to get 8-10 DACs plus one FPGA onto a card. This should be achievable but will be a tight squeeze I think unless we make the board longer than 160 mm.

Why cram the board so much instead of adding another crate on top?

You'll need space for the connectors anyway. The SMA density on the Sayma RTM front panels is unwieldy (from a user perspective), and requires expensive and time-wasting custom mechanical parts to be produced (from a developer perspective).

The connector density on the 8-channel SMA-TTL is already at the limit, and already led to compromises being made (https://github.com/sinara-hw/sinara/issues/522).

Having small and relatively inexpensive boards also lowers the barrier to entry for groups who only need a few channels, or a limited set of features.

I also recommend keeping it a monolithic design instead of adding modules that can become a hassle. See Sayma: its fragile, unreliable and expensive SMP connectors and the mechanical difficulties of mounting the RF daughtercards. Amplifiers etc. can go on a separate piece of hardware if needed, just like they currently do with the existing DDS hardware.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-24 02:42

synchronization: I understand work is ongoing to improve this, but I think that we would plan to run these at approx 125-150 MSPS (whatever the RTIO clock is, for simplicity), so we'd want the timing jitter from card to card to be well below this, and to have deterministic latencies in the system from power-up to power-up. This is really more of a DRTIO issue than anything else, and I know that people are working to solve it as best as possible.

Let's solve this once and for all, this is an interesting problem unlike dealing with µTCA trash, and WR low-jitter are getting ~100fs (IIRC, with the Si5324 which is in hindsight not the best choice, we are at ~90ps across board restarts now, and ~10ps without restarts). We should be able to do it as well, e.g. just stand on the shoulders of giants :)
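To put those jitter figures in context against the 125-150 MSPS sample rates quoted above, the sketch below expresses each one as a fraction of a sample period. The numbers are the ones mentioned in this comment, and `jitter_fraction` is a name invented for this example.

```python
def jitter_fraction(jitter_s, sample_rate_hz):
    """Timing uncertainty as a fraction of one sample period."""
    return jitter_s * sample_rate_hz


SAMPLE_RATE_HZ = 125e6  # 8 ns sample period

cases = {
    "across restarts (~90 ps)": 90e-12,
    "without restarts (~10 ps)": 10e-12,
    "WR low-jitter (~100 fs)": 100e-15,
}
for label, jitter_s in cases.items():
    frac = jitter_fraction(jitter_s, SAMPLE_RATE_HZ)
    print(f"{label}: {frac * 100:.4f}% of a sample period")
```

Even the ~90 ps figure is only about 1% of an 8 ns sample period, so the remaining work is mostly about making that offset deterministic from power-up to power-up.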

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 08:20

If you recall, my goal with Shuttler initially was not to have an AMC card at all, just a dummy that routes the backplane connections through to the RTM. I still don't like having an AMC card. I'd like it to be a standalone RTM card and avoid a lot of the hassles of AMC/RTM communication via gigabit transceivers. We also previously discussed the alternative of having a Sayma AMC card, which is free to use for other purposes (e.g. via its own FMC), just act as a DRTIO passthrough from the front.

@dhslichter Firstly AMC v RTM:

Obvious point, but physically an AMC is essentially the same size as an RTM. Also, there is no problem having a very simple/stripped down AMC with DACs, but none of the MMC etc. If you keep the functionality of the AMC to a bare minimum, then an AMC with DACs is essentially the exact same board as an RTM with DACs. So, I don't see what the benefit of putting everything on an RTM is.

Sticking everything on an AMC has some benefits over sticking everything on an RTM:

You don't need a "dummy" AMC to connect the RTM to the AMC back plane to provide power and optionally DRTIO links to Metlino over the AMC BP
There are many more options for uTCA chassis with only AMCs than with AMCs + RTMs. The AMC only chassis also tend to be cheaper and have shorter lead times.

In other words, if your only motivation for using an RTM is to not have to have MMC etc on the same board as your DACs then the simple answer is just to scrap all that -- the AMC will work perfectly well without it.

(AMC + RTM) v AMC or RTM

The only reason for using an RTM is if you want to use it as well as a full (i.e. not "dummy") AMC. Then you get to offload a bunch of stuff onto the AMC, leaving more room on the RTM for DACs. If you put everything on one board (either AMC or RTM) then the space left for DACs gets very limited.

In particular:

You're starting to talk about putting SFPs on the RTM. A couple (for daisy chaining) of SFPs plus an SMA clock input is already starting to take up some serious board space and room on the FP.
The FPGA(s) that do the data generation will end up being relatively large and burning off a lot of heat compared with a relatively simple SERDES-type design (you were talking about 16 channels, each with 4 CORDICs running at 125MHz, that's a lot of logic). This means larger SMPSs, bigger heat sinks, etc etc. Again, if you put everything on the same board, this eats up a lot of space that could be used for DACs and contributes to thermal management issues.
If you want a nice wide RAM bus to stream data to the DACs (which is something you pushed for on Sayma) then, again, that takes up space.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 08:32

@dhslichter re your general points about uTCA:

Based on everything that has transpired with Sayma, it's pretty clear to me that there are a ton of bugs and complexities associated with:

uTCA and its plumbing for getting AMC cards to run

gigabit transceivers on various FPGA families

rolling your own JESD

I basically want to have Shuttler be a better-engineered version of the PDQ card that can live in a nice crate, scalably. Perhaps Shuttler would be better off as an EEM module rather than in a uTCA crate, if we can run fiber to it and use it as a DRTIO peripheral. There is less physical space on an EEM card so you can't have as many channels, thus RTM appeals. But basically my goal is to get as far away from the various layers of protocols as possible, for risk mitigation. To me, sending data from the AMC to the RTM over gigabit transceivers and having the receiving FPGAs act as SERDES seems to be substantially higher risk than just having the waveforms generated and output in parallel buses right here on the RTM card.

If I am wrong about these things, please go ahead and let me know. The alternative of making Shuttler be an AMC card based off the current Sayma AMC, for example (replacing the FMC lines etc with lines to DACs) could work, but I don't think we'll be able to get as many channels of DAC on the board due to pin count restrictions etc. @gkasprow has a hack for doubling up DACs on an LVDS line running at twice the DAC speed but I haven't really been convinced about how robust that is.

I guess you haven't been following the developments/debugging on Sayma closely, but I don't buy any of these claims.

JESD204B:

Firstly, the M-Labs JESD stack worked essentially first time AFAICT. Can anyone remember any major time-consuming issues that we've found with it? I can't.

There was a simple hardware issue that took a while to debug (we'd forgotten to connect the termination pin on the DACs up) but that wasn't really JESD-specific and was just a simple oversight.

uTCA "bugs"

So, it's true that Sayma has been held up for quite a while by issues related to uTCA things like MMC. But, that's all optional -- one doesn't need any of that to get a basic AMC up and running in a uTCA rack.

The "plumbing" required to get an AMC to receive power and communicate with the MCH for DRTIO is really minimal and easy to get going. I'd argue strongly that the risks that you've acknowledged in using Eurocards, such as having to reinvent the wheel on thermal management, are much worse than the risks in a minimal AMC.

GTX

Again, I don't buy this point at all.

Firstly, if you want DRTIO/SFP then you have gigabit transceivers. So, your proposed design doesn't eliminate them.
Secondly, none of the issues we've had on Sayma so far have been to do with them -- we've spent the most time so far on the SDRAM!

Certainly for me the cost of uTCA crates is a turn-off

Really? AFAICT, if you go for an AMC-only crate, you have a lot of options at a pretty good value.

Before you use this as the basis for a major design decision, I'd encourage you to put together a parts list/cost for a EuroCrate with sufficient cooling power and power supply to run the kind of system you're talking about (and, include the labour costs if you're buying a tonne of separate parts that need assembly). Compare that to a cheap AMC crate.

If you get into exotic things like LLRFBPs then the uTCA costs explode.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 08:34

If I am wrong about these things, please go ahead and let me know. The alternative of making Shuttler be an AMC card based off the current Sayma AMC, for example (replacing the FMC lines etc with lines to DACs) could work, but I don't think we'll be able to get as many channels of DAC on the board due to pin count restrictions etc. @gkasprow has a hack for doubling up DACs on an LVDS line running at twice the DAC speed but I haven't really been convinced about how robust that is.

To be clear: if we made shuttler an AMC then I would definitely not recommend basing it on Sayma. Just start again from scratch and make the thing as simple as possible -- take everything you're thinking of putting on an RTM and stick it on an AMC instead.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 08:41

Or use this device (from a cursory look it seems straightforward to port ARTIQ to it):

So what you're talking about here is reinventing uTCA, with the BP replaced with a tonne of fibres and the power supply replaced with a stack of wall-warts

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 08:44

Anyway, tl;dr on all of this. My feeling is that:

If/when we get to the point that Sayma is working really well and we're happy with its reliability then it's worth considering making Shuttler an RTM that piggybacks off Sayma. The SERDES route allows us to use the gateware on Sayma without any major modifications to the hardware or gateware (just reconfiguration of the gateware), which should minimise the chances of nasty bugs, and cuts down development costs.

If, when we come to develop Shuttler, we still have concerns about Sayma, I'd vote to go for an AMC-only design, with all the unnecessary uTCA stuff stripped off.

Having said all that, the decision will ultimately be made according to the preferences/prejudices of the person who funds it!

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 08:48

Let's solve this once and for all, this is an interesting problem unlike dealing with µTCA trash, and WR low-jitter are getting ~100fs (IIRC, with the Si5324 which is in hindsight not the best choice, we are at ~90ps across board restarts now, and ~10ps without restarts). We should be able to do it as well, e.g. just stand on the shoulders of giants :)

Oh, and PS @sbourdeauducq our DCXO eval boards arrived and @WeiDaZhang is starting to code up a WR implementation using them. Will let you know when we have something. If we get it up and running, we'd definitely appreciate some help porting it to ARTIQ if you're interested.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 09:05

One final comment on the EuroCard v uTCA thing: this doesn't need to be a single design for everyone.

For groups with simple experiments who only want a handful of channels, the EuroCard format is almost certainly the way to go. Build a simple one with 4-8 channels (either SMA or something like RJ45) with a fixed analog front-end (no mezzanines) and an SFP on it.

For those of us who are thinking of larger-scale experiments (e.g. involving HOA2) where we need hundreds of channels, uTCA is almost certainly the way to go. In that case, we'd use higher-density connectors like SCSI to avoid having huge stacks of cables.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 09:30

Okay, final final point from me ;)

@dhslichter don't get carried away blaming all the issues with Sayma on uTCA. The reasons the EEM stuff worked so much more easily than Sayma are:

(a) they are much simpler designs -- all SPI and no 1GSPS+ data converters. It's important not to underestimate how much easier this makes things.

(b) the EEMs were designed with a very strict policy of keeping things as simple as possible, and generally with one specific application per card. In contrast, Sayma was seen as a design that should solve all of life's problems in a single board. This led to very flexible clocking schemes, high channel counts, pluggable AFEs, etc. If we'd stripped it all down to the minimum, it would probably be working by now (although that doesn't mean that it won't be worth it in the long run -- the jury is still out on that).

I do think that if you try to cram a bunch of fast DAC channels onto a EuroCard with pluggable AFEs etc then you're going to start running into similar issues to Sayma -- only worse, since you now don't have all the thermal and power management taken care of for you by the well-designed uTCA rack.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-24 10:49

no 1GSPS+ data converters

The KC705 phaser has the same data converters as Sayma and nowhere near the insane amount of bugs.

Firstly, the M-Labs JESD stack worked essentially first time AFAICT. Can anyone remember any major time-consuming issues that we've found with it? I can't.

It's coming: https://github.com/m-labs/artiq/issues/861

As for SDRAM/GTH, see my comments about the Ultrascale IOSERDES and IODELAY.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 11:06

The KC705 phaser has the same data converters as Sayma and nowhere near the insane amount of bugs.

Sure.

Note that none of the bugs on Sayma are due to anything intrinsic to uTCA, but rather due to particular design decisions that were made. e.g. you can do uTCA without MMC; you can do uTCA without ultra-scale FPGAs.

But, yes, it turns out that if you use eval boards that have been very well tested over a long period of time then you get fewer hardware bugs than if you design your own hardware from scratch. This isn't surprising.

As for SDRAM/GTH, see my comments about the Ultrascale IOSERDES and IODELAY.

Yes, it seems that a lot of the pain we've had with Sayma has been figuring out how to use ultra-scale FPGAs properly.

However, the fact that a lot (although apparently not all) of the SDRAM issues cleared up pretty quickly when @jordens got involved and tracked down the relevant user guides/Xilinx recommendations makes me think that this is still a very solvable problem with the right methodology.

Edit: note that I'm not trying to attribute blame here. Tracking down obscure, ill-documented and often incorrect/contradictory Xilinx recommendations is a massive PITA and time sink (= expensive to do professionally) so I have a lot of sympathy for the frustrations it's caused you and how long it's taken.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-24 13:50

Ultrascale FPGAs have some pretty serious flaws. I've been trying for weeks to contact Xilinx support about how to phase-lock the output of a BUFGCE_DIV (the latter being required to use the IOSERDES correctly) to an existing reference clock and what I get is basically "we are very sorry, we don't have IP (sic) to do it, we will make it easier in the next FPGA generation", assorted with unhelpful suggestions that demonstrate a lack of understanding of synchronous logic theory.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-24 14:00

ack.

Without meaning to assign blame in any direction, a lot of the issues we're seeing do seem to trace back to you guys struggling to get ultrascale FPGAs working.

If these issues don't clear up at some point soon, I wonder if we do need to bite the bullet and spin up Sayma V2.0 with a Kintex-7.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-24 14:09

we've spent the most time so far on the SDRAM!

and hmc830 and ethernet and 1.8V and JTAG and completely unexplainable crashes and serwb and...

marmeladapk commented 6 years ago

From @dhslichter on 2018-04-24 18:30

@hartytp @sbourdeauducq I hear you both on all your comments, and @hartytp you are correct that I haven't been following the latest with Sayma very closely because I have a ton of other things to be doing in the lab. Some responses to the above:

The "plumbing" required to get an AMC to receive power and communicate with the MCH for DRTIO is really minimal and easy to get going.

I haven't played with any of this stuff myself, but if this is true then why is the uTCA stuff such a problem for everyone (at least, so it seems to me based on the amount of issue traffic)? If in fact one can run things in uTCA and use the backplane for DRTIO distribution, and the MMC stuff is "simple", and there is an MCH card that can do the DRTIO distribution/fanout onto the backplane, then making Shuttler an AMC card would be fine with me. It just hasn't seemed like this is the case from my cursory overview. What is causing issues with uTCA, if not this?

I agree that "rolling your own uTCA" in a eurocard rack if uTCA will do the job for you is not the right way to go; the discussion above was based on my understanding of uTCA as requiring a lot more overhead to get it to run. Am I correct that if we want to use the backplane for DRTIO, we will need to have a Metlino card, or can we use a COTS MCH? Shuttler relying on Metlino is something I would like to avoid; in any event, it should have an SFP cage so it could be run standalone (e.g. down on an optical table).
The output signals from Shuttler will be differential analog voltages, so a suitable connector and cable would be used -- no individual SMAs. Probably SCSI or something similar, so front panel real estate shouldn't be a problem.
It might make the most sense to just use the Kintex-7 we know and love from the KC705 (XC7K325T-2FFG900C) to run this card. This is an expensive FPGA (although I don't know if @gkasprow gets volume pricing from having used it on other projects already?), but if we can fit 16 channels on Shuttler, using some of @gkasprow's sneaky tricks, then it only comes out to $100/channel or less. We could go for a higher speed-grade Artix-7 as well, I suppose.....
If we don't need an RTM, then we can use a less expensive uTCA crate (http://www.nateurope.com/products/NATIVE-SX.html perhaps?) and this would work nicely, all the stuff is sorted out for us. If we can get 16 DAC channels per Shuttler, then this chassis would give us 80 channels of DAC, which is a pretty good deal, I'd say.
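
As a rough sanity check of the numbers above, here is a minimal Python sketch. Only the 16 channels per board, $100/channel and 80 channels per chassis figures come from the comment; the FPGA budget and slot count it prints are inferences from those figures, not quoted prices or datasheet values, and the function names are made up for illustration.

```python
# Back-of-envelope check of the per-channel figures quoted in the comment.
# The three constants are taken from the comment; everything derived from
# them is an inference, not a vendor price or a chassis specification.

CHANNELS_PER_BOARD = 16           # "if we can fit 16 channels on Shuttler"
TARGET_COST_PER_CHANNEL_USD = 100  # "$100/channel or less"
CHANNELS_PER_CHASSIS = 80         # "this chassis would give us 80 channels"


def implied_fpga_budget(channels_per_board, cost_per_channel):
    """Largest FPGA cost that still meets the per-channel target."""
    return channels_per_board * cost_per_channel


def implied_slots(channels_per_chassis, channels_per_board):
    """Number of Shuttler boards the chassis figure implies."""
    slots, remainder = divmod(channels_per_chassis, channels_per_board)
    if remainder:
        raise ValueError("chassis channel count is not a whole number of boards")
    return slots


if __name__ == "__main__":
    budget = implied_fpga_budget(CHANNELS_PER_BOARD, TARGET_COST_PER_CHANNEL_USD)
    slots = implied_slots(CHANNELS_PER_CHASSIS, CHANNELS_PER_BOARD)
    print(f"Implied FPGA budget per board: ${budget}")
    print(f"Implied Shuttler boards per chassis: {slots}")
```

In other words, the $100/channel figure corresponds to an FPGA costing up to about $1600 per board, and the 80-channel figure corresponds to five Shuttler boards in the chassis.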

marmeladapk commented 6 years ago

From @hartytp on 2018-04-25 08:48

I haven't played with any of this stuff myself, but if this is true then why is the uTCA stuff such a problem for everyone (at least, so it seems to me based on the amount of issue traffic)? If in fact one can run things in uTCA and use the backplane for DRTIO distribution, and the MMC stuff is "simple", and there is an MCH card that can do the DRTIO distribution/fanout onto the backplane, then making Shuttler an AMC card would be fine with me. It just hasn't seemed like this is the case from my cursory overview.

@gkasprow Please can you confirm what the absolute minimum one needs on an AMC is to correctly draw power from the backplane and have basic BP communication with the MCH (e.g. no non-essential uTCA features such as ethernet over the backplane). IIRC, it's just strapping a few pins on the BP connector.

What is causing issues with uTCA, if not this?

I'd argue that nothing that's specifically uTCA is. However, there are a few extra features that weren't absolutely essential that were added (largely to help potential non-ARTIQ users of Sayma IIRC), which increased the complexity a bit. That added complexity, together with limited resources for debugging and development of things like MMC firmware, has led to some quite major delays and frustrations.

Out of the issues @sbourdeauducq mentioned, quite a few have nothing to do with uTCA: HMC830; anything ultra scale related; anything to do with having two boards/serwb (that's not required if one doesn't have an RTM); anything JESD204B related; etc.

Others aren't directly uTCA issues. e.g. the "1V8 issue" was just an incorrectly compensated regulator. It became tied up with the MMC because the regulator compensation is programmed digitally either over a header, which means hooking it up to an Arduino or similar, or by the MMC. None of us could be bothered with the Arduino route, so we all waited for new MMC firmware, which took a while because it's fiddly and we only have limited manpower. Similarly, the ethernet got a bit complicated because of the way it's been hooked up. But we could just scrap that altogether and still have a completely functional AMC DRTIO slave.

Am I correct that if we want to use the backplane for DRTIO, we will need to have a Metlino card, or can we use a COTS MCH? Shuttler relying on Metlino is something I would like to avoid; in any event, it should have an SFP cage so it could be run standalone (e.g. down on an optical table).

There are some COTS MCHs which, I think, would probably work (although I haven't looked into the details).

TBH though, I'm not particularly worried about that since, as a fallback, one can always just use the fibres to connect Shuttler to Kasli.

The biggest draw of uTCA for me is power/thermal management and having all the mechanics sorted properly (not to mention that the uTCA chassis generally have proper grounding as well!). BP communication is nice, but it's not a deal-breaker for me if Metlino turns out not to work.

Anyway, I'd still argue that we shouldn't give up on making Shuttler an RTM that uses Sayma AMC for the heavy lifting. Starting again from scratch to make a new AMC (or a complex EuroCard design for that matter) will always involve new bugs and risks, even if it is a lot simpler than Sayma -- look at how many of the issues with Sayma arose from things that we all expected to be trivial! Then there's also the time spent writing board support stuff for ARTIQ, yak shaving etc. If we get a working design then it's much better to use it.

Personally, my feeling is that if/when we get to the point where we're really happy that Sayma is working really reliably, then we should consider building a new RTM that runs off Sayma -- as I pointed out before, we could keep the Sayma gateware essentially unchanged and just reconfigure the number of channels. If we add a JESD Rx stack to the RTM FPGA, then there is no new ultra-scale design to be done.

But, if the issues with Sayma keep dragging on and on and on, then a new simplified AMC would be the better route for both Shuttler and Sayma.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-25 09:24

µTCA power is definitely not nice, as demonstrated by Sayma not being powered up when placed into the crate, even after much tweaking of this annoying MCH configuration file. It is also expensive and more or less a duopoly (and according to Greg, the Vadatech stuff is even worse). I advocate for this: https://github.com/sinara-hw/sinara/issues/499#issuecomment-381043418 As for thermal management, I don't think we need the µTCA cruft either to power up a few fans. Much of the MCH is also expensive and complete cruft and should not be there. Just look at the stupid "tongue" connectors that this thing has...

marmeladapk commented 6 years ago

From @hartytp on 2018-04-25 09:34

µTCA power is definitely not nice, as demonstrated by Sayma not being powered up when placed into the crate.

@gkasprow please can you confirm the absolute minimum that needs to be done to power up an AMC.

I know things are a bit more complex for Sayma, but let's establish the bare minimum that needs to be done.

It is also expensive and more or less a duopoly (and according to Greg, the Vadatech stuff is even worse). I advocate for this: #499 (comment) As for thermal management, I don't think we need the µTCA cruft either to power up a few fans. Much of the MCH is also expensive and complete cruft and should not be there. Just look at the stupid "tongue" connectors that this thing has...

As to the rest of that, I'm not going to get into an argument with you here, but let's just say I disagree with you.

Yes, you can complain that the well-designed system is overly complex, and that you can do better with some duct-tape, fans and cheap Chinese SMPSs. But, in practice, doing a decent job of mechanics and thermal management is hard, particularly once one starts having high power density, like Shuttler would have.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-25 09:36

For the simple stuff that we designed the EEMs for (e.g. Kasli, a few DDSs and some DACs) EuroCards and a cheapo fan work great. For more complex things, I think a better solution is needed.

marmeladapk commented 6 years ago

From @sbourdeauducq on 2018-04-25 09:41

µTCA is the opposite of good design. It's overengineered, NIH, inelegant and pretentious. When did I say that those power modules had to be sourced from noname Chinese vendors? Greg mentioned some high-quality resonant converters that are a superior option. There are also some crates like RatiopacPRO AIR (German!) that provide some level of thermal management without all the cruft. I suggest looking further into this kind of solution.

marmeladapk commented 6 years ago

From @hartytp on 2018-04-25 09:53

µTCA is the opposite of good design. It's overengineered, NIH, inelegant and pretentious.

I've never heard a rack standard called pretentious before ;)

These are kind of ad-hominem attacks on a rack standard. And, again, I disagree. NIH? What perfectly good pre-existing standard were they ignoring to replace with their own?

And, again, if you want fast communication with a rack master, and lots of current supply then I'm really not aware of a decent alternative. Are you suggesting that hacking together stuff in a EuroRack with a mess of ribbon cables and wall warts is an "elegant" solution?

Greg mentioned some high-quality resonant converters that are a superior option.

Right, there are plenty of PSUs out there. But, it's still not trivial to do a decent job of mounting them, routing the power, sorting out grounding, etc etc.

There are also some crates like RatiopacPRO AIR (German!) that provide some level of thermal management without all the cruft.

I've been playing around with these recently. They're fine for what they are, but still quite expensive, with quite long lead times, only available from a single supplier, and not that well designed (I certainly wouldn't call them elegant!). The thermal management isn't great (I wouldn't want to stuff them full of high-speed DACs and FPGAs). They don't provide any BP, so we'd still have the mess of ribbon cable/fibres, and they don't provide any decent way of routing power, etc.

So, yeah, they're great for a simple system with a few EuroCards, but not what I'd want to build a system of 100+ channels of high-speed DACs out of, which is what we're talking about for a complex ion trap like a HOA2.
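
To put the 100+ channel figure in perspective, here is a small Python sketch. It assumes the 16 channels per board floated earlier in the thread and 5 boards per crate (inferred from the 80-channel chassis estimate); both numbers are hypothetical until a design exists, and the helper names are made up for illustration.

```python
# Rough sizing for a large trap: how many boards and crates would a given
# channel count need? Defaults are assumptions taken from earlier comments
# in this thread (16 channels/board, 5 boards/crate), not a real design.


def boards_needed(channels, channels_per_board=16):
    """Smallest number of boards that provides at least `channels` outputs."""
    if channels < 0 or channels_per_board <= 0:
        raise ValueError("channel counts must be positive")
    # Integer ceiling division, avoiding floating point.
    return -(-channels // channels_per_board)


def crates_needed(boards, boards_per_crate=5):
    """Smallest number of crates that holds `boards` boards."""
    if boards < 0 or boards_per_crate <= 0:
        raise ValueError("board counts must be positive")
    return -(-boards // boards_per_crate)


if __name__ == "__main__":
    channels = 100
    boards = boards_needed(channels)
    crates = crates_needed(boards)
    print(f"{channels} channels -> {boards} boards -> {crates} crates")
```

Under those assumptions, 100 channels works out to 7 boards spread over 2 crates, which gives a feel for how much power, cooling and cabling the rack has to handle.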

marmeladapk commented 6 years ago

From @hartytp on 2018-04-25 10:39

To be clear, I don't love uTCA. I've been through several cycles of "let's replace it with something better" both with the people here, and in discussions with other groups. Each time, we've begrudgingly come to the conclusion that there really isn't a better option short of engineering something ourselves, which none of us want to do.

And, yes, uTCA has quite a lot of features we don't need. But one doesn't have to actually engage with any of that (e.g. we don't need MMC). And it is well-engineered, and does its job well. There are also a decent number of COTS racks available at a decent price. This is the reason that people like CERN use it (not some form of NIH).

I'd also warn you against extrapolating from the fact that EuroCards work fine for really simple Kasli systems to assuming they will be a good solution for complex ones like the ones we're discussing.

Give me a BOM for COTS components for a rack with cooling, power management and decent ways of routing signals + power around, rather than an expanding rat's nest of fibres/ribbon cable, properly thought through grounding, etc (ideally this should all be preassembled, but certainly should require no user-machined parts, mains wiring, etc) and I'll gladly switch from uTCA. But, I just haven't seen such an alternative.

marmeladapk commented 6 years ago

From @vdirksen on 2018-04-25 10:56

I would recommend that anyone interested go and see the people at the Desy Techlab https://techlab.desy.de. Since 2000, Desy itself has evaluated all the different form factors and has developed a lot of standard systems. But they have also built proprietary systems for the same reasons mentioned in this discussion. Talk to the people maintaining the MTCA and the proprietary systems. You will get a completely different picture. The latest success story is XFEL, which was built on time and on budget.
