sinara-hw / Waver_FMC

Four channel DC-coupled, 10Vpp capable, 1.5GS/s AWG FMC

Software support #3

Open gkasprow opened 2 years ago

gkasprow commented 2 years ago

We plan to write some initial support for this FMC so that it is ready when the prototypes arrive. The idea is to have a versatile AWG in the classical sense: the user defines the samples and then plays them on the DAC. If one needs SAWG functions, there is the Phaser gateware, which can be easily ported since we use the same FPGA.

I'd like to be able to use both the Artix FMC carrier (block RAM) and the Kintex US carrier (external 64-bit SDRAM). In this way, we can choose the maximum sample depth. The question is how we define the samples in the most user-friendly way. How do we upload the samples to the SDRAM?

The FMC has a trigger input; we also want to be able to trigger generation using internal signals/timestamps. We also have a trigger output that can trigger another AWG card. Channels can be played simultaneously or triggered sequentially. Of course, DDS-like operations also need to be supported.

Another essential function is segmentation: a programmed number of samples is played after a trigger, and the next trigger plays the consecutive samples. This makes a lot of sense when we need to generate a very long set of pulses that are separated by long periods of inactivity, but where every pulse is different. In this way, we can save a lot of block RAM.

@kaolpr @jordens any thoughts?
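To make the segmentation idea concrete, here is a minimal behavioural sketch in Python (not gateware). The class name `SegmentedPlayer`, its methods, and the choice to wrap back to the start once the buffer is exhausted are all assumptions made for illustration; the thread does not specify them.

```python
class SegmentedPlayer:
    """Behavioural model: each trigger plays the next fixed-size segment."""

    def __init__(self, samples, segment_len):
        if segment_len <= 0:
            raise ValueError("segment_len must be positive")
        self.samples = list(samples)
        self.segment_len = segment_len
        self.pos = 0  # index of the next sample to play

    def trigger(self):
        """Return the samples played in response to one trigger."""
        # Assumption: wrap to the start once the whole buffer has been played.
        if self.pos >= len(self.samples):
            self.pos = 0
        segment = self.samples[self.pos:self.pos + self.segment_len]
        self.pos += len(segment)
        return segment


player = SegmentedPlayer(range(10), segment_len=4)
first = player.trigger()   # samples 0..3
second = player.trigger()  # samples 4..7
```

Only the samples of each pulse need to be stored; the idle time between pulses costs no memory because it is set by the trigger spacing.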

kaolpr commented 2 years ago

Kintex US carrier: https://ohwr.org/project/diot-pfc-ku/wikis/home Artix carrier: https://github.com/sinara-hw/EEM_FMC_Carrier/wiki

For transporting samples we could initially use the same mechanism as for Phaser as it would be common for all configurations.

Am I right, that we would have the following "types" of generation:

Then we need to define behavior for every trigger. So maybe let's have two sets of entries:

The user would first create the patterns to be generated, upload them as data sets, and then define how those patterns are to be used when a trigger signal is detected.

gkasprow commented 2 years ago

I'm not sure we need to implement interpolation, because this is what the DAC can already do: it upsamples the input data by a specified factor. The Kintex is much more capable, so it could stream data at twice the speed of the Artix, and the interpolation in the DAC could be 2x instead of 4x/8x. DDS mode is essentially direct sample mode, but we must specify an entire period of the signal, and the DDS core generates the address using the phase register. Right, it's logical to divide it into two independent blocks. It would probably be easier to implement the DDS core independently from the other modes.
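As a rough illustration of the DDS mode described here, the sketch below models a phase accumulator whose top bits address a table holding one period of the signal. The function names, the 32-bit accumulator width, and the power-of-two table length are assumptions for the example, not decisions from this thread.

```python
import math


def make_sine_table(n, amplitude=1.0):
    """One full period of a sine, stored as n samples."""
    return [amplitude * math.sin(2 * math.pi * k / n) for k in range(n)]


def dds_play(table, ftw, n_samples, phase_bits=32, phase0=0):
    """Read `table` at addresses taken from the top bits of a phase register.

    ftw is the frequency tuning word added to the phase on every sample.
    """
    addr_bits = (len(table) - 1).bit_length()
    if (1 << addr_bits) != len(table):
        raise ValueError("table length must be a power of two")
    mask = (1 << phase_bits) - 1
    phase = phase0 & mask
    out = []
    for _ in range(n_samples):
        # The most significant bits of the phase select the table entry.
        out.append(table[phase >> (phase_bits - addr_bits)])
        phase = (phase + ftw) & mask
    return out


# With a 32-bit phase and a 1024-entry table, ftw = 1 << 22 advances
# exactly one table entry per output sample.
samples = dds_play(make_sine_table(1024), ftw=1 << 22, n_samples=16)
```

The output frequency is `ftw / 2**phase_bits` times the sample rate, so the same stored period can be replayed at many frequencies.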

AAWO commented 2 years ago

Any thoughts?

@sbourdeauducq @jordens @dtcallcock @dhslichter

dhslichter commented 2 years ago

@gkasprow @AAWO @kaolpr It's not clear to me what the goal of this board is. Is it intended to replace/supersede baseband Phaser? Understanding the goal is important before picking a way to parameterize/output waveforms.

It seems to me that the simplest thing is just to implement Phaser gateware on the carrier FPGA and not worry about the "streaming samples from memory" method, which is not used as often for most ARTIQ folks. Groups that do use this (e.g. superconducting qubits) usually do small segments of samples (tens or hundreds of samples per waveform), which can be triggered, looped, etc. For atomic systems the waveforms would be thousands or tens of thousands of samples long. Using the DAC to do interpolation is a sensible choice in most instances as well.

gkasprow commented 2 years ago

No, it is not intended to replace Phaser but to have a general-purpose AWG with DC-coupled output, typical for lab equipment. The easiest approach is as you wrote, to implement buffers in internal memory which one can:

AAWO commented 2 years ago

Ok, summing up the current requirements:

Any other requirements, specs or new thoughts? @dhslichter @sbourdeauducq @jordens @dtcallcock

jordens commented 2 years ago

Very ambitious. The tricky thing will be to design a good API for this that meshes well with RTIO.

dhslichter commented 2 years ago

I agree with @jordens here -- it is going to be very nontrivial to get all of this to work properly. Would it make sense to start with a simpler, less flexible framework to begin with, just to demonstrate the performance of the hardware? For example, starting with a port of the Phaser gateware, but maybe doing the calculations on the Waver FPGA rather than on a Kasli to enable higher baseband sample rates (before digital upconversion)?

gkasprow commented 2 years ago

I'd start with simply playing samples from the internal memory with interpolation done in the DAC only. That's enough to demonstrate HW operation.

AAWO commented 2 years ago

Together with @kaolpr we came up with an initial architecture concept for the Universal AWG Platform. Let us know what you think about it.

General description

Universal AWG Platform (UAWGP) is a project that aims to provide a generic platform for arbitrary waveform generation. It supports multi-channel generation, flexible waveform definition and extensible triggering.

It is designed to be used primarily with Sinara project devices in two implementation schemes: with ARTIQ running on the same device (e.g. Shuttler / Waver used with AFC4 / KC705) or running on a peripheral / non-ARTIQ device (e.g. Shuttler / Waver used with EEM FMC Carrier or DIOT FMC Carrier). For that reason ARTIQ-independent timing and triggering subsystems were introduced.

UAWGP defines waveforms as the sum of samples generated by a number of generators running in parallel. Every generator can be defined separately and can range from simple sequencers outputting given samples, through DDS generators, up to DSP chains generating waveform required for user application.

UAWGP provides a way of configuring those generators as a function of time and of an external triggering signal.

In order to achieve maximum flexibility, two concepts of data sets are defined:

  1. Instruction - a single piece of configuration data for a particular generator, with a defined duration and general configuration (such as reset, clock enable, etc.). For a DDS generator, the generator configuration can consist of FTW, phase or amplitude. For a standard AWG, it can contain waveform samples.
  2. Sequence - a collection of instructions that are executed on a trigger. A sequence defines which instructions (from which addresses in memory) are to be passed to the generators. It can also define what happens when the sequence is finished, for example start over, move instantly to the next one, wait for a trigger, or go to idle.

Both Instruction and Sequence contain a header field and a payload field. The header field contains information about target Generator, time delay, length of payload field, and Generator's control flags, such as output enable, run etc.
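A possible shape for such a header-plus-payload record is sketched below. The field widths, byte order, flag bit positions and the names `HEADER_FMT`, `FLAG_OUTPUT_ENABLE` and `FLAG_RUN` are purely hypothetical; the proposal lists the header contents but not their encoding.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: generator id (u8), control flags (u8),
# time delay in clock cycles (u32), payload length in bytes (u16).
HEADER_FMT = "<BBIH"
HEADER_SIZE = struct.calcsize(HEADER_FMT)

FLAG_OUTPUT_ENABLE = 0x01  # assumed bit assignment
FLAG_RUN = 0x02            # assumed bit assignment


@dataclass
class Instruction:
    generator: int
    flags: int
    delay: int
    payload: bytes

    def pack(self):
        """Serialize the header followed by the payload."""
        header = struct.pack(HEADER_FMT, self.generator, self.flags,
                             self.delay, len(self.payload))
        return header + self.payload

    @classmethod
    def unpack(cls, data, offset=0):
        """Parse one instruction; return it and the offset of the next one."""
        gen, flags, delay, length = struct.unpack_from(HEADER_FMT, data, offset)
        start = offset + HEADER_SIZE
        payload = bytes(data[start:start + length])
        if len(payload) != length:
            raise ValueError("truncated payload")
        return cls(gen, flags, delay, payload), start + length


instr = Instruction(generator=1, flags=FLAG_RUN | FLAG_OUTPUT_ENABLE,
                    delay=100, payload=b"\x10\x20\x30\x40")
decoded, next_offset = Instruction.unpack(instr.pack())
```

Carrying the payload length in the header lets a reader skip over instructions for generators it does not know about.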

The UAWGP architecture supports future extension with SDRAM for longer instruction storage. For best performance, it is recommended to provide a dedicated SDRAM for UAWG module - not shared with other system modules.

architecture diagram

Channel

A single UAWG channel consists of a user-dependent number of Generators. Theoretically the number of Generators can be any number greater than 1. Each Generator can be controlled independently.

Operation of the channel is orchestrated by the Sequence Processing Unit (SPU). SPU receives trigger signals and can be configured to respond with a specified sequence.

Every channel also has one special and obligatory generator - with index 0. This special Generator is used for global channel-wide control over other generators - i.e. it can enable a generator's clock signal, reset internal counters and enable sample output. The 0-index generator is (by instructions) controlled in the same way as other generators.

The sum of all implemented generators' outputs (excluding 0-index generator) is then fed through the optional output enable block which provides a gating feature and allows for cutting off signal output from a number of generators with a single instruction. Signal sum can be optionally passed to the interpolator block, before its output to the DAC phy (platform specific, not part of UAWGP).
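The output path described above can be modelled in a few lines. This is only a behavioural sketch: `channel_output` is a made-up name, the generators are represented as plain lists of samples, and the optional interpolator is left out.

```python
def channel_output(generator_samples, output_enable=True):
    """Sum the outputs of all generators except index 0, then gate the sum.

    generator_samples maps a generator index to its list of output samples.
    Index 0 is the control generator and contributes no samples.
    """
    streams = [s for idx, s in generator_samples.items() if idx != 0]
    if not streams:
        return []
    n = min(len(s) for s in streams)  # only sum where all streams have data
    if not output_enable:
        # One gate cuts off the summed signal of every generator at once.
        return [0] * n
    return [sum(s[i] for s in streams) for i in range(n)]


mixed = channel_output({0: [], 1: [1, 2, 3], 2: [10, 20, 30]})
muted = channel_output({0: [], 1: [1, 2, 3], 2: [10, 20, 30]},
                       output_enable=False)
```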

Generators are configured using instructions. Initial implementation assumes that instructions are stored in BRAM-based modules with two access ports. One is for writing instructions, the other is used by the instruction dispatcher to fetch instructions and pass them to the right generators.

Instruction dispatchers are responsible for decoding the outermost instruction frame that consists of target generator id and instruction length. Their operation is configured and triggered with SPU.
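The frame decoding performed by a dispatcher could look roughly like the sketch below. The memory is modelled as a flat list of words, and the frame layout (one word for the generator id, one word for the length, then the payload) is an assumption chosen for simplicity.

```python
def dispatch(memory, start, count):
    """Decode `count` frames beginning at word address `start`.

    Assumed frame layout: [generator_id, length, payload word 0, ...].
    Returns a list of (generator_id, payload) pairs in execution order.
    """
    routed = []
    addr = start
    for _ in range(count):
        gen_id, length = memory[addr], memory[addr + 1]
        payload = memory[addr + 2:addr + 2 + length]
        if len(payload) != length:
            raise ValueError("frame runs past the end of memory")
        routed.append((gen_id, payload))
        addr += 2 + length  # the next frame follows immediately
    return routed


# Three frames: two payload words for generator 1, one word for generator 2,
# then an empty frame for generator 1.
memory = [1, 2, 10, 11, 2, 1, 99, 1, 0]
frames = dispatch(memory, start=0, count=3)
```

The `start` and `count` arguments correspond to what the SPU would hand to the dispatcher when a sequence runs.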

Every channel can have one or more instruction dispatchers and memory modules. These are a bit like ARTIQ lanes. If only one is present, only a single generator's configuration can be updated per clock cycle. To update several generators simultaneously, more “dispatcher lanes” can be instantiated. The mapping between block RAMs and dispatchers can be subject to further optimization in the future.

SPU has a dedicated storage for sequences and maps trigger signal to the sequence of the instructions to be presented to the generators. It consists of the FSM and controls instruction dispatchers by defining starting instruction, number of instructions to execute and by asserting a run enable signal to the dispatchers.
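One way to picture the SPU is as a small state machine that turns a trigger into a (starting instruction, number of instructions) command for the dispatchers. The sketch below is a simplified model under assumed names (`Sequence`, `SPU`, `on_finish`); it covers only the idle, restart and chain-to-next end-of-sequence behaviours.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sequence:
    start: int                     # address of the first instruction
    count: int                     # number of instructions to execute
    on_finish: str = "idle"        # "idle", "restart" or "next"
    next_id: Optional[int] = None  # used when on_finish == "next"


class SPU:
    def __init__(self, sequences, trigger_map):
        self.sequences = sequences      # sequence id -> Sequence
        self.trigger_map = trigger_map  # trigger id -> sequence id
        self.active = None              # None means idle

    def trigger(self, trigger_id):
        """Select the sequence mapped to this trigger; ignore unmapped ones."""
        seq_id = self.trigger_map.get(trigger_id)
        if seq_id is not None:
            self.active = seq_id

    def step(self):
        """Run the active sequence once; return (start, count) or None if idle."""
        if self.active is None:
            return None
        seq = self.sequences[self.active]
        command = (seq.start, seq.count)
        if seq.on_finish == "next":
            self.active = seq.next_id
        elif seq.on_finish != "restart":
            self.active = None  # go idle until the next trigger
        return command


spu = SPU({0: Sequence(0, 4, "next", 1), 1: Sequence(4, 2)}, {7: 0})
spu.trigger(7)
commands = [spu.step(), spu.step(), spu.step()]
```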

As a future extension, support for SDRAM can be introduced. This would allow storing much longer sequences, which can be particularly beneficial for direct waveform sequencers. Instructions would then be cached in BRAM memory modules and fetched from SDRAM as the cache becomes depleted. This method, though allowing for long sequences of instructions, will still be limited by the bandwidth of the SDRAM.
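For a feel of the numbers involved, the sustained bandwidth needed for direct sample streaming can be estimated as below. The 1.5 GS/s DAC rate comes from the board description and the interpolation factors from earlier in this thread; the 16-bit sample width is an assumption, and the function name is made up for the example.

```python
def required_bandwidth(dac_rate, interpolation, channels, bits_per_sample=16):
    """Sustained memory read bandwidth in bytes per second.

    dac_rate is the DAC output rate in samples per second; the memory only
    has to supply dac_rate / interpolation samples per channel.
    """
    input_rate = dac_rate / interpolation
    return input_rate * channels * bits_per_sample / 8


# Four channels at 1.5 GS/s, assuming 16-bit samples:
with_4x = required_bandwidth(1.5e9, interpolation=4, channels=4)  # bytes/s
with_2x = required_bandwidth(1.5e9, interpolation=2, channels=4)  # bytes/s
```

Halving the DAC interpolation factor doubles the bandwidth the memory must sustain.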

Synchronization between channels (and between separate instances of UAWGP) can be achieved by synchronous triggering, since the lengths of all operations are deterministic from the moment the trigger arrives.

sbourdeauducq commented 2 years ago

It is designed to be used primarily with Sinara project devices in two implementation schemes: with ARTIQ running on the same device (e.g. Shuttler / Waver used with AFC4 / KC705) or running on a peripheral / non-ARTIQ device (e.g. Shuttler / Waver used with EEM FMC Carrier or DIOT FMC Carrier).

Why not use ARTIQ?

UAWGP defines waveforms as the sum of samples generated by a number of generators running in parallel. DDS generators

When you are just doing multi-tone, inverse Fourier transform can be much more efficient than summing DDS outputs.

This method, though allowing for long sequences of instructions, will still be limited by the bandwidth of SDRAM.

...which is strongly dependent on the access pattern. You could turn low-level SDRAM commands (precharge, activate, read, and even autorefresh) into sequence instructions and optimize them at sequence compilation time. Then SDRAM can have fully deterministic and potentially higher performance. Maybe you could even add some control flow into the sequence.

What are the applications of this UAWGP?

gkasprow commented 2 years ago

The main application is the replacement of a general-purpose lab AWG that can be triggered by external signals. The signal needs to be defined in the time domain. Usually, one generator will be attached to one DAC, but I don't want to limit it. I can confirm: a long time ago I managed to achieve nearly the theoretical SDRAM bandwidth by building a state machine that was both an SDRAM and a FIFO controller.

gkasprow commented 2 years ago

One of our clients needs to generate a series of negative and positive DC-coupled pulses as a response to the trigger signal. Another one needs to output short Gaussian pulses every second.

kaolpr commented 2 years ago

Why not use ARTIQ?

We assume there will be no ARTIQ node on the EEM FMC Carrier (at least in the EEM version), as there is none on the Phaser. Additionally, we'd like to be able to use it outside ARTIQ for applications that do not require the level of flexibility ARTIQ offers.

Regarding SDRAM use: this can of course be implemented in different ways, but I believe we should start with something generic and then, once the need for top bandwidth is confirmed, turn to more platform-specific solutions.

kaolpr commented 2 years ago

When you are just doing multi-tone, inverse Fourier transform can be much more efficient than summing DDS outputs.

True, however it can be implemented here as well. We'd use a single "generator" implementing spectrum-based generation and IFFT.
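A small numerical check of that equivalence, in plain Python: a spectrum with a few non-zero bins, passed through an inverse DFT, gives the same samples as summing one cosine per tone. The function names are illustrative only. The inverse DFT here is the direct O(n²) form, used just to show that the results match; any efficiency gain would come from using an FFT instead.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Direct inverse DFT (O(n^2)); a stand-in for a real IFFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def multitone_from_spectrum(n, tones):
    """tones maps a bin index k (0 < k < n/2) to a cosine amplitude."""
    spectrum = [0j] * n
    for k, amplitude in tones.items():
        # A real cosine occupies bin k and its mirror bin n - k.
        spectrum[k] += amplitude * n / 2
        spectrum[n - k] += amplitude * n / 2
    return [x.real for x in inverse_dft(spectrum)]


def multitone_direct(n, tones):
    """Reference: sum one cosine per tone, as parallel DDS cores would."""
    return [sum(a * math.cos(2 * math.pi * k * t / n)
                for k, a in tones.items())
            for t in range(n)]


tones = {3: 1.0, 5: 0.5, 12: 0.25}
from_spectrum = multitone_from_spectrum(64, tones)
from_sum = multitone_direct(64, tones)
max_error = max(abs(a - b) for a, b in zip(from_spectrum, from_sum))
```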