quartiq / stabilizer

Firmware and software for the Sinara Stabilizer module with high speed, low latency ADC/DAC data processing and powerful DSP algorithms in between
http://quartiq.de/stabilizer/
Apache License 2.0

[meta] use case analysis #147

Open jordens opened 3 years ago

jordens commented 3 years ago

With Stabilizer we have a powerful CPU sitting between ADCs, DACs and DDS channels. Since this is extremely generic, it allows dozens of different use cases to be implemented. Implementing all or even many of them is tricky as there are lots of interdependencies, constraints and corner cases. The use cases can also create bottlenecks since they compete for the same resources (CPU time, latency). We need to come up with a clear set of high-value use cases and their requirements in terms of features to be implemented. Then we can decide whether features can be made available at compile-time or run-time and how that affects usability, testing and deployment. I'll try to start with the use cases here and then, once the features become clearer, fork them into their own issues.

Use cases

  1. IIR control: D-part, I^2 part, notches, LP, HP
  2. WMS: Lock-in, Modulation
  3. MTS: Same as WMS from Stabilizer perspective
  4. FMS/PDH: Demodulation (with Pounder) and then it's either DC (nothing new) or superhet-to-DC (then MTS/WMS)
  5. Phase lock: Lock-in R/phi, phase unwrap
  6. Complex MIMO lock: Some beat input, demodulate superhet with Pounder, feedback amplitude error on AOM amplitude and phase error on AOM FM/PM and slow frequency error on laser piezo/current/temperature DAC output.
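As a rough illustration of the building blocks behind use cases 2 and 5, here is a minimal host-side Python model (not Stabilizer firmware) of lock-in demodulation to R/phi followed by phase unwrapping. The function names, the boxcar averaging used as the low-pass, and any frequencies passed in are assumptions made for this sketch.

```python
import cmath
import math


def lockin(samples, f_ref, f_sample):
    """Demodulate real samples at f_ref; return (R, phi) of the averaged result.

    Averaging over an integer number of reference periods acts as the
    low-pass filter here (a boxcar, chosen only for simplicity).
    """
    acc = 0j
    for n, x in enumerate(samples):
        # Multiply by the complex reference exp(-i*2*pi*f_ref*n/f_sample).
        acc += x * cmath.exp(-2j * math.pi * f_ref * n / f_sample)
    z = 2 * acc / len(samples)  # factor 2 recovers the signal amplitude
    return abs(z), cmath.phase(z)


def unwrap(phases):
    """Remove 2*pi jumps so the phase stays continuous across +/-pi."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        # Fold the step into [-pi, pi) before accumulating it.
        d = (d + math.pi) % (2 * math.pi) - math.pi
        out.append(out[-1] + d)
    return out
```

In a phase lock, the unwrapped phi would be the error signal fed to a subsequent IIR stage, while R could serve as a lock-quality indicator.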

I/O Architecture

Processing blocks

With the ADC, DAC and DDS data available in memory buffers for the CPU to consume/produce without delay and overhead, the processing should consist of routines called at a configurable rate. The partitioning of the processing into routines may reflect the graph partitioning (a low-latency single-biquad at high rate between one ADC-DAC pair and a slower demodulating multi-biquad at lower rate between another ADC-DAC pair). How each routine handles the ADC samples (one or more) and generates the DAC/DDS data (one or more) can be configured by linking up the processing blocks and configuring them. The data path flexibility may require compile-time configuration in some cases. In general there will be processing blocks that can be inserted and wired up in many different ways in the datapaths.
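To make the idea of wired-up blocks concrete, here is a small Python model (a sketch, not the firmware design). `Biquad` and `Routine` are hypothetical names; the biquad uses direct form I with `a0` normalized to 1, which is an assumption of this example rather than a statement about Stabilizer's coefficient convention.

```python
class Biquad:
    """Direct-form I biquad: y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2."""

    def __init__(self, b, a):
        self.b0, self.b1, self.b2 = b
        self.a1, self.a2 = a  # a0 is assumed to be normalized to 1
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def update(self, x):
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        # Shift the delay lines for the next sample.
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


class Routine:
    """A chain of blocks that runs once per batch of ADC samples."""

    def __init__(self, blocks, batch_size):
        self.blocks = blocks
        self.batch_size = batch_size

    def process(self, adc_batch):
        if len(adc_batch) != self.batch_size:
            raise ValueError("unexpected batch size")
        dac_batch = []
        for x in adc_batch:
            # Pass each sample through every block in order.
            for block in self.blocks:
                x = block.update(x)
            dac_batch.append(x)
        return dac_batch
```

A low-latency path would then be something like `Routine([Biquad(...)], batch_size=1)`, while a slower path could chain several blocks and use a larger batch size.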

There may be some kind of configuration language involved here to describe the processing graph(s) and the settings.

TODO: boil all these down into a common DSP language

Settings/Telemetry

ryan-summers commented 3 years ago

I/O Architecture Design

A quick mock-up of the design: [image: stabilizer]

Background

There are two DMA peripherals available to us, each with 8 DMA streams, which allows for a total of 16 DMA streams. Each DMA stream has a configurable trigger, source/destination address, and transfer size.

Each DMA stream can operate in circular mode. This means that when a transfer is complete, the DMA configuration is automatically reloaded and can be triggered again without software intervention.

Each DMA stream can operate in double-buffered circular mode. This means that when a transfer is complete, the DMA configuration is automatically reloaded with alternating buffers A/B (where the new buffer is always the opposite buffer of the previous) and can then be triggered again without software intervention.
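The buffer alternation described above can be modelled in a few lines of Python. This is a toy simulation for reasoning about the behaviour, not a description of the hardware registers; the class and method names are made up for this sketch.

```python
class DoubleBufferedStream:
    """Toy model of a circular, double-buffered DMA stream."""

    def __init__(self, length):
        self.buffers = [[0] * length, [0] * length]  # buffer A, buffer B
        self.active = 0      # index of the buffer the DMA is filling
        self.position = 0    # next element to write in the active buffer
        self.transfer_complete = False

    def on_trigger(self, value):
        """One DMA request: store a value, swap buffers when one is full."""
        buf = self.buffers[self.active]
        buf[self.position] = value
        self.position += 1
        if self.position == len(buf):
            # The configuration is reloaded with the opposite buffer,
            # with no software involvement.
            self.active ^= 1
            self.position = 0
            self.transfer_complete = True

    def completed_buffer(self):
        """Buffer that software may read while the DMA fills the other one."""
        return self.buffers[self.active ^ 1]
```

Software has exactly one buffer period to consume `completed_buffer()` before the stream wraps around and starts overwriting it.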

Below, streams are numbered 0-15: streams 0-7 belong to DMA1 and streams 8-15 belong to DMA2.
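For reference, the numbering convention can be written as a small helper. The function name is hypothetical, and the assumption that each controller numbers its own streams 0-7 locally is mine, not something stated in this thread.

```python
def locate_stream(index):
    """Map a global stream index (0-15) to (controller name, local stream 0-7)."""
    if not 0 <= index <= 15:
        raise ValueError("stream index must be in 0..15")
    controller, local = divmod(index, 8)
    return f"DMA{controller + 1}", local
```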

ADC Inputs

In order to facilitate ADC sampling automatically, the following architecture is proposed:

This method requires an update of the DMA dest + length registers (and an enable), which is non-zero DMA interaction. While it is possible to configure DMA to run continuously in a double-buffered mode, this wouldn't inform us of input overflow events.

This implementation would rely on the 8 or 16 byte internal RX FIFO of the SPI peripheral. There is a small latency between the completion of the first N ADC samples and initiation of the transfer for the next N. The internal SPI FIFO should be sufficient to buffer this latency out.

With the above proposed configuration, we could use the SPI RX FIFO threshold interrupt as "ADC input overrun" detection.
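A back-of-the-envelope check of how much re-arm latency the RX FIFO can absorb might look like this. The helper is hypothetical, and the sample width and sample rate used to call it are illustrative inputs that you would replace with the real ADC configuration.

```python
def fifo_slack(fifo_bytes, bytes_per_sample, sample_rate_hz):
    """Return (samples the FIFO can hold, seconds until it is full)."""
    samples = fifo_bytes // bytes_per_sample
    return samples, samples / sample_rate_hz
```

For example, assuming 2-byte samples at 500 kHz (both assumed values), `fifo_slack(16, 2, 500_000)` gives 8 samples, so the DMA would have to be re-armed within roughly 16 microseconds, and `fifo_slack(8, 2, 500_000)` halves that budget.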

DAC Outputs

In this configuration, the SPI TX FIFO threshold can be used as an interrupt to detect output underrun events.

This method requires an update of the DMA source + length registers (and an enable), which is non-zero DMA interaction. While it is possible to configure DMA to run continuously in a double-buffered mode, this wouldn't inform us of output underflow events.

Discussion

With the proposed architecture, DMA register updates (address + length) are required at the start of user data processing as well as the end (when staging output samples). While this is a non-zero amount of overhead, it does provide us with hardware detection of under- and over-flow events.

If we would like truly zero CPU overhead for input/output buffering, that can be accomplished by sacrificing hardware detection of under- and over-flow events and using circular, double-buffered DMA streams (which is a possible DMA stream hardware configuration).

ryan-summers commented 3 years ago

As an expansion to the above, I believe we can have the input operate in double-buffered circular mode. In this configuration, user software can detect an over-run condition on the input by checking the DMA transfer complete flag after execution. If user software takes too long and the DMA transfer completes first, it means that the data has likely been corrupted. This can be used to supplement the SPI hardware overrun detection.
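The software-side check described here can be sketched as follows. This is a Python simulation of the control flow only: `dma_flags` is a hypothetical stand-in for the DMA status register, and in firmware the flag would be set by hardware rather than by the processing function.

```python
def process_batch(dma_flags, batch, process):
    """Run `process` on a batch and report whether the DMA overran it.

    `dma_flags` is a hypothetical dict with a "transfer_complete" entry that
    the (simulated) DMA sets when it finishes filling the other buffer.
    """
    dma_flags["transfer_complete"] = False  # clear the flag before starting
    result = process(batch)
    # If the flag is set again, the DMA finished the other buffer while we
    # were still working and may now be overwriting the one we just read.
    overrun = dma_flags["transfer_complete"]
    return result, overrun
```

The check runs after processing, so a reported overrun means the result should be treated as suspect rather than trusted.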

I do not currently think this is a feasible alternative for the DAC outputs because it's not clear how we could detect if uninitialized data was unintentionally written to the DAC SPI TX FIFO. The main issue is that we would always be theoretically racing to put data into the DMA output buffer right before it swapped over to start transferring it. There's sufficient jitter here that we might miss a sample every so often. I think it makes more sense here to rely on standard, single event transfers for the output.

dnadlinger commented 3 years ago

Just to throw it out there, regarding the "modulator" bullet point: Another use case that's quite important to us (i.e. already deployed) is to modulate the PI setpoint (i.e. input offset) from a PLL locked to a 50 Hz reference at a digital input, with a number of 50 Hz harmonics of configurable complex amplitude.
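As a sketch of what such a modulator computes per sample, assume the PLL provides the current phase of the 50 Hz fundamental in radians. The function name and the list-based representation of the harmonic amplitudes are assumptions for this illustration.

```python
import cmath


def setpoint_offset(phase, amplitudes):
    """Sum of line harmonics at the given fundamental phase (radians).

    `amplitudes[k-1]` is the complex amplitude of harmonic k: its magnitude
    sets the size and its argument the phase shift of that harmonic.
    """
    return sum(
        (a * cmath.exp(1j * k * phase)).real
        for k, a in enumerate(amplitudes, start=1)
    )
```

The returned value would be added to the PI setpoint (input offset) on every sample, with `phase` advancing as the PLL tracks the reference.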

jordens commented 3 years ago

@dnadlinger Thanks. As you know there are dozens of different use cases. We now need to find people willing to work on this together. After throwing it out there, the next step would be for you to analyze your use case, break it down into components, see how the existing components match, and contribute the additional/improved concepts to a joint roadmap.

jordens commented 3 years ago

A couple points on your analysis above @ryan-summers :

dtcallcock commented 3 years ago

I just spotted this product: https://liquidinstruments.com/instruments/

Might give some high-level ideas.

ryan-summers commented 3 years ago

Support for the updated IO acquisition is implemented in https://github.com/quartiq/stabilizer/pull/165 - with the changes, it appears that there is substantially more time available for DSP-related activity for each individual sample.

It will likely take some time to stabilize the changes in our dependencies before merging this in, but that will provide us a lot of flexibility in terms of building applications on top of the general I/O interface.