Closed dhslichter closed 7 years ago
It would be great if you could first determine the requirements and not start with prescribing an implementation. Your long text makes way too many assumptions and implications about how we would implement this, about what's easy and what's impossible. It's really hard for me to comment on this in any sensible way as we'd have to unwind everything, determine the actual need, and then start again. We also think that the implementation that we laid out a long time ago (in response to a rough set of requirements) is simpler, more generic, and more flexible than what you appear to be implying here.
Here are the requirements and motivation:
I appreciate the capability of the current Sayma design, which is substantial, but would it be capable of meeting these requirements?
Thanks. That's a good specification. We have been keeping in mind such a sample-based RTIO channel and its rough design while we are developing (D)DMA, DRTIO, and the Sayma waveform parametrization. In that sense, yes: the current Sayma design supports the development of such a feature. We are inviting interested parties to verify/modify/amend the requirements and fund this development.
In short, we'd just implement an additional RTIO output channel that takes N (let's say 8) samples per RTIO clock cycle (let's say 8 ns) and adds them onto the DAC samples, just like the DC spline is added to the oscillators. This could happen either before or after the GHz DUC (digital up-converter).
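To make the idea concrete, here is a minimal software model of "add N streamed samples onto the DAC samples each RTIO cycle". It is only a sketch: the names (`add_stream`, `N_SAMPLES`) are hypothetical, and clipping at 16-bit full scale is an assumption, since the thread does not say whether the sum should saturate or wrap.

```python
# Hypothetical model of the proposed sample-based RTIO channel.
# Assumption: samples are signed 16-bit and the sum saturates (not specified in the thread).
N_SAMPLES = 8                 # samples per RTIO clock cycle ("let's say 8")
FULL_SCALE_MAX = 2**15 - 1
FULL_SCALE_MIN = -2**15


def saturate(value):
    """Clip a value to the signed 16-bit range."""
    return max(FULL_SCALE_MIN, min(FULL_SCALE_MAX, value))


def add_stream(dac_samples, streamed_samples):
    """Add one RTIO cycle of streamed samples onto the generated DAC samples."""
    if len(dac_samples) != N_SAMPLES or len(streamed_samples) != N_SAMPLES:
        raise ValueError("expected %d samples per RTIO cycle" % N_SAMPLES)
    return [saturate(a + b) for a, b in zip(dac_samples, streamed_samples)]


generated = [100, -200, 32767, -32768, 0, 0, 0, 0]
streamed = [50, 50, 10, -10, 0, 0, 0, 0]
print(add_stream(generated, streamed))
# [150, -150, 32767, -32768, 0, 0, 0, 0]
```

With 8 samples per 8 ns cycle this corresponds to 1 GSPS per channel.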
We use 64-bit DDR3 running at 800 MHz.
@gkasprow by 800 MHz, do you mean the SDR clock or the DDR clock? I assume that you mean the SDR clock, so the total maximum theoretical data rate would be 64 bits * 1.6 Gbps = 102.4 Gbps?
800 MHz clock rate, so the data rate is 1.6 Gbit/s * 64 = 102.4 Gbit/s.
yes
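For reference, the arithmetic in this exchange can be written out as a short sketch. The function name is hypothetical; the only inputs are the 64-bit bus width and 800 MHz clock quoted above, and the result is the theoretical peak, not a sustained rate.

```python
def ddr_peak_gbps(bus_width_bits, clock_mhz):
    """Theoretical peak data rate of a DDR bus in Gbit/s."""
    # DDR transfers data on both clock edges: 800 MHz -> 1.6 Gbit/s per data pin.
    per_pin_gbps = 2 * clock_mhz / 1000
    return bus_width_bits * per_pin_gbps


print(ddr_peak_gbps(64, 800))  # 102.4
```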
In this case, one should be able to stream 16-bit samples at 1 GSPS to at least 4, and perhaps 6, DAC channels. This is sufficient for most purposes I would imagine; the remaining 2 (or 4) channels can be just the usual spline/CORDIC generators. I like the idea of having the sample streamed from memory simply added to the samples generated by the interpolators, since it covers all the use cases I can think of in a simple and flexible way.
The repetition and looping (as well as the lookup table) seem to me to be fairly key components of a successful implementation, so I am glad you think they would be doable (if tricky).
Potentially. But don't be fooled by calculating too tightly:
I have one question. Would it be more suitable for this purpose to have two banks of 32-bit SDRAM or a single bank of 64-bit memory? Two banks give additional flexibility: one can use the memory controllers for different purposes. The drawbacks are a more difficult layout and more FPGA pins used. It also uses much more logic resources.
I think 4 channels is a sensible number, and still provides a good enough price point per channel. I realize that 6 channels would be pushing it (thus my modifier "perhaps"), given how close it is to the absolute maximum theoretical bandwidth with no consideration of the other issues you mention.
It seems to me that if we have two banks of 32 bit SDRAM, each one will only be able to service 2 (or perhaps 3) channels, so you would end up having to use both for 4 channels, and then you are back to where you were with the 64 bit SDRAM where the streaming of DAC samples is competing with other tasks for use of the memory. Based on that, I say stick with the single 64 bit bank, especially if this simplifies layout and decreases FPGA resource usage. Others may feel otherwise!
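A rough sketch of the channel counts behind this reasoning, assuming 16-bit samples at 1 GSPS and 1.6 Gbit/s per data pin as discussed above. The `efficiency` parameter and the 0.7 value are purely illustrative assumptions standing in for refresh, bus turnaround, and competing traffic; they are not measured figures.

```python
import math

SAMPLE_BITS = 16        # bits per DAC sample
SAMPLE_RATE_GSPS = 1.0  # samples per second per channel, in GSPS
PER_PIN_GBPS = 1.6      # DDR3 at an 800 MHz clock


def max_channels(bus_width_bits, efficiency=1.0):
    """Number of full-rate channels a memory bank could feed.

    efficiency is the assumed fraction of peak bandwidth actually usable.
    """
    usable_gbps = bus_width_bits * PER_PIN_GBPS * efficiency
    per_channel_gbps = SAMPLE_BITS * SAMPLE_RATE_GSPS
    return math.floor(usable_gbps / per_channel_gbps)


for width in (64, 32):
    print(width, max_channels(width), max_channels(width, efficiency=0.7))
# 64 6 4
# 32 3 2
```

Under these assumptions a 32-bit bank lands at 2 to 3 channels and a 64-bit bank at 4 to 6, which matches the estimates in this comment.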
A few comments:
The extra logic resources used by an additional bank shouldn't be very high (~kLUT at most) but it does make the layout more complex and uses IO. Is it possible at all to have more than 64 bits (e.g. 128), or will we run out of HP IO pins and/or exceed the maximum fanout for the command/address bus?
I will answer this in a few days once I finish the schematics and connect all mezzanines.
If 128 bits is too wide, could one do a 64-bit SDRAM (which could optionally be dedicated for pulse streaming if desired) and a 32-bit SDRAM?
@jordens @sbourdeauducq Is this specified tightly enough for m-labs to generate provisional specification and a cost estimate?
Yes. Is there interest and funding? We'd prefer not to work on this right now, and to delay writing the specification and the quote by a week or so.
I think the main issue for the present time has been addressed, namely how wide the SDRAM bus should be. If we have agreed on a 64-bit SDRAM plus a 32-bit SDRAM, this should suffice for allowing the kind of streaming discussed above while keeping a separate memory bus available for the soft processor etc. I agree with @jordens that it would be good to delay writing the spec at this stage unless there is funding ready to go with a deadline.
For some applications, the spline interpolator/CORDIC architecture is not sufficient to generate DAC output waveforms of suitable complexity or speed. For example, pulses with shaped edges of short duration (edge times <10 ns), or pulses containing more than two frequency components, cannot be constructed by the presently planned gateware.
It would extend the power of the design considerably if it were possible to precompute DAC output waveforms to be stored in RAM on the Sayma card, which could then be played back at specified times. Such capability would be of use to solid-state quantum information systems, as well as for near-field microwave gates in trapped ions. It would also increase applicability of the Sayma hardware to the wireless test/radar/SDR world, as this would allow realistic data transmission tests etc.
Ideally, I would envision a hybrid mode of operation where the data pipeline to the DACs could be fed either from the interpolator/CORDIC cores or from a direct sample stream from RAM, with a switch that allows changing between these two sources on the fly at specified times. Basically if you have a FIFO for the JESD204B output, you would have a switch that chooses the data source for feeding this FIFO, either from a memory source or from the interpolator/CORDIC core. The memory source could have a FIFO before this switch to allow for latencies etc in the DMA. The interpolator/CORDIC engines run all the time, and the samples they generate are either passed through into the JESD FIFO (if they are switched in) or simply dropped (if the memory is switched in).
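The switching behaviour described above can be modelled in a few lines. This is a behavioural sketch only, not gateware: the class and method names are hypothetical, switch times are expressed as sample indices, and falling back to the synthesized sample when the memory FIFO underruns is an assumption.

```python
from collections import deque


class SourceSwitch:
    """Chooses, sample by sample, what feeds the (modelled) JESD FIFO."""

    def __init__(self):
        self.memory_fifo = deque()  # buffers DMA samples ahead of the switch
        self.events = {}            # sample index -> source to select
        self.source = "synth"       # interpolator/CORDIC path by default

    def schedule(self, index, source):
        """Request a change of source at a given sample index."""
        if source not in ("synth", "memory"):
            raise ValueError("unknown source: %r" % (source,))
        self.events[index] = source

    def push_memory(self, samples):
        """Model the DMA filling the FIFO in front of the switch."""
        self.memory_fifo.extend(samples)

    def step(self, index, synth_sample):
        """Return the sample forwarded at this index.

        The synthesizer runs all the time; its sample is simply dropped
        while the memory source is selected.
        """
        self.source = self.events.pop(index, self.source)
        if self.source == "memory" and self.memory_fifo:
            return self.memory_fifo.popleft()
        return synth_sample  # assumption: fall back on FIFO underrun


switch = SourceSwitch()
switch.schedule(2, "memory")
switch.schedule(4, "synth")
switch.push_memory([100, 101])
print([switch.step(i, synth_sample=10 * i) for i in range(6)])
# [0, 10, 100, 101, 40, 50]
```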
If this is too complex, an acceptable fallback would be for the switch between data sources to occur without precise timing (with some "small" nondeterministic jitter allowed), provided there is a mechanism to play a waveform from memory and then hold its last value after the waveform stops, until the specified time when the next waveform is played.
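As an illustration of the "play, then hold the last value" behaviour, here is a small sketch. The function name, the schedule format (start index plus sample list), and the idle value before the first waveform are all assumptions made for the example.

```python
def render_with_hold(schedule, total_samples, idle_value=0):
    """Play scheduled waveforms and hold the last value between them.

    schedule: list of (start_index, samples) pairs with distinct start indices.
    """
    starts = {start: samples for start, samples in schedule}
    current = iter(())   # nothing is playing initially
    last = idle_value
    output = []
    for t in range(total_samples):
        if t in starts:
            current = iter(starts[t])  # a new waveform begins at this index
        # Take the next sample if one is left, otherwise keep holding `last`.
        last = next(current, last)
        output.append(last)
    return output


print(render_with_hold([(1, [5, 6]), (5, [9])], total_samples=7))
# [0, 5, 6, 6, 6, 9, 9]
```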
Another feature of great utility would be to enable chaining or looping of pre-recorded waveforms from memory into the DAC data pipelines. This is the typical behavior of commercial hardware AWGs, e.g. from Tektronix.
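A minimal sketch of what chaining and looping could look like from the user's side, in the style of an AWG sequence table. The table format (segment name plus repeat count) and the function name are hypothetical and are not taken from any existing ARTIQ or vendor API.

```python
def play_sequence(waveforms, sequence):
    """Yield samples for a chained sequence of stored waveforms.

    waveforms: dict mapping a segment name to its list of samples.
    sequence:  list of (segment_name, repeat_count) entries, played in order.
    """
    for name, repeats in sequence:
        segment = waveforms[name]
        for _ in range(repeats):
            yield from segment


stored = {"edge": [0, 1], "flat": [2]}
table = [("edge", 1), ("flat", 3), ("edge", 2)]
print(list(play_sequence(stored, table)))
# [0, 1, 2, 2, 2, 0, 1, 0, 1]
```

Looping a short segment this way means a long flat section costs one stored sample plus a repeat count rather than a long run of identical samples in RAM.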
To be worthwhile, the streaming should be able to occur at up to 1 GSPS (16 Gbps) on at least 2 of the 8 output channels (more would be better if possible, e.g. 4 channels, which would require 64 Gbps, thus saturating the rate for the current memory controller on the KC705). One could potentially back off the sample rate slightly (e.g. to 800 MSPS) to give headroom if need be. Another compromise would be to stream samples of reduced bit depth to more channels (4 channels @ 1 GSPS, but with 12-bit samples and the 4 LSBs zeroed for each channel), if this eases memory bandwidth requirements (although it may not, depending on how samples are stored in memory).
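The bandwidth figures above, and the caveat about how samples are stored, can be sketched as follows. The packing layout (two 12-bit samples in three bytes, unsigned samples) is an assumption chosen for illustration; if 12-bit samples were instead stored in 16-bit words, the memory bandwidth would not drop at all.

```python
def required_gbps(channels, msps, bits_per_sample):
    """Memory read bandwidth needed for streaming, in Gbit/s."""
    return channels * msps * bits_per_sample / 1000


def pack12(samples):
    """Pack pairs of unsigned 16-bit samples into 3 bytes, keeping the 12 MSBs."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    out = bytearray()
    for a, b in zip(samples[0::2], samples[1::2]):
        a12, b12 = a >> 4, b >> 4  # drop the 4 LSBs of each sample
        out += bytes([a12 >> 4, ((a12 & 0xF) << 4) | (b12 >> 8), b12 & 0xFF])
    return bytes(out)


def unpack12(data):
    """Expand packed data back to 16-bit samples with the 4 LSBs zeroed."""
    samples = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        samples.append(((b0 << 4) | (b1 >> 4)) << 4)
        samples.append((((b1 & 0xF) << 8) | b2) << 4)
    return samples


print(required_gbps(2, 1000, 16))  # 32.0
print(required_gbps(4, 1000, 16))  # 64.0
print(required_gbps(4, 800, 16))   # 51.2
print(required_gbps(4, 1000, 12))  # 48.0
print([hex(s) for s in unpack12(pack12([0xABCD, 0x1234]))])
# ['0xabc0', '0x1230']
```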
Waveforms could be precomputed on a computer and downloaded to RAM ahead of time. Alternatively, waveforms could be precomputed at lower-than-realtime speed by the interpolator/CORDIC cores on the Sayma card, stored in memory, and played back at full speed on demand. For example, for a 1 GSPS waveform with sample points computed at 200 MHz by the gateware, playback would occur at 5x the speed of generation, and each time step in the interpolator/CORDIC would correspond to 1 ns instead of 5 ns. This method relies on the interpolator/CORDIC cores having downtime, which may not be available, but it could ease some of the communications bandwidth requirements for transferring waveforms from the Metlino or PC, and give more autonomous/distributed operation. Alternatively, waveforms could also be precomputed by an accessory hard CPU on the FMC connector of the Sayma card.
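The timing trade-off of precomputing on the card can be written out as a small sketch. The function name and the returned fields are hypothetical; the 1 GSPS and 200 MHz figures are the ones used in the example above.

```python
def precompute_plan(duration_ns, playback_msps=1000, compute_msps=200):
    """Estimate the cost of generating a waveform slower than real time.

    duration_ns: length of the waveform as it will be played back.
    """
    samples = duration_ns * playback_msps // 1000
    return {
        "samples": samples,
        "speedup": playback_msps / compute_msps,           # playback vs. generation
        "compute_time_ns": samples * 1000 / compute_msps,  # time the cores are busy
        "playback_step_ns": 1000 / playback_msps,          # what one step represents
        "compute_step_ns": 1000 / compute_msps,            # wall-clock time per step
    }


print(precompute_plan(100))
# {'samples': 100, 'speedup': 5.0, 'compute_time_ns': 500.0,
#  'playback_step_ns': 1.0, 'compute_step_ns': 5.0}
```

So a 100 ns waveform would occupy the cores for about 500 ns of generation time, which is the downtime this approach depends on.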