Open catfact opened 11 years ago
working on a branch for block processing using pingpong DMA descriptor lists, something like in this example: https://ez.analog.com/thread/71499
turns out that my optimism of some weeks back was unfounded. but now a simple test case is working on the block-process-test branch (e28e3c4ae93e32cc168fe37cc51fb62eaf1f877d).
there is a module in the top-level directory called dsp-block-test that doesn't reference the bfin_lib code, and does very little (writes a sawtooth at a fixed frequency and doesn't respond to SPI), but the setup seems ok!
what ended up working (so far) is something like this:
this scheme is a little memory-hungry; each frame added to the blocksize adds 64 bytes. and i'm sure it could be mildly optimized in other ways.
will keep cleaning up and making this more useful.
(first a major caveat of this being entirely new territory for me, feel free to ignore)
if i'm reading the code correctly, the audioProcessIn and audioProcessOut buffers have the samples (for the 4 channels) interleaved. looking at the bf533 reference manual it seemed possible to have the dma transfer de-interleave the samples - is there any value in trying to get that to work?
i was also wondering if it made sense to pass pointers to the audioProcessIn and audioProcessOut buffers (and possibly block size and frame size) to module_process_block() in order to allow the block size and/or DMA strategy to change without causing breakage at the module level. the reason i started to think along those lines is that the 1D setup + isr which copies data from the i/o buffers to process buffers seems nearly identical to the double buffer (2D?) setup, where DMA is filling/draining one pair of buffers while the process function operates on the other pair (then they switch, avoiding the data movement being done in the isr). the tx/rx done signaling from the isr could also be realized by setting (global) pointers to the buffer(s) to process, which main could test for != NULL.
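a rough sketch of that pointer handoff, host-side C only. everything here is an assumption for illustration: the buffer names, `block_done_isr()`, `main_loop_poll()`, and the exact `module_process_block()` signature are hypothetical, and `fract32` is stood in by `int32_t`. a real ISR/main handoff would also need care around the two pointers being updated non-atomically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS  4
#define BLOCKSIZE 16

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/* two buffer pairs: DMA fills/drains one pair while main processes the other */
static fract32 in_buf[2][CHANNELS * BLOCKSIZE];
static fract32 out_buf[2][CHANNELS * BLOCKSIZE];

/* set by the ISR, cleared by main; NULL means "nothing to process" */
static fract32 *volatile pending_in  = NULL;
static fract32 *volatile pending_out = NULL;
static int dma_idx = 0;   /* which pair the DMA has just finished with */

/* hypothetical module entry point: buffers and sizes are passed in,
   so the module never needs to know which DMA strategy is in use */
void module_process_block(const fract32 *in, fract32 *out,
                          size_t channels, size_t frames) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < channels * frames; i++) {
        out[i] = in[i];   /* plain passthrough */
    }
}

/* models the "block done" interrupt: publish the finished pair, then swap */
void block_done_isr(void) {
    pending_in  = in_buf[dma_idx];
    pending_out = out_buf[dma_idx];
    dma_idx ^= 1;
}

/* one pass of the main loop; returns 1 if a block was processed */
int main_loop_poll(void) {
    fract32 *in  = pending_in;
    fract32 *out = pending_out;
    if (in == NULL || out == NULL) {
        return 0;
    }
    pending_in  = NULL;
    pending_out = NULL;
    module_process_block(in, out, CHANNELS, BLOCKSIZE);
    return 1;
}
```

main only ever sees pointers and sizes, so swapping the 1D copy scheme for a descriptor-based one would not change the module code.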
it seems like you've potentially already explored some of this kind of thing and moved on. if there is anything in particular i'm happy to try some more experimentation. if not i'm tempted to look into how the CV outputs and SPI bus servicing might factor into block test.
good points!
i did try a couple of other things. in the end the 1D structure was used because it turned out that my main problem wasn't the DMA structure at all, but the RX/TX timing, and this needed to be isolated. but yeah, it's not ideal.
i agree that it would be better to:
for 1), the answer is (as you say) to set up 2 buffers and use 2 DMA descriptors to set them up in pingpong mode.
for 2), i think this could indeed be done with the 2D DMA features. it's a little tricky. the way they implement interleaved streams is to allow the Y offset to be negative.
so, i think the 2D setup would be something very approximately like this:
#define CHANNELS 4
#define BLOCKSIZE 16
#define SAMPLESIZE 4 // (sizeof(fract32))
fract32 inputChannels[CHANNELS][BLOCKSIZE];
*pDMA1_START_ADDR = (void*)(inputChannels);
*pDMA1_X_COUNT = CHANNELS;
// byte-address increment for inner loop
// want each successive transfer to point at the next channel array
*pDMA1_X_MODIFY = (SAMPLESIZE * BLOCKSIZE);
*pDMA1_Y_COUNT = BLOCKSIZE;
// each outer loop, want to jump back to element N+1 of the first channel
*pDMA1_Y_MODIFY = ((1 - CHANNELS) * BLOCKSIZE + 1) * SAMPLESIZE;
i think by default, when interrupt is enabled it is triggered at the end of the outer loop.
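to sanity-check the MODIFY arithmetic without hardware, here is a small host-side model of the 2D address generation. it assumes the usual behaviour that X_MODIFY is applied after each inner-loop transfer except the last one of a row, where Y_MODIFY is applied instead. `dma2d_model()` and `deinterleave_demo()` are hypothetical names, not part of any Blackfin library, and `fract32` is stood in by `int32_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHANNELS   4
#define BLOCKSIZE  16
#define SAMPLESIZE 4   /* sizeof(fract32) */

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/* software model of a 2D DMA write: consumes an interleaved stream and
   stores each 32-bit word at the address the DMA engine would generate */
void dma2d_model(const fract32 *stream, uint8_t *start,
                 int x_count, int x_modify, int y_count, int y_modify) {
    assert(x_count > 0 && y_count > 0);
    uint8_t *addr = start;
    for (int y = 0; y < y_count; y++) {
        for (int x = 0; x < x_count; x++) {
            memcpy(addr, stream++, sizeof(fract32));
            /* X_MODIFY inside a row, Y_MODIFY after the row's last element */
            addr += (x < x_count - 1) ? x_modify : y_modify;
        }
    }
}

fract32 deinterleaved[CHANNELS][BLOCKSIZE];

/* applies the counts and modifies from the snippet above */
void deinterleave_demo(const fract32 *interleaved) {
    dma2d_model(interleaved, (uint8_t *)deinterleaved,
                CHANNELS,                                     /* X_COUNT  */
                SAMPLESIZE * BLOCKSIZE,                       /* X_MODIFY */
                BLOCKSIZE,                                    /* Y_COUNT  */
                ((1 - CHANNELS) * BLOCKSIZE + 1) * SAMPLESIZE /* Y_MODIFY */);
}
```

with these values X_MODIFY is 64 bytes and Y_MODIFY is -188 bytes, so after the last channel of frame n the address lands on frame n+1 of the first channel.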
in the pingpong mode, all of this stuff would go in each DMA descriptor, something like this:
typedef struct {
void *pNext;
void *pStart;
short dConfig;
short dXCount;
short dXModify;
short dYCount;
short dYModify;
} dma_desc_t;
#define CHANNELS 4
#define BLOCKSIZE 16
#define SAMPLESIZE 4 // sizeof(fract32)
#define X_COUNT CHANNELS
#define X_MODIFY (SAMPLESIZE * BLOCKSIZE)
#define Y_COUNT BLOCKSIZE
#define Y_MODIFY (((1 - CHANNELS) * BLOCKSIZE + 1) * SAMPLESIZE)
#define DMA_FLOW_DESC 0x7700
#define DMA_CONFIG (WDSIZE_32 | DMA_FLOW_DESC | DMAEN | DI_EN | DMA2D )
fract32 inputChannels0[CHANNELS][BLOCKSIZE];
fract32 inputChannels1[CHANNELS][BLOCKSIZE];
dma_desc_t descRx1 = { NULL, inputChannels1, DMA_CONFIG, X_COUNT, X_MODIFY, Y_COUNT, Y_MODIFY };
dma_desc_t descRx0 = { &descRx1, inputChannels0, DMA_CONFIG, X_COUNT, X_MODIFY, Y_COUNT, Y_MODIFY };
void init_dma(void) {
// ping-pong: close the loop so descRx1 points back at descRx0
descRx1.pNext = &descRx0;
*pDMA1_NEXT_DESC_PTR = &descRx0;
//... etc
}
and of course an equivalent setup for the TX DMA.
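the ping-pong part is just a two-element circular list. a host-side sketch of how the chain alternates between the two buffers, with the hardware registers left out entirely; `desc_t`, `ring_init()` and `ring_block_done()` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS  4
#define BLOCKSIZE 16

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/* stripped-down descriptor: only the fields needed to show the chaining */
typedef struct desc {
    struct desc *next;
    fract32     *start;
} desc_t;

static fract32 buf0[CHANNELS][BLOCKSIZE];
static fract32 buf1[CHANNELS][BLOCKSIZE];
static desc_t  desc0, desc1;
static desc_t *current = NULL;  /* plays the role of the NEXT_DESC_PTR register */

void ring_init(void) {
    desc0.next = &desc1;  desc0.start = &buf0[0][0];
    desc1.next = &desc0;  desc1.start = &buf1[0][0];  /* closes the loop */
    current = &desc0;
}

/* models one block completing: returns the buffer that was just filled
   and moves on to the other descriptor */
fract32 *ring_block_done(void) {
    assert(current != NULL);
    fract32 *filled = current->start;
    current = current->next;
    return filled;
}
```

each completed block hands back the buffer that is now safe to process, while the next transfer goes to the other one.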
as far as priority:
i've got some more changes to the branch - just restructuring so the block processing is in a bfin_lib_block/ and the test module code is separated. haven't pushed this because i haven't actually tested it on hardware. but i'll just go ahead and do that.
ok yeah, the restructured lib/module doesn't work yet. i pushed it to the branch anyway. dsp-block-test has a couple changes too; slightly cleaned up, and now plays a wavetable osc on top of a passthrough.
would you like me to explore getting the 2D stuff working?
i was also starting to wonder if it made sense to have some overall concept of a duty cycle with servicing SPI and/or CV outs such that each could be clocked down / computed at some integer multiple of the audio processing duty cycle. if modules expressed their desired control rate it could allow one to free up cycles when high frequency control is not needed....
if you are interested in working out 2D DMA, and you have time right now, certainly go for it! i'm happy to try as well, but time is pretty crushed for the next few days.
also good ideas about specifying different control rates. i recently opened a separate issue about SPI servicing (#239), commenting there.
@ngwese i tried to implement deinterleaving and pingpong in the DMA descriptor. but i'm doing something wrong and can't figure it out.
latest commit (3b4da215f9ed0c97c3dec2eaa3fb6b240d3b5849) has a flag in audio.h to toggle this implementation.
with or without the flag, the module API now takes two deinterleaved 2d buffers of [channels][frames] as arguments for input/output. so it's pretty much transparent whether there is a copy step in the ISR or not.
the only wrinkle is that right now, the copy step also shifts the input/output frame data between the codec's native 24 bits, and the full 32 bits that is generally better for processing. maybe we should let the module do this instead so that the ISR wouldn't have to do much of anything if the pingpong DMA worked.
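for reference, the shift itself is tiny. a sketch, assuming the 24-bit codec sample sits right-justified in the low bits of a 32-bit word (the actual justification depends on the SPORT/codec setup and isn't stated here). the function names are hypothetical and `fract32` is stood in by `int32_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/* codec -> processing: move the 24-bit sample to the top of the word so
   its sign bit becomes the 32-bit sign bit. the shift is done on an
   unsigned value to avoid shifting a negative signed integer. */
static inline fract32 codec_to_fract32(uint32_t raw24) {
    return (fract32)((raw24 & 0x00FFFFFFu) << 8);
}

/* processing -> codec: drop the low 8 bits and keep the top 24 */
static inline uint32_t fract32_to_codec(fract32 x) {
    return ((uint32_t)x >> 8) & 0x00FFFFFFu;
}

/* convert a whole block on the way in; a module could call this itself
   if the ISR no longer does the copy */
void shift_block_in(const uint32_t *raw, fract32 *dst, size_t n) {
    assert(raw != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = codec_to_fract32(raw[i]);
    }
}
```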
but i think i will go ahead and start working on some block-processing modules. i want a bigger bank of simple oscillators and a proper varispeed delay...
this isn't fully working yet but it's a promising step in the right direction (https://github.com/ngwese/aleph/commit/7cd798a3538c5b71628a0d73c22fed1b65349306): i get recognizable but distorted audio from ch 1,2 in to out.
after lots of reading, re-reading, and staring at the DMA operation flow diagram things started to click. the above changes:
it seems like there were two things in particular which were problematic in the initial attempt:
in init_dma(), the pDMAx_NEXT_DESC_PTR values were getting set but the initial FLOW value wasn't, so when dma was eventually enabled it didn't know it was supposed to load the first descriptor.
when enable_dma_sport0() was called, the dma channels were enabled, causing the descriptor to get loaded, but it seemed to just hang because the channel was immediately disabled again (best i can tell).
i'll keep at it.
(sorry, posted from wrong profile just now)
ach, thanks, not setting the initial flow mode/config was a pretty dumb one... maybe there is something wrong too with the way i set up the MODIFY/COUNTs causing your distortion?
in the meantime i went ahead with building out a test module (bank of sinewaves to start with) and making some modification to how param changes are handled - SPI rx ISR adds them to a FIFO queue and updates the raw param states, and the block ISR processes the queue and does whatever module-specific stuff with those values.
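a minimal sketch of that kind of queue, for illustration only: a single-producer/single-consumer ring buffer where the SPI rx side pushes and the block side drains. the names (`param_push()`, `param_drain()`, `apply_to_module()`) and sizes are hypothetical, and this is not the actual branch code.

```c
#include <assert.h>
#include <stdint.h>

#define PARAM_Q_SIZE 64   /* queue capacity, hypothetical */
#define NUM_PARAMS   16   /* number of module params, hypothetical */

typedef struct {
    uint16_t idx;
    int32_t  value;
} param_change_t;

static param_change_t    queue[PARAM_Q_SIZE];
static volatile uint32_t q_head = 0;   /* advanced only by the SPI rx side */
static volatile uint32_t q_tail = 0;   /* advanced only by the block side */
static int32_t raw_params[NUM_PARAMS]; /* raw state, updated immediately */
static int32_t module_state[NUM_PARAMS];

/* SPI rx ISR side: record the raw value and queue the change.
   returns 0 if the queue is full and the change was not queued. */
int param_push(uint16_t idx, int32_t value) {
    assert(idx < NUM_PARAMS);
    raw_params[idx] = value;
    if (q_head - q_tail == PARAM_Q_SIZE) {
        return 0;
    }
    queue[q_head % PARAM_Q_SIZE] = (param_change_t){ idx, value };
    q_head++;
    return 1;
}

/* stand-in for the module-specific handling of a param change */
void apply_to_module(uint16_t idx, int32_t value) {
    module_state[idx] = value;
}

/* block ISR side: apply every queued change in order; returns the count */
int param_drain(void (*apply)(uint16_t, int32_t)) {
    int n = 0;
    while (q_tail != q_head) {
        param_change_t c = queue[q_tail % PARAM_Q_SIZE];
        q_tail++;
        apply(c.idx, c.value);
        n++;
    }
    return n;
}
```

since each index is written by only one side, the two ISRs don't need to lock each other out for the queue itself, assuming the block ISR cannot be preempted mid-drain in a way that reorders these accesses.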
this almost works, but there is some screwy timing problem (i guess) when loading a scene from bees.
and, still need to add the sport1 stuff for CV output... a good opportunity to do it more correctly.
the MODIFY/COUNTs seemed fine when i checked - i walked the code line by line trying to wrap my head around it and verify things against the bfin reference manual. i can double check.
my next step was going to be to sanity check the sample handling (24 vs 32 and the serial setups), as well as wrap my head around what is going on timing-wise between the dma transactions, isr, and process code.
well there is definitely the 32b/24b problem. but shouldn't matter if you're just copying in to out...
opened a PR (https://github.com/tehn/aleph/pull/248) with a functional setup. the final missing piece turned out to be trivial, minor typo in the tx isr.
btw. tried to compile the rawsc module and it is missing its linker script.
wow, excellent, thanks! merged PR. also added missing .lds (it's just the same default), moved sample shift to module, moved block size definition to module, couple other tweaks.
somewhere along the way this scene-recall init problem went away...? maybe it was user error somehow.
I'm wondering if the dma controller (as set up above) is mem-to-mem between L1 and SDRAM? It doesn't look that way at first glance. I'm thinking about using one stream as a playhead and the other as a record head between these two memory spaces?!
the audio I/O buffers aren't in SDRAM, no. they are in L1; read in one bank, write in the other.
SDRAM access is about 8x slower than L1, so in general we would not want to put the I/O buffers there.
i would just transfer stuff in and out of SDRAM in the block process routine of your module.
but if you want to, you could use a different DMA setup pointed at SDRAM addresses. from the bf533 manual (page 9-51) it looks like using DMA to access external memory is perfectly OK.
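for the simple case (no extra DMA), a record head and play head inside the block process routine could look roughly like this. it is a host-side sketch: on target the `delay_line` array would have to be placed in SDRAM by the linker script or a section attribute, which is not shown. the names and the delay length are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 16
#define DELAY_LEN 48000   /* hypothetical: one second at 48 kHz */

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/* on target this large buffer would live in SDRAM; the L1 i/o buffers
   are only touched through the in/out pointers */
static fract32 delay_line[DELAY_LEN];
static size_t  write_pos = 0;   /* record head */

/* per-block process: the play head trails the record head by `delay` samples */
void delay_process_block(const fract32 *in, fract32 *out,
                         size_t frames, size_t delay) {
    assert(in != NULL && out != NULL);
    assert(delay > 0 && delay <= DELAY_LEN);
    for (size_t i = 0; i < frames; i++) {
        size_t read_pos = (write_pos + DELAY_LEN - delay) % DELAY_LEN;
        out[i] = delay_line[read_pos];      /* play head reads old data */
        delay_line[write_pos] = in[i];      /* record head writes new data */
        write_pos = (write_pos + 1) % DELAY_LEN;
    }
}
```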
should implement optional audio buffering.