monome / aleph

open source sound computer

CV output glitches #294

Open catfact opened 6 years ago

catfact commented 6 years ago

not sure when this happened (probably long-standing, truth be told), but there seem to be constant glitches in CV output.

reported here: https://llllllll.co/t/aleph-user-study-group/177/179?u=zebra

reported issue: in waves/lines, mapping two encoders to two cv outputs causes visible glitches in both outputs when both encoders are being moved (but not so visible when just changing one at a time.)

the glitches take the form of random-looking transient spikes in both/all outputs.

i did some more testing:

with slew disabled and looking closely, these "spikes" are actually long bursts of pulses. each pulse is about 50us, with a 50% duty cycle, oscillating between 0v and ~0.6v, and the bursts are ~25ms long.

it is very very strange. given the speed of the pulses i can only assume they are being produced by some clock signal. so i will start digging into the CV update module and the SPORT1 / AD5686 configuration. also can't quite rule out a hardware issue...

catfact commented 6 years ago

more data points:

also tried these steps in various combinations; none of them changed the behavior:

so i think this is definitely a problem with the SPI configuration and not with timing per se.

in particular, i'm suspicious of the nasty "extra clock bit" hack: https://github.com/monome/aleph/blob/dev/bfin_lib/src/init.c#L139 https://github.com/monome/aleph/blob/dev/bfin_lib/src/cv.c#L43

boqs commented 6 years ago

OK I had a quick look at the code & at the AD5686 datasheet http://www.analog.com/media/en/technical-documentation/data-sheets/AD5686_5684.pdf

page 19 of the datasheet says:

These data bits are transferred to the input register on the 24 falling edges of SCLK and are updated on the rising edge of SYNC

And from defBF532.h:

#define TCKFE       0x4000  /* TX Clock Falling Edge Select  */

So I'm trying this: https://github.com/boqs/aleph/commit/5a55a4e2d044187dc95e433aec57cd50b0054c0c

my hypothesis is that the 25-bit kludge half-works because we have the wrong clock edges

catfact commented 6 years ago

that's a good hypothesis... i am certain i've tried it before but will give it another go

i also see in my comments that the datasheet asks for a 50MHz clock and we're giving it 27MHz... but looking at the datasheet again i think it wants 25... however bumping the clock speed down didn't help

maybe the combination will work...

.... and no, i get beautiful garbage.

NB: this has bitten me before: the DAC datasheet specifies the edge on which data is sampled, while the bfin config specifies the edge on which data is driven... so they actually should be opposite (iirc)
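To make the driven-edge vs. sampled-edge point concrete (and boqs's hypothesis that the 25-bit kludge only half-works because of the clock edges), here is a toy C model. It is an idealized sketch under stated assumptions, not the real SPORT or AD5686 timing: the transmitter changes the data line on each rising edge, a receiver sampling on the falling edge sees that bit, and a receiver sampling on the rising edge sees the bit that was on the line just before the edge. The receiver is modeled as a 24-bit shift register that keeps only the last 24 samples; `transfer` is a hypothetical helper.

```c
#include <stdint.h>

#define WORD_BITS 24

/* Idealized model (an assumption, not real AD5686/SPORT timing):
 *  - the transmitter drives bit k (MSB first) on the rising edge of clock k,
 *    and drives 0 once the 24 data bits are used up;
 *  - a receiver sampling on the falling edge sees bit k (the line is stable);
 *  - a receiver sampling on the rising edge sees the value from just before
 *    that edge, i.e. bit k-1 (or the idle level for k = 0);
 *  - the receiver is a 24-bit shift register that keeps the last 24 samples. */
static uint32_t transfer(uint32_t word, int nclocks, int sample_on_falling) {
    uint32_t shift = 0;
    uint32_t line = 0;                    /* idle level before the first clock */
    for (int k = 0; k < nclocks; k++) {
        uint32_t before = line;           /* value just before this rising edge */
        line = (k < WORD_BITS) ? (word >> (WORD_BITS - 1 - k)) & 1u : 0u;
        uint32_t sampled = sample_on_falling ? line : before;
        shift = ((shift << 1) | sampled) & 0xFFFFFFu;
    }
    return shift;
}
```

In this model, `transfer(w, 24, 1)` returns `w`; sampling on the driving edge gives `w >> 1`, and adding a 25th clock brings it back to `w`, while the correct edge plus a 25th clock gives the word shifted left by one. That is at least consistent with the extra-bit hack only "working" with the edges swapped, though whether the real parts behave like this model is exactly the open question.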

catfact commented 6 years ago

ok, so this is driving me absolutely nuts. it is also giving me deja vu since i spent hours going through the exact same thought process the first time.

the current solution (25-bit TX, frame sync active high) seems totally backwards and i don't understand why it works as well as it does.

the ad5686 really seems to want an active-low sync pin. furthermore, it really seems like the SPORT mode should be "late frame sync", that is:

the first bit of the transmit data word is available and the first bit of the receive data word is sampled in the same serial clock cycle that the frame sync is asserted

that idea is supported by this AD appnote: http://www.analog.com/media/en/technical-documentation/application-notes/EE_304_Blackfin.pdf

we are doing something absolutely bizarre with the current configuration: like, seeing bits from the wrong channel in the wrong phase of the sync signal.

here are some relevant timing diagrams; for the DAC:

[image: AD5686 timing diagram]

and for the SPORT (late frame sync mode):

[image: Blackfin SPORT timing diagram, late frame sync mode]

it really seems to me like those configurations should line up, with the modification that frame sync on sport should be active low. so this is what i would expect to work:

// internal clock
// internal, required, data-dependent frame sync
// drive data with clock rising edge, expect sample on falling edge
// frame sync active low
// late frame sync
*pSPORT1_TCR1 =  ITCLK | ITFS | TFSR | LTFS | LATFS;

// 24-bit word, secondary data enabled, normal mode (not stereo)
*pSPORT1_TCR2 = 23 | TXSE ;

// system clock is 108 MHz, use 18 MHz SPORT clock to be conservative
// tclk = sclk / (2 x (div + 1))
*pSPORT1_TCLKDIV = 2;
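As a sanity check on the divider comment above: with tclk = sclk / (2 × (div + 1)) and sclk = 108 MHz, div = 2 gives the 18 MHz in the comment, and div = 1 gives 27 MHz, which is presumably where the 27MHz mentioned earlier comes from. The hypothetical helpers below just encode that arithmetic, plus the smallest divider that keeps tclk at or under a given limit.

```c
#include <stdint.h>

/* SPORT serial clock for a given divider, per the relation in the comment:
   tclk = sclk / (2 * (div + 1)). */
static uint32_t sport_tclk_hz(uint32_t sclk_hz, uint32_t div) {
    return sclk_hz / (2u * (div + 1u));
}

/* Hypothetical helper: smallest divider whose tclk does not exceed max_hz.
   Uses ceil(sclk / (2 * max)) - 1; assumes realistic values (no overflow). */
static uint32_t sport_min_div(uint32_t sclk_hz, uint32_t max_hz) {
    uint32_t denom = 2u * max_hz;
    uint32_t n = (sclk_hz + denom - 1u) / denom;
    return (n > 0u) ? n - 1u : 0u;
}
```

For a 25 MHz limit this picks div = 2 (18 MHz), and for 50 MHz it picks div = 1 (27 MHz).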

but the result is no output. the weird backwards configuration we started with is the only way i've found to get anything close to the right results. (and i've certainly tried every dumb combination like swapping the clock edge.)

so, i'm going to try a few other things:

  1. somewhere back in the deep history of the repo, i'm pretty sure there is a working test of clean ramp generation on all 4 DAC channels that doesn't use DMA and maybe doesn't even use the audio codec at all. will try to resurrect it

  2. will try just bit-banging the bastard, assuming i can set up SPORT1 pins as GPIO

  3. i dunno, might try to get an AD5686 sample and drive it from some other part.

  4. don't have a logic analyzer at the moment. will open up the board and see if i can get anything useful with a 2ch scope.
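For option 2, a bit-banged write could look roughly like the C sketch below. Everything hardware-facing is an assumption: `set_sync`, `set_sclk` and `set_din` are hypothetical stand-ins for GPIO writes on the SPORT1 pins, and here they drive a tiny simulated DAC so the sketch can run on a desktop. The 24-bit frame layout (4-bit command, 4-bit DAC select, 16-bit value, command 0x3 = write to and update DAC channel n) is taken from the AD5686 datasheet as recalled here and should be verified against it. The timing follows the datasheet line quoted earlier: bits are taken on 24 falling edges of SCLK with SYNC low and applied on the rising edge of SYNC.

```c
#include <stdint.h>

/* AD5686 frame layout (verify against the datasheet):
   [23:20] command, [19:16] DAC select (A=1, B=2, C=4, D=8), [15:0] value. */
#define AD5686_CMD_WRITE_UPDATE 0x3u

static uint32_t ad5686_frame(uint32_t cmd, uint32_t dac_mask, uint16_t value) {
    return ((cmd & 0xFu) << 20) | ((dac_mask & 0xFu) << 16) | value;
}

/* ---- simulated DAC: stands in for the real part in this sketch ---- */
static int sim_sync = 1, sim_sclk = 0, sim_din = 0;
static uint32_t sim_shift = 0;
static int sim_count = 0;
static uint16_t sim_dac[4];

static void set_din(int v) { sim_din = v; }

static void set_sclk(int v) {
    /* the DAC shifts in DIN on each falling edge of SCLK while SYNC is low */
    if (sim_sclk && !v && !sim_sync) {
        sim_shift = ((sim_shift << 1) | (uint32_t)sim_din) & 0xFFFFFFu;
        sim_count++;
    }
    sim_sclk = v;
}

static void set_sync(int v) {
    /* rising edge of SYNC after 24 bits: apply the word to the selected channels */
    if (!sim_sync && v && sim_count >= 24 &&
        ((sim_shift >> 20) & 0xFu) == AD5686_CMD_WRITE_UPDATE) {
        uint32_t mask = (sim_shift >> 16) & 0xFu;
        for (int ch = 0; ch < 4; ch++)
            if (mask & (1u << ch)) sim_dac[ch] = (uint16_t)(sim_shift & 0xFFFFu);
    }
    if (sim_sync && !v) { sim_count = 0; sim_shift = 0; }  /* start of a frame */
    sim_sync = v;
}

/* ---- the bit-bang routine itself ---- */
static void ad5686_write(uint32_t frame) {
    set_sclk(1);                             /* idle high */
    set_sync(0);                             /* active-low frame sync */
    for (int i = 23; i >= 0; i--) {
        set_sclk(1);
        set_din((int)((frame >> i) & 1u));   /* change data while SCLK is high */
        set_sclk(0);                         /* falling edge: DAC takes the bit */
    }
    set_sclk(1);
    set_sync(1);                             /* rising edge of SYNC applies the word */
}
```

On real hardware the `sim_*` state would go away and the three setters would become pin writes, with delays added as needed to meet the datasheet's setup and hold times.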

i hate this bug.

ngwese commented 6 years ago

I have a recollection of doing battle in this area when trying to change things to update all four dac channels per block - on the block processing branch way back when. I could never reliably set more than the first channel without things freaking out.

I have a vague recollection of looking at the hardware schematic and the various datasheets and starting to wonder if there was some fundamental incompatibility between the parts (dsp/dac) and how the dac wanted to do frame sync. ....this was all a long time ago and I'm not a hardware person so I could be well off in my thinking. I did try every clock/sync combo I could without success.

I can't offer any more than "I hate this bug too".

catfact commented 6 years ago

ok i have a kinda nebulous new theory: maybe need to be looking at the DMA setup

it appears that with frame sync active low, consecutive transfers never bring it high (https://ez.analog.com/message/232230), and we have SPORT1 TX driven by DMA4 in autobuffer mode, looping on a single 4-byte value.

so that seems kind of wrong. we should either have DMA do single transfers (with a descriptor, i guess?) or just write directly to the SPORT1_TX FIFO.
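The DMA theory can be illustrated with an idealized model of the SYNC line, assuming (as the linked ez.analog thread describes) that in autobuffer mode words go out back-to-back with frame sync never deasserted between them. Since the AD5686 applies a word on the rising edge of SYNC (per the datasheet line quoted earlier), counting rising edges gives the number of updates the DAC would see. `sync_rising_edges` is a hypothetical helper, and one SYNC sample per SCLK period is a simplification.

```c
/* Idealized SYNC line: one sample per SCLK period, active low.
   Each 24-bit word holds SYNC low for 24 clocks, followed by gap_clocks
   clocks with SYNC high (gap_clocks = 0 models back-to-back autobuffer words).
   Counts rising edges while the stream runs; the final return to idle after
   the last word is not counted. */
static int sync_rising_edges(int nwords, int gap_clocks) {
    int prev = 1;   /* SYNC idles high before the stream starts */
    int edges = 0;
    for (int w = 0; w < nwords; w++) {
        for (int b = 0; b < 24 + gap_clocks; b++) {
            int level = (b < 24) ? 0 : 1;
            if (!prev && level) edges++;   /* rising edge: the DAC would apply a word here */
            prev = level;
        }
    }
    return edges;
}
```

With no gap between words the model produces no rising edge at all while the stream runs; a single idle clock between words yields one update per word, which is what moving to single transfers is aiming for.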

gonna try the latter right now... .. (hours later) no dice, i can get sport1 tx interrupts and what seems like a reasonable update loop, but nothing at all is coming out the cv jacks

i would like to scope the ad5686 pins at this point, but it's actually not that easy; removing the top plate makes everything basically non-functional. could be a good time to add serial input to the bootloader... otherwise some scary test-lead attachments (and i forgot that the ad5686 connections are modded to begin with)

anyway an easier step is to try going back to using DMA4 but not in autobuffer mode. maybe tomorrow...

catfact commented 6 years ago

great scott! disengaging autobuffer mode on dma4 seems to have done it! no glitches, using the logical sport1 configuration.

i'll clean this up tomorrow and put out a test build

here's the branch in case i get hit by a bus https://github.com/catfact/aleph/tree/cv-debug

boqs commented 6 years ago

hypothesis: the lines CV output glitches reported in this PR https://github.com/monome/aleph/pull/296 were caused by CPU overload.

With regards the 'stuck' behaviour of 'syncing slews' - I think this constant is way too low:

https://github.com/monome/aleph/blob/dev/dsp/filter_1p.c#L20

Here's how I believe mult_fr1x32x32 works (and my C builds seemed very consistent at reproducing numerical behaviour when developing modules):

Therefore the sync threshold should be at least 1 << 12. I'm trying 0x4000, which is still better than 100dB.
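The general "stuck slew" mechanism can be sketched independently of the exact multiply. In a fixed-point one-pole smoother of the form y = x + c·(y − x) with floor rounding, approaching the target from below, the update stalls once |y − x| is small enough that c·(y − x) floors back to (y − x), so the state can sit some distance short of the target indefinitely; a "sync" threshold that snaps y to x only hides this if it exceeds that distance. The sketch below is an assumption, not the code in filter_1p.c: it uses an exact 64-bit 1.31 multiply with floor rounding as a stand-in for mult_fr1x32x32, and a hypothetical coefficient of 0x7FF00000 (1 − 2^-11). For reference, 0x4000 against a full scale of 2^31 is 20·log10(2^17) ≈ 102 dB, consistent with "still better than 100dB".

```c
#include <stdint.h>

/* Stand-in for a 1.31 fractional multiply: exact 64-bit product, floored.
   Not claimed to match the Blackfin mult_fr1x32x32 bit-for-bit. */
static int32_t mul_q31_floor(int32_t a, int32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;
    int64_t q = p / 2147483648LL;                 /* C division truncates toward zero */
    if (p < 0 && p % 2147483648LL != 0) q -= 1;   /* step down to get the floor */
    return (int32_t)q;
}

#define SLEW_COEFF ((int32_t)0x7FF00000)  /* hypothetical: 1 - 2^-11 in 1.31 */

/* Hypothetical helper: run y <- x + c*(y - x) from y = 0 toward `target`,
   snapping to the target once |y - target| < threshold. */
static int32_t slew_run(int32_t target, int32_t threshold, int steps) {
    int32_t y = 0;
    for (int i = 0; i < steps; i++) {
        int32_t d = y - target;
        if (d > -threshold && d < threshold) {
            y = target;                               /* "sync": close enough, snap */
        } else {
            y = target + mul_q31_floor(SLEW_COEFF, d);
        }
    }
    return y;
}
```

Starting from y = 0 toward a target of 1000000, the state in this model stalls 2047 LSBs short (the stall distance is below 2^31 / (2^31 − c) = 2048): a threshold of 100 never fires, while 0x4000 snaps it home.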

boqs commented 6 years ago

yikes @catfact - just discovered ecf1b324 broke acid & dsyn modules! no idea why but they make no sound until I revert the change

New (optimistic) hypothesis: we can now revert ecf1b324 because https://github.com/boqs/aleph/commit/a3eb74413ac52c1daa735918219d71294f1246d3 addresses the root cause of CV glitching with the old driver. I recall it only reared its head when using two encoders simultaneously! And iirc encoders are polled off the event loop, so those encoder bangs may end up hitting the bfin within a single audio frame in absence of any param throttling. This is demonstrably not correct.

Is it even possible the horrible 'dummy' param hack was working round some other weird side-effect of ecf1b324!? This appears to be the case.
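On the throttling point: one common way to stop a burst of encoder events from turning into many parameter writes inside a single audio frame is to coalesce them, keeping only the latest value per parameter and sending changed parameters at a fixed rate. The C sketch below shows that idea; it is not claimed to be what the linked commit does, and `param_request`, `param_flush`, `NUM_PARAMS` and the recording `record_send` function are all hypothetical.

```c
#include <stdint.h>

#define NUM_PARAMS 8  /* hypothetical parameter count */

static int32_t pending_value[NUM_PARAMS];
static uint8_t pending_dirty[NUM_PARAMS];

/* Event-loop side: every encoder change only records the newest value. */
static void param_request(int idx, int32_t value) {
    pending_value[idx] = value;
    pending_dirty[idx] = 1;
}

typedef void (*param_send_fn)(int idx, int32_t value);

/* Timer side: called once per throttle period; sends each changed param once. */
static int param_flush(param_send_fn send) {
    int sent = 0;
    for (int i = 0; i < NUM_PARAMS; i++) {
        if (pending_dirty[i]) {
            pending_dirty[i] = 0;
            send(i, pending_value[i]);
            sent++;
        }
    }
    return sent;
}

/* Stand-in for the real transfer to the bfin: just log what would be sent. */
static int32_t sent_log[NUM_PARAMS];
static void record_send(int idx, int32_t value) { sent_log[idx] = value; }
```

Five moves on each of two encoders between flushes then cost two parameter writes instead of ten, and the values that reach the DSP are the latest ones.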

catfact commented 6 years ago

ok, very plausible that new DMA config is smashing memory.

many apologies, thanks for catching and fixing

will come back to it when i have a minute