xmos / lib_mic_array

Microphone array library
Other
22 stars 32 forks source link

Stopping ma_frame_rx() and resuming later while MCLK active causes ma_frame_rx() to block indefinitely #171

Closed ed-xmos closed 1 year ago

ed-xmos commented 2 years ago

This was found in the I2S slave configuration of XVF3800. The case happens when the I2S master (RPI) stops sending clocks for a while but the MCLK to mic array keeps clocking. This is quite a likely scenario as, for example, the RPI stops sending clocks when not streaming.

I would expect the contents of the MA filters to be noisy when resuming but the application needs greater robustness to the user providing discontinuous I2S clocks especially when connecting to unknown user systems.

The code can be found here: https://github.com/xmos/sw_xvf3800/blob/22dd513ab91c622e9df5c6a63a760e4c2bd09e5f/applications/app_xvf3800/src/data_plane/i2s_task.c#L136

The MA config can be found here: https://github.com/xmos/sw_xvf3800/blob/22dd513ab91c622e9df5c6a63a760e4c2bd09e5f/applications/app_xvf3800/src/app_conf.h#L22

Original issue here. https://github.com/xmos/sw_xvf3800/issues/90

ed-xmos commented 2 years ago

Here is what xgdb has to say about the mic_array thread:

 2  tile[0] core[1]  0x000844ea in pdm_rx_isr ()
* 1  tile[0] core[0]  __xcore_chanend_check_ct (__c=2147615234, __ct=1 '\001') at /Applications/XMOS_XTC_15.1.4/target/include/xcore/_support/xcore_chanend_impl.h:112
(gdb) t 2
[Switching to thread 2 (tile[0] core[1])]#0  0x000844ea in pdm_rx_isr ()
(gdb) disass
Dump of assembler code for function pdm_rx_isr:
0x000844a0 <pdm_rx_isr+0>:  extsp (u6)      0x4
0x000844a2 <pdm_rx_isr+2>:  stw (ru6)       r4, sp[0x0]
0x000844a4 <pdm_rx_isr+4>:  stw (ru6)       r5, sp[0x1]
0x000844a6 <pdm_rx_isr+6>:  stw (ru6)       r6, sp[0x2]
0x000844a8 <pdm_rx_isr+8>:  stw (ru6)       r7, sp[0x3]
0x000844aa <pdm_rx_isr+10>: ldw (lru6)      r4, dp[0x8]
0x000844ae <pdm_rx_isr+14>: in (2r)         r4, res[r4] *
0x000844b0 <pdm_rx_isr+16>: ldw (lru6)      r7, dp[0x9]
0x000844b4 <pdm_rx_isr+20>: ldw (lru6)      r6, dp[0xb]
0x000844b8 <pdm_rx_isr+24>: stw (l3r)       r4, r7[r6]
0x000844bc <pdm_rx_isr+28>: bf (ru6)        r6, 0xa
0x000844be <pdm_rx_isr+30>: sub (2rus)      r6, r6, 0x1
0x000844c0 <pdm_rx_isr+32>: stw (lru6)      r6, dp[0xb]
0x000844c4 <pdm_rx_isr+36>: ldw (ru6)       r4, sp[0x0]
0x000844c6 <pdm_rx_isr+38>: ldw (ru6)       r5, sp[0x1]
0x000844c8 <pdm_rx_isr+40>: ldw (ru6)       r6, sp[0x2]
0x000844ca <pdm_rx_isr+42>: ldw (ru6)       r7, sp[0x3]
0x000844cc <pdm_rx_isr+44>: ldaw (ru6)       sp, sp[0x4]
0x000844ce <pdm_rx_isr+46>: kret (l0r)      
0x000844d2 <pdm_rx_isr+50>: ldw (lru6)      r4, dp[0xc]
0x000844d6 <pdm_rx_isr+54>: stw (lru6)      r4, dp[0xb]
0x000844da <pdm_rx_isr+58>: ldw (lru6)      r4, dp[0xa]
0x000844de <pdm_rx_isr+62>: stw (lru6)      r4, dp[0x9]
0x000844e2 <pdm_rx_isr+66>: stw (lru6)      r7, dp[0xa]
0x000844e6 <pdm_rx_isr+70>: ldw (lru6)      r4, dp[0xd]
0x000844ea <pdm_rx_isr+74>: out (r2r)       res[r4], r7 *
0x000844ec <pdm_rx_isr+76>: ldw (ru6)       r4, sp[0x0]
0x000844ee <pdm_rx_isr+78>: ldw (ru6)       r5, sp[0x1]
0x000844f0 <pdm_rx_isr+80>: ldw (ru6)       r6, sp[0x2]
0x000844f2 <pdm_rx_isr+82>: ldw (ru6)       r7, sp[0x3]
0x000844f4 <pdm_rx_isr+84>: ldaw (ru6)       sp, sp[0x4]
0x000844f6 <pdm_rx_isr+86>: kret (l0r)      
0x000844fa <pdm_rx_isr+90>: stw (2rus)      r0, r0[0x0]
End of assembler dump.
(gdb) info reg
r0             0x80020102   -2147352318
r1             0x87fd8  557016
r2             0x4  4
r3             0x1  1
r4             0x80020502   -2147351294
r5             0x87fdc  557020
r6             0x0  0
r7             0x87988  555400
r8             0xfd8e0  1038560
r9             0x4  4
r10            0x0  0
r11            0x1  1
cp             0x85ff0  548848
dp             0x87388  553864
sp             0xfd8b0  1038512
lr             0x8038a  525194   mic_array::ChannelFrameTransmitter<4u, 1u>::OutputFrame(long (*) [1]) + 14
pc             0x844ea  541930   pdm_rx_isr + 74
sr             0x58 88
spc            0x840d4  540884   ma_frame_tx + 12
ssr            0x142    322
et             0x0  0
ed             0x80300  525056
sed            0x0  0
kep            0x80080  524416
ksp            0x0  0
vec_vsr        0x14 20

So it is blocked on the out but I don't understand why the IN at the other end is blocked..

(gdb) t 8
[Switching to thread 8 (tile[1] core[1] (dual issue))]#0  0x00083834 in ma_frame_rx ()
(gdb) disass
Dump of assembler code for function ma_frame_rx:
0x00083820 <ma_frame_rx+0>: mkmsk (rus)     r11, 0x1
0x00083822 <ma_frame_rx+2>: dualentsp (u6)  0x0
0x00083824 <ma_frame_rx+4>: chkct (2r)      res[r1], r11 *
0x00083826 <ma_frame_rx+6>: nop (0r)        
0x00083828 <ma_frame_rx+8>: mul (l3r)       r2, r3, r2
0x0008382c <ma_frame_rx+12>:    outct (2r)      res[r1], r11 *
0x0008382e <ma_frame_rx+14>:    nop (0r)        
0x00083830 <ma_frame_rx+16>:    bf (lru6)       r2, 0x4
0x00083834 <ma_frame_rx+20>:    in (2r)         r3, res[r1] *
0x00083836 <ma_frame_rx+22>:    nop (0r)        
0x00083838 <ma_frame_rx+24>:    sub (2rus)      r2, r2, 0x1
0x0008383a <ma_frame_rx+26>:    stw (2rus)      r3, r0[0x0]
0x0008383c <ma_frame_rx+28>:    add (2rus)      r0, r0, 0x4
0x0008383e <ma_frame_rx+30>:    nop (0r)        
0x00083840 <ma_frame_rx+32>:    bt (lru6)       r2, -0x4
0x00083844 <ma_frame_rx+36>:    chkct (2r)      res[r1], r11 *
0x00083846 <ma_frame_rx+38>:    nop (0r)        
0x00083848 <ma_frame_rx+40>:    outct (2r)      res[r1], r11 *
0x0008384a <ma_frame_rx+42>:    nop (0r)        
0x0008384c <ma_frame_rx+44>:    nop (0r)        
0x0008384e <ma_frame_rx+46>:    retsp (u6)      0x0
astewart-xmos commented 2 years ago

The deadlock is because the out that it is stuck at isn't the out in ma_frame_tx(), it's the out in pdm_rx_isr.

pdm_rx_isr uses two buffers for capturing PDM samples. When pdm_rx_isr captures enough PDM samples to fill its current buffer, it performs a double buffer pointer swap to switch buffers, and then it outs a pointer (to the buffer just filled) into a streaming channel between the ISR and the mic array thread. As long as the mic array thread is processing blocks of PDM data as fast as they're coming in (i.e. real-time constraint satisfied), the out in pdm_rx_isr is never supposed to block. Because two buffers are used, this is fairly tolerant to brief deviations in processing time.

The problem occurs when the mic array thread gets two full PDM blocks behind. In that case, once the second PDM buffer is filled, pdm_rx_isr issues the out instruction, but it blocks, because the mic array thread has not pulled the previous buffer pointer out of the channel. But then because the out in pdm_rx_isr is blocking, the ISR can never return, which means the mic array thread never regains control, and so it can never issue the corresponding in which would unblock the channel.

I'm not really familiar with the XVF3800 pipeline, but it sounds like the mic array consumer (on tile[1] core[1] in this case) is only calling ma_frame_rx() while the I2S slave is receiving samples. This means the mic array thread is stalled at the out inside ma_frame_tx(), and is unable to retrieve any further PDM blocks from pdm_rx_isr until the consumer has called ma_frame_rx(), but when 2 more blocks of PDM are received while the thread waits at ma_frame_tx(), pdm_rx_isr ends up blocking the thread from ever servicing the ma_frame_rx().

The API contract of the current mic array implementation is that the mic array consumer must retrieve each frame produced by the mic array promptly when it becomes available.

There are a range of solutions here, including:

Which of these will work will depend on some further details:

astewart-xmos commented 2 years ago

I've created a branch on my fork of lib_mic_array that avoids the deadlock. By default the mic array will instead assert rather than deadlock, but there application can tell the mic array that it's okay to just drop blocks of PDM samples if it's backed up.

See here. This can be called while initializing the mic array in app.cpp.

ed-xmos commented 1 year ago

AssertOnDroppedBlock(True/False) is now supported for PDM rx