xmos / lib_xud

XMOS USB code and associated examples

Memory access exception in Pid_Out() at ./included/XUD_Token_Out_DI.S:13 #363

Closed vdpsr closed 1 year ago

vdpsr commented 1 year ago

Hello,

Together with @darasan, we are currently developing an audio product based on the XU232-512-FB374-C40 chip and sw_usb_audio. We recently moved from the old XUD implementation (based on sc_xud v2.4.1rc0, sc_usb v1.0.4rc1 and sc_usb_device v1.3.8rc0) to lib_xud. This move seems to have fixed some USB stability issues we had been facing (bad enumerations, the interface disappearing from time to time, etc.), but it introduced a regression: the device crashes when switching sample rate.

With lib_xud release v2.2.0 we had the same crash as danielpieczko in issue #356 when changing sample rate. We then integrated the fix pushed to develop, and the device still crashes when switching sample rate, but now with the following error:

Program received signal ET_LOAD_STORE, Memory access exception.
[Switching to tile[0] core[4]]
Pid_Out () at ./included/XUD_Token_Out_DI.S:13
13      in ./included/XUD_Token_Out_DI.S

(gdb) info stack
#0  Pid_Out () at ./included/XUD_Token_Out_DI.S:13
#1  0x00046b80 in OutReady () at ./included/XUD_Token_Out_DI.S:20
#2  0x00046b80 in OutReady () at ./included/XUD_Token_Out_DI.S:20
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) info registers
r0             0x80100  524544
r1             0x10500  66816
r2             0x80000  524288
r3             0x1e     30
r4             0x0      0
r5             0x4f318  324376
r6             0x3334   13108
r7             0xf335   62261
r8             0x493e0  300000
r9             0xa001   40961
r10            0x9b     155
r11            0x1      1
cp             0x48d58  298328
dp             0x493c8  299976
sp             0x7f188  520584
lr             0x46b80  289664   OutReady + 4
pc             0x46b70  289648   Pid_Out + 24
sr             0x90     144
spc            0x46b70  289648   Pid_Out + 24
ssr            0x183    387
et             0x5      5
ed             0x2a     42
sed            0x10400  66560
kep            0x40080  262272
ksp            0x400a4  262308

We are running both Windows 10 and 11 with the Thesycon driver V4.67.0 for our product.

The strange thing is that the crash only occurs while audio is playing. When the audio is paused, the device switches sample rate without any problem.

Do you have any idea what is causing this crash and how to fix it? I remain available if you need more information.

Here are the captures I made with an Ellisys USB analyser:

sr_change_44to48_error.zip

Thank you for your time !

Valentin

xross commented 1 year ago

Thanks for reporting the issue. Have you significantly modified the USB Audio design? In the trace you attached I don't see any transactions to the audio feedback endpoint. Can you provide a full trace including enumeration? Looks like a fairly large channel count device?

From the trace it appears the device fails during streaming rather than at the time of the sample rate change. At time 3.074 623 700 the device stops responding to host audio IN requests. No control interaction from the host appears until 3.086 002 517 (clear halt), presumably as a result of the device not responding to requests on that endpoint for some time.

I also note that at this time the device only sends zero data back to the host. Is your external ADC hardware set up correctly?

xross commented 1 year ago

At line https://github.com/xmos/lib_xud/blob/f3cca425e3582b3af856d0a68a7a1a1b7d91287d/lib_xud/src/core/included/XUD_Token_Out_DI.S#L13 R3 is clearly not a valid memory address, and so the exception is raised. R10 should be the endpoint number but has a value of 0x9b (155), clearly out of range for an EP. In XS2 this value is read directly from the RX data port (https://github.com/xmos/lib_xud/blob/f3cca425e3582b3af856d0a68a7a1a1b7d91287d/lib_xud/src/core/XUD_CrcAddrCheck.S#L31).
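To illustrate why 0x9b cannot be a valid endpoint number: a USB token packet carries a 7-bit device address, a 4-bit endpoint number and a 5-bit CRC after the PID, so the endpoint field is at most 15. The following is a minimal C sketch of that field layout with a range guard. It is not the lib_xud code; the names `usb_token_t`, `usb_token_decode`, `usb_ep_index_valid` and `ep_lookup` are hypothetical, and an implementation may use a wider internal index (for example separate slots for IN and OUT endpoints).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USB_NUM_EP 16u  /* the ENDP field is 4 bits wide: 0..15 */

/* Fields of a USB token packet (OUT/IN/SETUP) after the PID byte.
 * Viewed as a 16-bit little-endian value: ADDR[6:0], ENDP[3:0], CRC5[4:0]. */
typedef struct {
    uint8_t addr;  /* 7-bit device address */
    uint8_t endp;  /* 4-bit endpoint number */
    uint8_t crc5;  /* 5-bit CRC over addr and endp */
} usb_token_t;

/* Split the two token bytes (low byte received first) into fields. */
static usb_token_t usb_token_decode(uint8_t lo, uint8_t hi)
{
    uint16_t raw = (uint16_t)(lo | ((uint16_t)hi << 8));
    usb_token_t t;
    t.addr = (uint8_t)(raw & 0x7Fu);
    t.endp = (uint8_t)((raw >> 7) & 0x0Fu);
    t.crc5 = (uint8_t)((raw >> 11) & 0x1Fu);
    return t;
}

/* Hypothetical guard: true only if the index fits the 4-bit ENDP field. */
static bool usb_ep_index_valid(uint32_t ep)
{
    return ep < USB_NUM_EP;
}

/* Hypothetical lookup: return the table entry for an endpoint, or NULL
 * when the index cannot have come from a well-formed token. */
static void *ep_lookup(void *table[USB_NUM_EP], uint32_t ep)
{
    assert(table != NULL);
    if (!usb_ep_index_valid(ep)) {
        return NULL;  /* e.g. 0x9b: refuse rather than index out of bounds */
    }
    return table[ep];
}
```

With a guard of this kind, a corrupted index such as 0x9b is rejected instead of being used to load a pointer from outside the table.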

What speed are you running the device at, and how many cores are running on the USB tile?

vdpsr commented 1 year ago

> Thanks for reporting the issue. Have you significantly modified the USB Audio design?

Thank you for helping with this! Yes, the reference design was originally tweaked to fit our needs in previous products, and this product is based on that version.

> In the trace you attached I don't see any transactions to the audio feedback endpoint.

We are not using an explicit feedback endpoint; we rely on implicit feedback when an input stream is available. Maybe we are doing it wrong?

> Can you provide a full trace including enumeration?

Yes, here is a trace of the device enumerating, then attached to my DAW and streaming audio at 44.1 kHz: enum_audio_stream_44.zip

> Looks like a fairly large channel count device?

Yes, the goal is to have the following number of channels :

> From the trace it appears the device fails during streaming rather than at the time of the sample rate change. At time 3.074 623 700 the device stops responding to host audio IN requests. No control interaction from the host appears until 3.086 002 517 (clear halt), presumably as a result of the device not responding to requests on that endpoint for some time.

In the 'sr_change_44to48_error' trace the sample rate change happens at 2.851 800 733. The device then starts streaming audio as before at the new sample rate, but it fails at 3.074 623 700 for no apparent reason. This is really strange because it only happens when we play audio on the first two USB channels during the sample rate change (otherwise the sample rate change is OK). I will send you a trace of the exact same case, changing sample rate from 44.1 kHz to 48 kHz but without playing audio, so you can see that it runs fine.
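To put the analyser timestamps quoted in this thread into perspective, here is a small C sketch that converts them to nanoseconds and computes the intervals between the sample rate change, the first unanswered IN request and the clear halt. The helper names are hypothetical, and the 125 us microframe length assumes the device is running at high speed, which the thread does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MICROFRAME 125000LL  /* assumption: high-speed USB, 125 us */

/* Convert an analyser timestamp written as "S.mmm uuu nnn" to nanoseconds. */
static int64_t ts_ns(int64_t s, int64_t ms, int64_t us, int64_t ns)
{
    assert(ms < 1000 && us < 1000 && ns < 1000);
    return ((s * 1000 + ms) * 1000 + us) * 1000 + ns;
}

/* Number of complete microframes that fit in an interval. */
static int64_t whole_microframes(int64_t interval_ns)
{
    return interval_ns / NS_PER_MICROFRAME;
}

/* Timestamps taken from the discussion of sr_change_44to48_error. */
static int64_t t_sr_change(void)  { return ts_ns(2, 851, 800, 733); }
static int64_t t_first_fail(void) { return ts_ns(3,  74, 623, 700); }
static int64_t t_clear_halt(void) { return ts_ns(3,  86,   2, 517); }
```

Under these assumptions the device streams for 222 822 967 ns (about 222.8 ms, or 1782 whole microframes) after the sample rate change before it stops responding, and the host issues the clear halt 11 378 817 ns (about 11.4 ms, or 91 whole microframes) later.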

> I also note that at this time the device only sends zero data back to the host. Is your external ADC hardware set up correctly?

Yes, I wanted to use the simplest configuration and focus on the outputs.

vdpsr commented 1 year ago

Here is a trace when changing sample rate is OK while streaming audio on USB outputs from 3 to 10 :

sr_change_44to48_ok.zip

xross commented 1 year ago

Thanks for the additional information. Can you confirm the device clock frequency and the core count on the USB tile please?

vdpsr commented 1 year ago

Hello Ross,

The clock frequency is 500 MHz and there are 5 threads on the USB tile (which is tile 0).
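Core count and clock frequency matter because, on an xCORE tile, the instruction issue rate available to each logical core depends on how many cores are active. The sketch below uses the commonly quoted approximation that a core gets the tile clock divided by the number of active cores, with a floor of five on the divisor. The function name is hypothetical, and any minimum rate required by lib_xud should be taken from its documentation rather than from this example.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate per-core issue rate in kHz for an xCORE tile.
 * Assumption: rate = tile clock / max(5, active cores). */
static uint32_t per_core_khz(uint32_t tile_khz, uint32_t active_cores)
{
    assert(active_cores >= 1u);
    uint32_t divisor = active_cores < 5u ? 5u : active_cores;
    return tile_khz / divisor;
}
```

For the configuration reported here, `per_core_khz(500000, 5)` gives 100000, i.e. roughly 100 MHz per core; with eight active cores the same tile would give 62500.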

I may have found a race condition related to the sample rate change on my side (linked to the way we configure the DACs and ADCs). I will dig into it more today and tomorrow, and I will let you know if I find a solution!

Thanks again for your help :)

xross commented 1 year ago

> We are not using an explicit feedback endpoint; we rely on implicit feedback when an input stream is available. Maybe we are doing it wrong?

No, that is fine, and it is the preferred mode of operation so long as you don't need the device to work with the built-in Windows drivers.

xross commented 1 year ago

> Hello Ross,
>
> The clock frequency is 500 MHz and there are 5 threads on the USB tile (which is tile 0).
>
> I may have found a race condition related to the sample rate change on my side (linked to the way we configure the DACs and ADCs). I will dig into it more today and tomorrow, and I will let you know if I find a solution!
>
> Thanks again for your help :)

It's hard to see how the exception you encountered could be caused by such an issue (barring memory corruption), but please do let us know.

xross commented 1 year ago

Is this still an issue please?

vdpsr commented 1 year ago

Hello,

Sorry, I have been away from the office for a few days. Yes, we still have the problem on macOS. I was about to contact you again about this; I will prepare a summary to let you know how things have been going recently.

Thank you for your help !

vdpsr commented 1 year ago

Hello Ross,

Finally we were able to solve the bug on our end. It turned out that we had deviated from the reference design in the way we handled outUnderflow in the handle_audio_request function of the decouple thread.

In my opinion, the pointers handling the audio buffer were no longer being used correctly which caused the exceptions described.
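As a generic illustration of the kind of pointer handling involved, here is a minimal C sketch of a ring buffer whose consumer outputs silence on underflow and leaves the read pointer untouched until the buffer has refilled past a pre-buffer threshold. This is not the code from the reference design or from our product; `audio_ring_t`, `ring_write`, `ring_read_or_silence`, `RING_SIZE` and `PREBUFFER` are all hypothetical names and values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 1024u  /* hypothetical; must be a power of two */
#define PREBUFFER 8u     /* hypothetical refill level that ends underflow */

typedef struct {
    uint8_t  data[RING_SIZE];
    uint32_t rd;         /* total bytes consumed (free-running counter) */
    uint32_t wr;         /* total bytes produced (free-running counter) */
    bool     underflow;  /* true while the consumer is outputting silence */
} audio_ring_t;

static uint32_t ring_fill(const audio_ring_t *r)
{
    return r->wr - r->rd;  /* unsigned arithmetic handles counter wrap */
}

/* Producer side: reject the write rather than overwrite unread data. */
static bool ring_write(audio_ring_t *r, const uint8_t *src, uint32_t len)
{
    assert(r != NULL && src != NULL);
    if (len > RING_SIZE - ring_fill(r)) {
        return false;
    }
    for (uint32_t i = 0; i < len; i++) {
        r->data[(r->wr + i) % RING_SIZE] = src[i];
    }
    r->wr += len;
    return true;
}

/* Consumer side: returns true if real data was copied. On underflow the
 * output is zero-filled and the read pointer is deliberately not advanced. */
static bool ring_read_or_silence(audio_ring_t *r, uint8_t *dst, uint32_t len)
{
    assert(r != NULL && dst != NULL);
    if (r->underflow && ring_fill(r) >= PREBUFFER) {
        r->underflow = false;  /* refilled enough to resume playback */
    }
    if (r->underflow || ring_fill(r) < len) {
        memset(dst, 0, len);
        r->underflow = true;
        return false;
    }
    for (uint32_t i = 0; i < len; i++) {
        dst[i] = r->data[(r->rd + i) % RING_SIZE];
    }
    r->rd += len;
    return true;
}
```

The point of the pattern is that the read pointer can never move past the write pointer: during an underflow the consumer only ever writes zeros to its output, so a gap in the incoming stream (such as a sample rate change) cannot send it into stale or unrelated memory.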

It took a while to find this out as we followed some wrong paths before we realised that we were not following the reference design in this part of the code.

A bit of housekeeping later and the frequency change is much smoother and we don't encounter any more crashes.

Thanks for your help and patience. We are very happy to see the XMOS libraries being maintained through these GitHub repositories!

Best regards

Valentin

xross commented 1 year ago

Thanks for the feedback Valentin!