Open crazoes opened 4 years ago
@dianacretu Do you have any tips? I can't seem to find the proper documentation?
@dianacretu @paulstelian97 not sure we can help with that because we only looked at DSP emulation so far. I remember that some time ago when I looked at running the emulation on Intel platforms I got stuck at the same problem.
@ranj063 @lgirdwood should we try with other Intel platform? What is the most stable platform to experiment with qemu?
Is there any alternate setup for development and testing? Do you have a different setup that doesn't require qemu @paulstelian97 ?
I've spent past week trying to complete setup and start with the my outreachy project.
I've seen @ktrzcinx working on improving debugging as well. Maybe he can tell us what setup he has for development?
@crazoes I generally test on actual boards, and we have at NXP an internal setup that allows remote access to time shared boards on which I can run an arbitrary kernel + dtb (ARM) + firmware combinations for testing. I'm not sure that you have this option though, that's why qemu is recommended. And I tagged Diana because she is adding support for the NXP specific hardware in qemu.
Well, as usual I'm also using company internal setup for build and testing. Sometimes for tests on full stack I use real hardware. So I'm not a QEMU user actually. It's rather question for folks who maintain CI.
@crazoes your flow looks corrct, but sometime we have seen issues with stale posix SHM and MQs in the past. Can you try and delete /dev/shm/qemu- and /dev/mqueue/qemu-
You should be able to confirm that the correct IPC registers and memories are correct via checking the SHM e.g.
hexdump -c /dev/shm/qemu-<name of memory> | less
Interrupts (and reset control) are sent via POSIX message queue, so the FW should send a "FW ready" message IRQ to the host as a MQ (I can see this in the log), and it looks like the host does get the message.
It does look like the ACPI/PCI IRQ number being used by the virtualised host driver is wrong. I have seen this before with different qemu releases. dmesg may show "irq: nobody cared" message as the code will emit an IRQ or the IRQ will be handled by another driver (with nothing in dmesg).
cat /proc/interrupts
Will show the current IRQs for each device, I would assume the count for one driver will increment by one when the DSP has booted. This will need to be mapped to the DSP by changing the value in qemu ACPI table code.
@crazoes I have tried your instructions on my machine and it seems that the firmware can boot correctly.
Here is the log for sof that I get:
[ 0.176321] Performance Events: unsupported p6 CPU model 15 no PMU driver, software events only.
[ 2.016177] sof-audio-acpi 80860F28:00: ACPI DSP detected
[ 2.016903] sof-audio-acpi 80860F28:00: IMR not set by BIOS. Ignoring
[ 2.018056] sof-audio-acpi 80860F28:00: Force to use nocodec mode
[ 2.019292] sof-audio-acpi 80860F28:00: Using nocodec machine driver
[ 2.021236] sof-audio-acpi 80860F28:00: loading firmware
[ 2.022901] sof-audio-acpi 80860F28:00: request_firmware intel/sof/sof-byt.ri successful
[ 2.024413] sof-audio-acpi 80860F28:00: firmware doesn't contain extended manifest
[ 2.025863] sof-audio-acpi 80860F28:00: header size=0x14fcc modules=0x1 abi=0x1 size=16
[ 2.030774] sof-audio-acpi 80860F28:00: new module size 0x14fc0 blocks 0xf type 0x0
[ 2.031429] sof-audio-acpi 80860F28:00: block 0 type 0x1 size 0x16c ==> offset 0xc0000
[ 2.032043] sof-audio-acpi 80860F28:00: block 1 type 0x1 size 0x16c ==> offset 0xc0400
[ 2.032628] sof-audio-acpi 80860F28:00: block 2 type 0x1 size 0x8 ==> offset 0xc05bc
[ 2.033168] sof-audio-acpi 80860F28:00: block 3 type 0x1 size 0x8 ==> offset 0xc05dc
[ 2.033710] sof-audio-acpi 80860F28:00: block 4 type 0x1 size 0x4 ==> offset 0xc05fc
[ 2.034248] sof-audio-acpi 80860F28:00: block 5 type 0x1 size 0x4 ==> offset 0xc061c
[ 2.034827] sof-audio-acpi 80860F28:00: block 6 type 0x1 size 0x8 ==> offset 0xc063c
[ 2.035369] sof-audio-acpi 80860F28:00: block 7 type 0x1 size 0x4 ==> offset 0xc0658
[ 2.035909] sof-audio-acpi 80860F28:00: block 8 type 0x1 size 0x18 ==> offset 0xc065c
[ 2.036509] sof-audio-acpi 80860F28:00: block 9 type 0x1 size 0x8 ==> offset 0xc067c
[ 2.037117] sof-audio-acpi 80860F28:00: block 10 type 0x1 size 0x11810 ==> offset 0xc06c0
[ 2.038194] sof-audio-acpi 80860F28:00: block 11 type 0x2 size 0x21c8 ==> offset 0x100008
[ 2.038770] sof-audio-acpi 80860F28:00: block 12 type 0x2 size 0x1c ==> offset 0x1021d0
[ 2.039341] sof-audio-acpi 80860F28:00: block 13 type 0x2 size 0x1068 ==> offset 0x1021f0
[ 2.039895] sof-audio-acpi 80860F28:00: block 14 type 0x2 size 0x194 ==> offset 0x1070b8
[ 2.040479] xxx: snd_sof_run_firmware ..
[ 2.049362] sof-audio-acpi 80860F28:00: booting DSP firmware
[ 2.076794] sof-audio-acpi 80860F28:00: ipc: DSP is ready 0x70000000 offset 0x144000
[ 2.077429] sof-audio-acpi 80860F28:00: Firmware info: version 1:5:0-e9cb8
[ 2.077888] sof-audio-acpi 80860F28:00: Firmware: ABI 3:16:0 Kernel ABI 3:16:0
[ 2.078524] sof-audio-acpi 80860F28:00: found ext header type 1 size 0xa0
[ 2.078982] sof-audio-acpi 80860F28:00: found ext header type 2 size 0x5c
[ 2.079389] sof-audio-acpi 80860F28:00: Firmware info: used compiler XCC 12:0:8 xtensa-byt-elf used optimization flags Os
[ 2.080038] sof-audio-acpi 80860F28:00: found ext header type 3 size 0x1c
[ 2.080444] sof-audio-acpi 80860F28:00: warning: unknown ext header type 3 size 0x1c
[ 2.081009] sof-audio-acpi 80860F28:00: found ext header type 4 size 0x10
[ 2.081415] sof-audio-acpi 80860F28:00: warning: unknown ext header type 4 size 0x10
[ 2.082017] sof-audio-acpi 80860F28:00: mailbox upstream 0x144000 - size 0x400
[ 2.082554] sof-audio-acpi 80860F28:00: mailbox downstream 0x144400 - size 0x400
[ 2.083100] sof-audio-acpi 80860F28:00: stream region 0x144a00 - size 0x200
[ 2.083565] sof-audio-acpi 80860F28:00: firmware boot complete
[ 2.588429] sof-audio-acpi 80860F28:00: error: ipc timed out for 0x90030000 size 80
[ 2.590084] sof-audio-acpi 80860F28:00: info: preventing DSP entering D3 state to preserve context
[ 2.592416] sof-audio-acpi 80860F28:00: error: unexpected fault 0x70028800 trace 0x00000000
[ 2.594100] sof-audio-acpi 80860F28:00: error: ipc host -> DSP: pending no complete no raw 0x4000000000000000
[ 2.595890] sof-audio-acpi 80860F28:00: error: mask host: pending no complete no raw 0x0
[ 2.597527] sof-audio-acpi 80860F28:00: error: ipc DSP -> host: pending no complete yes raw 0x70028800
[ 2.598855] sof-audio-acpi 80860F28:00: error: mask DSP: pending no complete no raw 0x0
[ 2.599401] sof-audio-acpi 80860F28:00: error: can't set params for DMA for trace -110
[ 2.599944] sof-audio-acpi 80860F28:00: warning: failed to initialize trace -110
[ 2.600535] sof-nocodec sof-nocodec: info: override BE DAI link NoCodec-0
[ 2.600947] sof-nocodec sof-nocodec: info: override BE DAI link NoCodec-1
[ 2.601354] sof-nocodec sof-nocodec: info: override BE DAI link NoCodec-2
[ 2.602190] sof-audio-acpi 80860F28:00: Topology: ABI 3:16:0 Kernel ABI 3:16:0
[ 2.602784] sof-audio-acpi 80860F28:00: warning: widget type 1 name codec_out0 not handled
[ 3.108431] sof-audio-acpi 80860F28:00: error: ipc timed out for 0x30010000 size 76
[ 3.110091] sof-audio-acpi 80860F28:00: info: preventing DSP entering D3 state to preserve context
[ 3.112384] sof-audio-acpi 80860F28:00: error: unexpected fault 0x70028800 trace 0x00000000
[ 3.114018] sof-audio-acpi 80860F28:00: error: ipc host -> DSP: pending no complete no raw 0x4000000000000000
[ 3.115814] sof-audio-acpi 80860F28:00: error: mask host: pending no complete no raw 0x0
[ 3.117465] sof-audio-acpi 80860F28:00: error: ipc DSP -> host: pending no complete yes raw 0x70028800
[ 3.118723] sof-audio-acpi 80860F28:00: error: mask DSP: pending no complete no raw 0x0
[ 3.119264] sof-audio-acpi 80860F28:00: error: DSP failed to add widget id 0 type 23 name : PCM0P stream Low Latency Playback 0 reply 0
[ 3.119919] sof-audio-acpi 80860F28:00: ASoC: failed to load widget PCM0P
[ 3.120322] sof-audio-acpi 80860F28:00: error: tplg component load failed -110
[ 3.120847] sof-audio-acpi 80860F28:00: error: failed to load DSP topology -22
[ 3.121370] sof-audio-acpi 80860F28:00: ASoC: failed to probe component -22
[ 3.121788] sof-nocodec sof-nocodec: ASoC: failed to instantiate card -22
[ 3.122225] sof-nocodec: probe of sof-nocodec failed with error -22
I do get an error when trying to load the topology, which I'm now investigating. Not sure what could be the problem on your side. Do you have the kvm module installed?
@crazoes OK, so i did more research. It seems that I can reproduce your exact trace when CONFIG_TRACEM is selected when compiling the firmware.
So, we have now two ways of continuing:
1) With CONFIG_TRACEM selected I think the DSP crashes somewhere. Maybe some memory allocation is not enough. 1) Without CONFIG_TRACEM selected there is a problem with loading the topology. But what I think it happens is that SOF firmware (sof-byt.ri) crashes somewhere. :)
I think this is a good start for investigation.
@dbaluta strange but I don't have CONFIG_TRACEM set. I am still not able to load the firmware ( not sure why it is loading for you ). How are you compiling your firmware? Using script or manually?
@crazoes first time i compile the .ri using the script.
Then to modify CONFIG_TRACEM in sof repo I use make menuconfig, and from now on I only run 'make -j8' to compile the firmware.
So, can you try both with CONFIG_TRACEM set and unset to see what's the result? Also, make sure you have kvm enabled.
@dbaluta I do have kvm enabled as per this line https://github.com/thesofproject/qemu/blob/sof-v4.2/x86-host.sh#L79
Also, I tried to set and unset CONFIG_TRACEM but got the same result both the time. No change in the output. I can still see firmware boot failure.
I do the same process as you did to compile the firmware ( first using script then make menuconfig ) If I am not wrong, CONFIG_TRACEM is the Trace mailbox.
@crazoes Ok, thanks. I couldn't figure what's the issue. So lets take it step by step. Can you first try to see if SOF firmware boots with the fuzzer? I have forwarded you an email on how to start the fuzzer.
@crazoes please try and run the xtensa-host.sh with -t to debug boot issues. You can also use the tracing script too, please apply this update
diff --git a/xtensa-trace.sh b/xtensa-trace.sh
index 78fcc0fcf5..c1df905044 100755
--- a/xtensa-trace.sh
+++ b/xtensa-trace.sh
@@ -9,12 +9,38 @@
#
# 2) Now run this script.
#
+# ./xtensa-trace.sh zephyr.lst [bootloader.lst] [rom.lst]
+#
# Find all xtensa function "entry" instructions.
-grep entry dump.txt > entry.txt
+grep 'entry\|retw' dump.txt > entry.txt
+cat $1 $2 $3 > dis.txt
+
+let indent=1
# Now find the function name for each "entry" instruction address.
while read input
do
- grep `echo $input | cut -f1 -d: | cut -c 3-10` /home/lrg/work/zephyrproject/zephyr/build/zephyr/zephyr.lst | grep -v call | grep -v entry -m 1
+ #echo -n "searching for function at "
+ #echo $input | cut -f1 -d: | cut -c 3-10
+
+ for (( c=1; c<=indent; c++ ))
+ do
+ echo -n " "
+ done
+
+ #echo "found"
+ #grep `echo $input | cut -f1 -d: | cut -c 3-10` dis.txt | grep -v call | grep -v entry -m 1
+ grep `echo $input | cut -f1 -d: | cut -c 3-10` dis.txt | grep '>:'
+ if [ $? -eq 0 ];
+ then
+ let indent++
+ fi
+
+ grep `echo $input | cut -f1 -d: | cut -c 3-10` dis.txt | grep 'retw'
+ if [ $? -eq 0 ];
+ then
+ let indent--
+ fi
+
done < entry.txt
@lgirdwood There are couple of things that we tried to solve the boot problem. I would like to list them here for your information
Increasing the boot timeout value under sound/soc/sof/loader.c
sdev->boot_timeout = 10000;
We thought maybe dsp doesn't get enough of booting time so we tried to increase this value but nothing changed with it.
Printing the IRQ number ( byt_acpi_probe function inside sound/soc/sof/intel/byt.c file)
dev_info(sdev->dev, "using IRQ %d %d\n", sdev->ipc_irq, desc->irqindex_host_ipc);
Following is the image of dmesg output for it.
Also, please see the below image for cat /proc/interrupts output.
We don't even get any irq: nobody cared message as well.
@dbaluta, Please add if I am forgetting something to mention here.
I'll send you the output that I get using -t option with xtensa-host.sh soon.
@lgirdwood @crazoes I don't think there is any problem with Qemu implementation. On my side, I successfully booted the DSP but then got a timeout when loading the topology.
Lets try the debug script @lgirdwood mentioned.
@crazoes I would expect to see the IRQ count increment, the mapping looks correct, but it may be that qemu APIs have been updated wrt IRQs on x86, maybe we are missing something. Can you check the other PCI qemu drivers for IRQ logic. @dbaluta the SHM memory dumps will also help here, you could divert all trace to the "trace error" region (no DMA) and then use sof-logger to read the SHM file, but I agree emulator is far slower so may timeout.
@dbaluta I tried the script given by Liam.
dump.txt :- https://gist.github.com/crazoes/d6115a0a8f655d0f2f64826f3b4138a1 entry.txt :- https://gist.github.com/crazoes/ada9c73237058cf9e97e380acfc7436d
I don't really understand much so will need your help with it.
@crazoes dump.txt looks OK. It shows the firmware idle waiting for an interrupt. Lets debug this together in our weekly meeting.
Hi @dbaluta and @crazoes : Was this issue solved and the process documented somewhere? I had the same problem a few months ago, but didn't went into further details as my project didn't need this.
Fwiw @lyakh will be updating the qemu docs here as he is debugging some Zephyr bringup.
@ymdatta we were not really able to solve this problem. We suspected some of the qemu APIs might have changed over these years and hence communication was not happening correctly.
Steps undertaken:
Build toolchain
Following instructions on https://thesofproject.github.io/latest/getting_started/build-guide/build-from-scratch.html#id9
Build firmware
Build qemu
Build kernel (sof's fork)
Build rootfs image
Booting xtensa-host
Booting host
Error / Problem
As per alsa wiki successful boot should look something like this:
However, what I see on xtensa-host and x86-host are as follows:
On the host, I can see the following error messages when grepped on dmesg
If anyone can suggest how to solve this or if you spot any errors in the process that'd be great.