thesofproject / sof

Sound Open Firmware

[BUG] IMX8 QEMU boot failed - qemu-check.sh regression after qemu commit f71c955 in sof-4.2 branch #4186

Open fredoh9 opened 3 years ago

fredoh9 commented 3 years ago

@marc-hb EDIT: this is a QEMU/IMX regression, Docker is only the messenger. This is blocking any qemu progress like

Describe the bug: the SOF Docker images were upgraded to Ubuntu 20.04, together with the proxy-free changes.

After the upgrade, the IMX QEMU boot fails:

Run ./scripts/docker-qemu.sh ../sof.git/scripts/qemu-check.sh ${PLATFORM}
+ '[' -z 2.0 ']'
+ timeout --foreground 2.0 /home/sof/qemu/xtensa-softmmu/qemu-system-xtensa -cpu imx8 -M adsp_imx8 -nographic -kernel /home/sof/sof.git/build_imx8_gcc/src/arch/xtensa/sof-imx8.ri
qemu-system-xtensa: terminating on signal 15 from pid 16 (timeout)
Error: Process completed with exit code 1.

To Reproduce

New (failing) sofqemu Docker image: thesofproject/sofqemu:20210511_2
Stable sofqemu Docker image: thesofproject/sofqemu:20200422

Since I have restored the known stable image, the new image must be pulled explicitly to reproduce the problem.

# Use the latest sof Docker image; that image works fine
docker pull thesofproject/sof
docker tag thesofproject/sof sof

# build SOF FWs...

# Use the problematic sofqemu image instead of latest
docker pull thesofproject/sofqemu:20210511_2
docker tag thesofproject/sofqemu:20210511_2 sofqemu

cd SOF_BUILD_ROOT
./scripts/docker-qemu.sh ../sof.git/scripts/qemu-check.sh imx8

Reproduction Rate 10/10, 100%

marc-hb commented 3 years ago

@fredoh9 to save some time extracting them from https://github.com/thesofproject/sof/blob/main/.github/workflows/pull-request.yml and since you have just been through them, would you mind adding some reproduction steps to the description, especially the docker commands to switch to the newer image? Just from memory; they don't have to be perfectly tested.

Not everyone is familiar with docker and most docker commands take a long time.

cc: @greg-intel

fredoh9 commented 3 years ago

This is my first build of the sofqemu image with my setup; the last known build was more than a year ago, so I'm not sure whether this is an environment setup issue. The same recipe with the previous Ubuntu 18.04 also fails for me. I need to approach this step by step. I need to focus on another task now and will resume later today.

After reading the build logs, I found multiple complaints about missing flex and bison. Surprisingly, these are not fatal errors. This is not the root cause either, but it will be fixed.

make[1]: Leaving directory '/home/sof/qemu/slirp'
...
make[1]: flex: Command not found
         DEP srcpos.c
         BISON dtc-parser.tab.c
make[1]: bison: Command not found
         LEX dtc-lexer.lex.c
make[1]: flex: Command not found
...
         DEP util.c
         LEX convert-dtsv0-lexer.lex.c
make[1]: flex: Command not found
         BISON dtc-parser.tab.c
make[1]: bison: Command not found
         LEX dtc-lexer.lex.c
make[1]: flex: Command not found
        CHK version_gen.h
         LEX convert-dtsv0-lexer.lex.c
make[1]: flex: Command not found
         BISON dtc-parser.tab.c
make[1]: bison: Command not found
         LEX dtc-lexer.lex.c
make[1]: flex: Command not found
...
  GEN     hw/core/trace.c
make[1]: Entering directory '/home/sof/qemu/slirp'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/sof/qemu/slirp'
        CHK version_gen.h
         LEX convert-dtsv0-lexer.lex.c
make[1]: flex: Command not found
         BISON dtc-parser.tab.c
make[1]: bison: Command not found
         LEX dtc-lexer.lex.c
make[1]: flex: Command not found
  CC      tests/qemu-iotests/socket_scm_helper.o
  LINK    tests/qemu-iotests/socket_scm_helper
fredoh9 commented 3 years ago

Good news and bad news.

The good news is that I'm able to build my own thesofproject/sofqemu:20200422 and the IMX QEMU test passes with it, so I don't have any setup/environment issue.

The bad news is that there are about 25 commits after 20200422, so I need to do a git bisect. I will discuss with @lrgirdwo once I find the 'bad' commit.
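
For readers who want to repeat the bisect, the per-commit step could be sketched roughly as below. This is only an illustration: `bisect_step` is a hypothetical helper, the build and check commands are placeholders passed in by the caller, and it assumes that the check command exits non-zero when the boot test fails. The exit codes follow the `git bisect run` convention (0 = good, 125 = skip, other small non-zero values = bad).

```shell
#!/bin/sh
# Sketch: classify the currently checked-out QEMU commit for `git bisect run`.
# Exit codes follow the git-bisect convention: 0 = good, 1 = bad, 125 = skip.
# build_cmd and check_cmd are placeholders supplied by the caller.
bisect_step() {
    build_cmd=$1
    check_cmd=$2

    # A commit that does not build can be neither good nor bad: skip it.
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125

    # Assumption: the check command exits non-zero when the boot test fails.
    sh -c "$check_cmd" >/dev/null 2>&1 || return 1

    return 0
}
```

Saved in a script that ends by calling `bisect_step` with the real build command and something like `../sof/scripts/qemu-check.sh imx8`, it could be driven by `git bisect start <bad> <good>` followed by `git bisect run <script>`. The good and bad revisions are deliberately left as placeholders.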

fredoh9 commented 3 years ago

@lrgirdwo In the qemu repo, this is the 'bad' commit: f71c955 hw: mu: Add mu-write implementation

fredoh9 commented 3 years ago

@dbaluta imx8 seems to just fail silently, but imx8x and imx8m produce more logs.

imx8m_error.txt imx8x_error.txt

dbaluta commented 3 years ago

Thanks! I will have a look. @fredoh9 @marc-hb

marc-hb commented 2 years ago

@dbaluta is this still a problem? Fred is out of the office for a while.

dbaluta commented 2 years ago

@marc-hb let's see if we have any bandwidth to look at this in a timeframe of 2 weeks.

If not we can temporarily skip qemu check for imx8.

Cc: @paulstelian97

wenliangwu commented 2 years ago

@dbaluta Sorry to bother you. Is this still a problem? Because .github/workflows/pull-request.yml should be updated with #5274 after the problem is fixed.

paulstelian97 commented 2 years ago

@wenliangwu @dbaluta is currently on leave for roughly a week. We can wait or I could check on it.

wenliangwu commented 2 years ago

@paulstelian97 It's not urgent. I just want to check the status. Thanks.

marc-hb commented 2 years ago

@dbaluta ping?

marc-hb commented 1 year ago

@iuliana-prodan could you take a look at this? I see you just did #6713 in the same area

iuliana-prodan commented 1 year ago

> @iuliana-prodan could you take a look at this? I see you just did #6713 in the same area

I don't quite understand what the problem is here. This seems like a very old bug :(

So, with the Ubuntu 20.04 Docker image, the qemu boot on imx8 is failing? Have I understood correctly?

For example, take a newer PR: its imx8 qemu boot is now passing (https://github.com/thesofproject/sof/actions/runs/3623119092/jobs/6108622229), and I see it's running on Ubuntu 20.04: Operating System Ubuntu 20.04.5 LTS

So, @marc-hb, can you please help me understand what I should investigate here?

Thanks!

cc: @fredoh9

marc-hb commented 1 year ago

The problem is we cannot upgrade QEMU because IMX commit https://github.com/thesofproject/qemu/commit/f71c955096edee3cfa5895e63b9397216f4e3478 broke the IMX tests.

Please ignore Docker: it's only the messenger. I mean literally: do NOT use Docker to reproduce, build and use this branch instead:

https://github.com/thesofproject/qemu/commits/sof-v4.2

The reason tests are passing now is because we're stuck and did not upgrade.

https://github.com/thesofproject/sof/blob/main/scripts/docker_build/sof_qemu/Dockerfile#L55

iuliana-prodan commented 1 year ago

@marc-hb The fix I did the other day was developed and tested with qemu/sof-v5.2.0 and the imx8 boot tests were passing.

https://github.com/thesofproject/sof/blob/main/scripts/docker_build/sof_qemu/Dockerfile#L55

We are now using qemu/sof-4.2 - is this correct? And we want to upgrade to qemu/sof-5.2?

I see that on sof-5.2 we don't have the commit https://github.com/thesofproject/qemu/commit/f71c955096edee3cfa5895e63b9397216f4e3478.

  1. So, should I try qemu/sof-4.2?
  2. Or, add the above commit and test sof-5.2?

If we want to upgrade, 2 is probably better?

Sorry, but I still don't understand how this is failing on sof-4.2 (the branch we are using right now) when I see in CI that the tests pass. I think I'm missing something :(

marc-hb commented 1 year ago

> Sorry, but I still don't get it how is failing on sof-4.2 (the branch we are using right now),

I believe CI is using a docker image with NOT THE LATEST sof-v4.2 qemu commit right now. @fredoh9 can you confirm? Hence this bug.

@iuliana-prodan can you test the latest sof-v4.2 QEMU commit? without Docker.

Maybe we should switch the docker image and CI to the sof-v5.2 qemu branch?
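
One way to settle whether a given QEMU checkout already contains the suspect commit is to ask git directly. The sketch below is only illustrative: `report_commit` is a hypothetical helper name, and the checkout path is whatever the caller passes in. It relies on `git merge-base --is-ancestor`, which exits 0 when the commit is an ancestor of `HEAD`.

```shell
# Sketch: report whether a commit is contained in the HEAD of a checkout.
# `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B,
# 1 when it is not, and another non-zero status on errors (for example an
# unknown commit); this helper reports every non-zero case as "absent".
report_commit() {
    repo=$1
    commit=$2
    if git -C "$repo" merge-base --is-ancestor "$commit" HEAD 2>/dev/null; then
        echo "present"
    else
        echo "absent"
    fi
}
```

For example, `report_commit ~/work/qemu f71c955` would be expected to print `present` on the tip of sof-v4.2 and `absent` on a checkout that predates the commit.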

marc-hb commented 1 year ago

> I believe CI is using a docker image with...

We should not be spending time wondering about this: each Docker image should carry a clear record of the QEMU version it contains. The other Docker image already has that; @fredoh9 added it in commit 9fc3833a5deb9b47ab. Unfortunately the QEMU image does not have it yet.
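
As an illustration of what such a record could look like (this is not the mechanism used in commit 9fc3833a5deb9b47ab, which is not shown in this thread), a build step might dump the branch and commit of the QEMU checkout into a text file inside the image. The function name, the output file, and the key=value layout below are all assumptions.

```shell
# Sketch: write the QEMU branch and commit of a source checkout to a text file,
# so an image built from that checkout can state what it contains.
# The function name and the key=value layout are illustrative only.
record_qemu_version() {
    src_dir=$1
    out_file=$2
    {
        printf 'qemu_branch=%s\n' "$(git -C "$src_dir" rev-parse --abbrev-ref HEAD)"
        printf 'qemu_commit=%s\n' "$(git -C "$src_dir" rev-parse HEAD)"
    } > "$out_file"
}
```

Called once at image build time, for example on the `/home/sof/qemu` checkout seen in the logs above, this would let anyone inspect the file later instead of guessing which QEMU commit an image was built from.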

iuliana-prodan commented 1 year ago

> @iuliana-prodan can you test the latest sof-v4.2 QEMU commit? without Docker.

How do I run it without Docker?

Here are the steps I'm following to test imx boot on qemu (these are taken from CI):

  1. uses: actions/checkout@v2 with: {fetch-depth: 0, submodules: recursive}
  2. name: turn off HAVE_AGENT run: echo CONFIG_HAVE_AGENT=n > src/arch/xtensa/configs/override/no-agent.config
  3. name: docker SOF run: docker pull thesofproject/sof && docker tag thesofproject/sof sof
  4. name: xtensa-build-all.sh -o no-agent platforms env: PLATFORM: ${{ matrix.platform }} run: ./scripts/docker-run.sh ./scripts/xtensa-build-all.sh -o no-agent -r ${PLATFORM}
  5. name: docker QEMU run: docker pull thesofproject/sofqemu && docker tag thesofproject/sofqemu sofqemu
  6. name: qemu-check env: PLATFORM: ${{ matrix.platform }} run: ./scripts/docker-qemu.sh ../sof.git/scripts/qemu-check.sh ${PLATFORM}

So, on step 4, should I use a .ri image built locally, not on docker?

marc-hb commented 1 year ago

There are two Docker images. You can still build the .ri file with the thesofproject/sof image, that image is irrelevant here.

The image that you must NOT use when testing different qemu branches and versions is thesofproject/sofqemu (tagged sofqemu). Do not use the script ./scripts/docker-qemu.sh; invoke ./sof.git/scripts/qemu-check.sh directly. This requires cloning and building the specific qemu branch first.

Earlier you wrote:

> The fix I did the other day was developed and tested with qemu/sof-v5.2.0

For that you did NOT use ./scripts/docker-qemu.sh, correct?

iuliana-prodan commented 1 year ago

I reproduced the issue on qemu/sof-v4.2.

With commit https://github.com/thesofproject/qemu/commit/f71c955096edee3cfa5895e63b9397216f4e3478 I'm getting:

nxa06898@lsv15040:~/work/qemu$ git branch 
* sof-v4.2
nxa06898@lsv15040:~/work/qemu$ git log --oneline | head
f71c955096 hw: mu: Add mu-write implementation
d1234bca66 xtensa-host.sh: log the full command including the timeout prefix
1ac98caf30 qemu: io-bridge: remove fd assignment after its closed
ece0302801 Add shebangs
7c8c197474 xtensa-host.sh: replace . with ${MY_DIR} to be called from anywhere
27069022b5 xtensa-host.sh: add a ROM file sanity check and set -e
072572c305 ssi: esai: Add esai driver support
71fb77e9a1 ssi: sai: Add sai driver support
4887a712db xtensa: add debug trace file.
52b64a332a adsp: xtensa: Add support for Zephyr crash dumps.
nxa06898@lsv15040:~/work/qemu$ ../sof/scripts/qemu-check.sh imx8
+ ./xtensa-host.sh imx8 -k /home/nxa06898/work/sof/build_imx8_gcc/src/arch/xtensa/sof-imx8.ri '' -o 2.0 /home/nxa06898/work/sof/dump-imx8.txt
+ '[' -z 2.0 ']'
+ timeout --foreground 2.0 /home/nxa06898/work/qemu/build/xtensa-softmmu/qemu-system-xtensa -cpu imx8 -M adsp_imx8 -nographic -kernel /home/nxa06898/work/sof/build_imx8_gcc/src/arch/xtensa/sof-imx8.ri
qemu-system-xtensa: terminating on signal 15 from pid 9031 (timeout)
hexdump: /dev/shm/: Is a directory
Error ipc reg failed
Error boot failed
bridge-io: qemu-bridge-iram-mem fd 9 region 1 at 0x7fdaacb72000 allocated 32768 bytes
bridge-io: qemu-bridge-dram-mem fd 10 region 2 at 0x7fdaacb6a000 allocated 32768 bytes
bridge-io: qemu-bridge-sdram0-mem fd 11 region 3 at 0x7fda98a23000 allocated 8388608 bytes
bridge-io: qemu-bridge-sdram1-mem fd 12 region 4 at 0x7fda98223000 allocated 8388608 bytes
bridge-io: qemu-bridge-mbox-io fd 13 region 5 at 0x7fdaacb65000 allocated 20480 bytes
bridge-io: qemu-bridge-mu_13a-io fd 14 region 6 at 0x7fdaacb55000 allocated 65536 bytes
bridge-io: qemu-bridge-mu_13b-io fd 15 region 7 at 0x7fdaac9ca000 allocated 65536 bytes
bridge-io: qemu-bridge-irqstr-io fd 16 region 8 at 0x7fdaacb54000 allocated 204 bytes
bridge-io: qemu-bridge-edma-io fd 17 region 9 at 0x7fdaac9ba000 allocated 65536 bytes
bridge-io: qemu-bridge-sai-io fd 18 region 10 at 0x7fdaac9aa000 allocated 65536 bytes
bridge-io: qemu-bridge-esai-io fd 19 region 11 at 0x7fdaac99a000 allocated 65536 bytes
bridge-io-mq: added /qemu-io-parent-i.MX8
bridge-io-mq: added /qemu-io-child-i.MX8
bridge-io: 0 messages are currently on child queue.
header size=0x30ab0 modules=0x1 abi=0x1 size=16
new module size 0x30aa4 blocks 0xf type 0x0
data: 0x596f8000 size 0x140
block 0 type 0x1 size 0x140 ==>  offset 0x10000
data: 0x596f8400 size 0x16c
block 1 type 0x1 size 0x16c ==>  offset 0x10400
data: 0x596f857c size 0x8
block 2 type 0x1 size 0x8 ==>  offset 0x1057c
data: 0x596f859c size 0x8
block 3 type 0x1 size 0x8 ==>  offset 0x1059c
data: 0x596f85bc size 0x8
block 4 type 0x1 size 0x8 ==>  offset 0x105bc
data: 0x596f85dc size 0x4
block 5 type 0x1 size 0x4 ==>  offset 0x105dc
data: 0x596f85fc size 0x8
block 6 type 0x1 size 0x8 ==>  offset 0x105fc
data: 0x596f8618 size 0x4
block 7 type 0x1 size 0x4 ==>  offset 0x10618
data: 0x596f861c size 0x18
block 8 type 0x1 size 0x18 ==>  offset 0x1061c
data: 0x596f863c size 0x8
block 9 type 0x1 size 0x8 ==>  offset 0x1063c
data: 0x92400000 size 0x93e4
block 10 type 0x3 size 0x93e4 ==>  offset 0x0
data: 0x924093e4 size 0x3c
block 11 type 0x3 size 0x3c ==>  offset 0x93e4
data: 0x92409440 size 0x26580
block 12 type 0x3 size 0x26580 ==>  offset 0x9440
data: 0x9242f9c0 size 0xcf0
block 13 type 0x3 size 0xcf0 ==>  offset 0x2f9c0
data: 0x924737f0 size 0x6c
block 14 type 0x3 size 0x6c ==>  offset 0x737f0
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) bridge-io: msg send: 0 type 2 msg 64 size 16 ret 0
nxa06898@lsv15040:~/work/qemu$ 

Without commit https://github.com/thesofproject/qemu/commit/f71c955096edee3cfa5895e63b9397216f4e3478 the boot test is passing:

nxa06898@lsv15040:~/work/qemu$ git branch 
* sof-v4.2
nxa06898@lsv15040:~/work/qemu$ git log --oneline | head
d1234bca66 xtensa-host.sh: log the full command including the timeout prefix
1ac98caf30 qemu: io-bridge: remove fd assignment after its closed
ece0302801 Add shebangs
7c8c197474 xtensa-host.sh: replace . with ${MY_DIR} to be called from anywhere
27069022b5 xtensa-host.sh: add a ROM file sanity check and set -e
072572c305 ssi: esai: Add esai driver support
71fb77e9a1 ssi: sai: Add sai driver support
4887a712db xtensa: add debug trace file.
52b64a332a adsp: xtensa: Add support for Zephyr crash dumps.
1722710392 Update version for v4.2.0 release
nxa06898@lsv15040:~/work/qemu$ ../sof/scripts/qemu-check.sh imx8
+ ./xtensa-host.sh imx8 -k /home/nxa06898/work/sof/build_imx8_gcc/src/arch/xtensa/sof-imx8.ri '' -o 2.0 /home/nxa06898/work/sof/dump-imx8.txt
+ '[' -z 2.0 ']'
+ timeout --foreground 2.0 /home/nxa06898/work/qemu/build/xtensa-softmmu/qemu-system-xtensa -cpu imx8 -M adsp_imx8 -nographic -kernel /home/nxa06898/work/sof/build_imx8_gcc/src/arch/xtensa/sof-imx8.ri
qemu-system-xtensa: terminating on signal 15 from pid 32091 (timeout)
ipc reg dump:
00000020  00 00 00 c0 00 00 04 c0  00 00 00 00 00 00 00 00  |................|
ipc message dump:
00000000  6c 00 00 00 00 00 00 70  00 00 00 00 00 00 00 00  |l......p........|
00000010  00 00 00 00 00 00 00 00  3c 00 00 00 02 00 00 00  |........<.......|
00000020  00 00 ff ff 64 74 65 72  6d 69 6e 2e 00 00 00 00  |....dtermin.....|
00000030  66 77 72 65 61 64 79 2e  00 00 34 61 39 61 39 00  |fwready...4a9a9.|
00000040  00 a0 01 03 6a 5a eb 2d  00 00 00 00 00 00 00 00  |....jZ.-........|
Boot success

I'll add this bug to my todo list and look into it.
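
A note on reading the two logs above: "terminating on signal 15 ... (timeout)" appears in both the failing and the passing run, so that line alone does not indicate the failure. QEMU is always stopped by `timeout --foreground 2.0`, and the verdict appears to come from the dumps inspected afterwards ("Error ipc reg failed" versus "Boot success"). The sketch below shows only the general GNU `timeout` behaviour, using a hypothetical wrapper `run_with_timeout`; it is not taken from xtensa-host.sh or qemu-check.sh.

```shell
# Sketch: run a command under GNU timeout and say how it ended.
# GNU timeout sends SIGTERM (signal 15) when the limit expires and then
# exits with status 124; otherwise it passes on the command's own status.
run_with_timeout() {
    secs=$1
    shift
    rc=0
    timeout --foreground "$secs" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out"
    else
        echo "exited with status $rc"
    fi
}
```

With this wrapper, a command that outlives its limit reports `timed out`, which is the normal outcome for a QEMU instance that is never meant to exit on its own.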

marc-hb commented 1 year ago

@iuliana-prodan any news?

marc-hb commented 1 year ago

@iuliana-prodan , ping?