ilg-archived / qemu

The GNU MCU Eclipse QEMU
http://gnuarmeclipse.github.io/qemu/
Other
204 stars 78 forks source link

Unable to connect to gdbserver when QEmu started with -nographic #70

Closed Novakov closed 5 years ago

Novakov commented 5 years ago

Description

arm-none-eabi-gdb unable to connect when running qemu with -nographic option.

Steps to Reproduce

  1. Run QEmu with gdbserver: qemu-system-gnuarmeclipse.exe --verbose -verbose -verbose --board NUCLEO-F072RB --mcu STM32F072RB -s -d unimp,guest_errors -nographic
  2. Run arm-none-eabi-gdb
  3. In GDB try to connect to QEmu: target remote localhost:1234

Expected behaviour: GDB is connected to target

Actual behaviour: GDB timeouts and reports invalid responses:

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
Ignoring packet error, continuing...
warning: unrecognized item "timeout" in "qSupported" response
Remote replied unexpectedly to 'vMustReplyEmpty': PacketSize=1000;qXfer:features:read+

Versions

Random observations:

not-working.txt working.txt

ilg-ul commented 5 years ago

thank you, I'll investigate, but it'll take some time, I'm in the middle of upgrading my machines, and things don't go always smoothly.

zaq32 commented 5 years ago

I have been looking into this problem and found out that this is strictly windows specific problem. When qemu is started with -S and -nographic options then once it initially goes into main loop after just a few iterations it goes into infinite wait on the qemu_poll_ns call here: https://github.com/gnu-mcu-eclipse/qemu/blob/gnuarmeclipse-dev/main-loop.c#L459 When connection to gdbserer is made, the notification about it is not detected and processed at all because the event that is passed to g_source_add_poll (here: https://github.com/gnu-mcu-eclipse/qemu/blob/gnuarmeclipse-dev/io/channel-watch.c#L302) is not associated with the socket channel's listening socket. That makes it effectively standalone unused event that will not be signaled by the operating system once the socket it is supposed to monitor is ready to be processed (due to whatever reason).

This event will be subsequently reassociated with its corresponding socket at during first g_main_context_dispatch call which verifies the state of all handles in the poll set regardless whether it was signaled or not. This allows the system to recover but it introduces non deterministic delay between an actual event and the time when the next g_main_context_dispatch is run.

This is especially problematic when qemu is started in halt mode (-S) in which case after initial event polling main loop goes into infinite wait, waiting for connection to gdb server (or something similar to happen). The connection to gdb server however will not wake up the main loop due to the fact that event that is supposed to be monitoring the gdbstub socket is not properly configured. From this system will not be able to recover unless some other event gets fired during which state of the gdb server socket gets checked and all of the remaining pending events handled. As part of the socket event handling the event is reassociated with socket again (see https://github.com/gnu-mcu-eclipse/qemu/blob/gnuarmeclipse-dev/io/channel-socket.c#L665).

Therefore I believe that fix for this issue is to add following snippet

        WSAEventSelect(sioc->fd, ioc->event,
                       FD_READ | FD_ACCEPT | FD_CLOSE |
                       FD_CONNECT | FD_WRITE | FD_OOB);

to the qio_channel_create_socket_watch function https://github.com/gnu-mcu-eclipse/qemu/blob/gnuarmeclipse-dev/io/channel-watch.c#L281 before the event is added to the list in g_source_add_poll call.

ilg-ul commented 5 years ago

this does not sound very encouraging :-(

zaq32 commented 5 years ago

Now that I knew what to look for I found similar change in the official qemu: https://github.com/qemu/qemu/commit/bf88c1247f80ac6d62710d5d0d0d9ce3a53e99ec#diff-d19e9e4265f0234bf61dec3e381c92a1R289

ilg-ul commented 5 years ago

sure, if you find a patch that fixes the problem, I'm ready to consider it, but I have limited testing capabilities on Windows.

zaq32 commented 5 years ago

Cherry picking that specific commit will not work as it is part of broader change and even by itself changes more than just event association for socket in channel-watch. Smaller patch would have to be devised. On the other it may be possible to incorporate this patch by updating the fork to newer version of official qemu. Is this feasible and if so are there any plans to do this?

ilg-ul commented 5 years ago

I plan some major changes to qemu, but, realistically, I cannot set a date, it might not be before the end of this year.

however, before starting work on the major changes, I might consider an intermediate step to update the current code to the latest upstream.

the changes should include a rework of the Cortex-M code, to no longer use the json files, but to generate C code only for the selected peripherals.

there are two main obstacles: first that I have to complete some more important parts of the project, and second that I have no-one to work with. if you can help me, this might boost the priorities.

zaq32 commented 5 years ago

This sounds like large task that can take a while to complete. For immediate future I have prepared minimal patch that fixed this particular problem basing on the changes from the official qemu repository: https://github.com/gnu-mcu-eclipse/qemu/pull/71

ilg-ul commented 5 years ago

it is a large task, and this is the reason it was constantly delayed.

ilg-ul commented 5 years ago

@zaq32, can you somehow make the new binary available to @Novakov, so he can use it until a new release will be available?

Novakov commented 5 years ago

Funny thing :) @zaq32 has his desk about 1.5 meter from me in the office, so I got patched version before pull request was made :)

Patched version works correctly, I can connect to gdbserver when QEmu is started in halt mode, load binary and debug it.

ilg-ul commented 5 years ago

Super! :-)

I'd be glad to work with you two to improve QEMU.

ilg-ul commented 5 years ago

Fixed in 2.8.0-7.