sm00th / bitlbee-discord

Bitlbee plugin for Discord (http://discordapp.com)
GNU General Public License v2.0
290 stars 27 forks source link

Fix use-after-free on OPCODE_RECONNECT/INVALID_SESSION #123

Closed Alcaro closed 6 years ago

Alcaro commented 6 years ago

I got a few segfaults during a recent Discord outage. Throwing Valgrind at it yielded:

[23:25:36] <<< ((null)) discord_parse_message 36
{"t":null,"s":null,"op":9,"d":false}

==20729== Invalid read of size 8
==20729==    at 0x8683E84: discord_ws_in_cb (discord-websockets.c:247)
==20729==    by 0x130FD0: ??? (in /usr/sbin/bitlbee)
==20729==    by 0x566359F: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5.1.9)
==20729==    by 0x130B4F: b_main_run (in /usr/sbin/bitlbee)
==20729==    by 0x11AD3C: main (in /usr/sbin/bitlbee)
==20729==  Address 0x80ff9e8 is 88 bytes inside a block of size 136 free'd
==20729==    at 0x4C2CDDB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20729==    by 0x867F2BC: discord_logout (discord.c:139)
==20729==    by 0x13D6D1: imc_logout (in /usr/sbin/bitlbee)
==20729==    by 0x8681191: discord_parse_message (discord-handlers.c:788)
==20729==    by 0x8683F38: discord_ws_in_cb (discord-websockets.c:243)
==20729==    by 0x130FD0: ??? (in /usr/sbin/bitlbee)
==20729==    by 0x566359F: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5.1.9)
==20729==    by 0x130B4F: b_main_run (in /usr/sbin/bitlbee)
==20729==    by 0x11AD3C: main (in /usr/sbin/bitlbee)
==20729==  Block was alloc'd at
==20729==    at 0x4C2DBC5: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20729==    by 0x538FE60: g_malloc0 (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.5000.3)
==20729==    by 0x867F2EE: discord_login (discord.c:118)
==20729==    by 0x13AFFB: account_on (in /usr/sbin/bitlbee)
==20729==    by 0x12C70C: ??? (in /usr/sbin/bitlbee)
==20729==    by 0x12C929: cmd_identify_finish (in /usr/sbin/bitlbee)
==20729==    by 0x12CC1F: ??? (in /usr/sbin/bitlbee)
==20729==    by 0x11FFFE: irc_process (in /usr/sbin/bitlbee)
==20729==    by 0x11BBBE: bitlbee_io_current_client_read (in /usr/sbin/bitlbee)
==20729==    by 0x130FD0: ??? (in /usr/sbin/bitlbee)
==20729==    by 0x566359F: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5.1.9)
==20729==    by 0x130B4F: b_main_run (in /usr/sbin/bitlbee)

I believe this PR fixes it. Unfortunately, the downtime ended before I was able to figure it out, so I wasn't able to fully test it.

sm00th commented 6 years ago

Hi there, thanks for the report. I don't think this solution is the right onle, it might remove the segfault, but it will just hide the problem. The thing is - discord_ws_in_cb() should never be called after we logged out and that is what I was trying to achieve by removing dd->inpa event in discord_ws_cleanup(). Looking at the calltrace I see that the event did get removed, but for some reason there was another one that wasn't cleaned up properly for some reason. The more or less correct workaround would be checking/cleaning up any lingering dd->inpa in discord_ws_connected_cb(), but again that would just hide the problem, and if dd->inpa is not properly cleaned up there might be other issues that are not so obvious. I think we need to figure out the root cause so maybe we can try to emulate the conditions that lead to this. Can you recall the outage? Was there multiple disconnects prior the segfault or was it the first disconnect that triggers it? What was the stated cause of disconnect(if any), it looks like the last one was "Invalid session" were there any before that?

Alcaro commented 6 years ago

No, there is no other event that's not removed. The issue happens if discord_parse_message() calls iwc_logout(); if that happens, discord_ws_in_cb() doesn't return immediately, but instead touches ->ssl on a dead object.

That's why Valgrind points to line 247, if (ssl_pending(dd->ssl)) {, not discord_data *dd = ic->proto_data; or if (dd->state == WS_CONNECTING) { at the start of the function.

All disconnections were invalid session or expired token; I didn't get it to segfault on the first disconnect, but use-after-free tends to work sometimes. However, unless my memory fails, Valgrind got angry at the first one.

And I found a way to reliably trigger the use-after-free:

(discord-handlers.c, around line 783)
  } else if (op == OPCODE_HEARTBEAT_ACK) {
    // heartbeat ack
-  } else if (op == OPCODE_RECONNECT) {
+  } else if (op == OPCODE_RECONNECT || strstr(buf, "bitlbeetestforgreatglory")) {
    imcb_log(ic, "Reconnect requested");
    imc_logout(ic, TRUE);

then say the magic string in some random channel, and the gateway will send it back. Doing this reliably yields, on the first offense:

[12:38:49] >>> (Alcaro) discord_http_send_msg 312
About to send HTTP request:
POST /api/channels/336305131689607168/messages HTTP/1.1
Host: discordapp.com
User-Agent: Bitlbee-Discord
authorization: hunter2
Content-Type: application/json
Content-Length: 74

{"content":"bitlbeetestforgreatglory", "nonce":"+shUFFc6uvjxbF9zkyz5jQ=="}
[12:38:49] <<< (Alcaro) discord_parse_message 492
{"t":"MESSAGE_CREATE","s":16,"op":0,"d":{"type":0,"tts":false,"timestamp":"2017-12-31T11:38:49.508000+00:00","pinned":false,"nonce":"+shUFFc6uvjxbF9zkyz5jQ==","mentions":[],"mention_roles":[],"mention_everyone":false,"id":"396990577935122432","embeds":[],"edited_timestamp":null,"content":"bitlbeetestforgreatglory","channel_id":"336305131689607168","author":{"username":"Alcaro","id":"188322172576464896","discriminator":"3261","avatar":"e3466274aa00038c9b27e3cb0b95dae5"},"attachments":[]}}

==30571== Invalid read of size 8
==30571==    at 0x8683F44: discord_ws_in_cb (discord-websockets.c:247)
==30571==    by 0x130FD0: ??? (in /usr/sbin/bitlbee)
==30571==    by 0x566359F: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5.1.9)
==30571==    by 0x130B4F: b_main_run (in /usr/sbin/bitlbee)
==30571==    by 0x11AD3C: main (in /usr/sbin/bitlbee)
==30571==  Address 0x80ff9e8 is 88 bytes inside a block of size 144 free'd
==30571==    at 0x4C2CDDB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30571==    by 0x867F68C: discord_logout (discord.c:139)
==30571==    by 0x13D6D1: imc_logout (in /usr/sbin/bitlbee)
==30571==    by 0x868118F: discord_parse_message (discord-handlers.c:785)
==30571==    by 0x8683FF8: discord_ws_in_cb (discord-websockets.c:243)
==30571==    by 0x130FD0: ??? (in /usr/sbin/bitlbee)
==30571==    by 0x566359F: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5.1.9)
==30571==    by 0x130B4F: b_main_run (in /usr/sbin/bitlbee)
==30571==    by 0x11AD3C: main (in /usr/sbin/bitlbee)
==30571==  Block was alloc'd at
==30571==    at 0x4C2DBC5: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30571==    by 0x538FE60: g_malloc0 (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.5000.3)
==30571==    by 0x867F6BE: discord_login (discord.c:118)
==30571==    by 0x13AFFB: account_on (in /usr/sbin/bitlbee)
==30571==    by 0x12C70C: ??? (in /usr/sbin/bitlbee)
==30571==    by 0x12C929: cmd_identify_finish (in /usr/sbin/bitlbee)
==30571==    by 0x12CC1F: ??? (in /usr/sbin/bitlbee)
==30571==    by 0x11FFFE: irc_process (in /usr/sbin/bitlbee)
==30571==    by 0x11BBBE: bitlbee_io_current_client_read (in /usr/sbin/bitlbee)
==30571==    by 0x130FD0: ??? (in /usr/sbin/bitlbee)
==30571==    by 0x566359F: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5.1.9)
==30571==    by 0x130B4F: b_main_run (in /usr/sbin/bitlbee)
==30571== 
(I terminated Valgrind here, otherwise the gateway login messages flush everything useful out of my terminal backlog.)

With this PR (and the same test patch), Valgrind is happy.

Alcaro commented 6 years ago

I felt out parameter makes more sense, but if you prefer return, then return it is.

sm00th commented 6 years ago

Great, thank you!