rust-minidump / minidump-writer

Rust rewrite of breakpad's minidump_writer
MIT License
68 stars 17 forks source link

Perform a checked add in the ptrace dumper ModuleMemory impl. #129

Closed afranchuk closed 2 months ago

afranchuk commented 3 months ago

Without this, integer overflow will cause a panic when we should gracefully fail. It's very possible the start address or offset could be incorrect/corrupted.

afranchuk commented 3 months ago

I'm augmenting this with a bunch of other arithmetic checks throughout the module_reader code.

mstange commented 3 months ago

Do you have the /proc/pid/maps from when you saw the incorrect / corrupted state? I wonder if it's not corruption but something else going on.

afranchuk commented 3 months ago

I do not have the inputs, though I would like to have them. It was on Android in CI testing: https://treeherder.mozilla.org/logviewer?job_id=469290206&repo=autoland&lineNumber=8170.

gabrielesvelto commented 3 months ago

This is bizarre and might be hiding a deeper problem. Is the issue reproducible? Can you check what values are involved and where did they come from?

afranchuk commented 3 months ago

I am going to try some runs with debug logging to try to track it down.

afranchuk commented 3 months ago

Regardless of the underlying issue, we are doing arithmetic with values derived from inputs, so I think we should not panic if overflow occurs.

gabrielesvelto commented 3 months ago

Regardless of the underlying issue, we are doing arithmetic with values derived from inputs, so I think we should not panic if overflow occurs.

Yes, I agree.

afranchuk commented 3 months ago

The problem is reproducible, now I'm trying to figure out how to exfiltrate some debug information.

gabrielesvelto commented 3 months ago

The problem is reproducible, now I'm trying to figure out how to exfiltrate some debug information.

eprintln!() FTW

afranchuk commented 3 months ago

eprintln!() did not work! After that I took a shotgun approach with eprintln!, println! and log::error! (on the off chance it could pick up the log crate registration in gkrust). Nothing showed up in logs :(

afranchuk commented 2 months ago

I've still been unable to figure out why this fails, however the fix here does remedy the error. While I'd like to know the root cause, it'd be nice to get this merged.

gabrielesvelto commented 2 months ago

I just re-checked the failed workload and I think there might be a way to surface the contents of the memory map. You could try using the android_log-sys or android-logger crate we have vendored to dump the contents of the memory map directly to the logcat. Our automation tasks capture the entire logcat output, so we could try that to exfiltrate the information we need.

afranchuk commented 2 months ago

I'm not sure whether android-logger will work, as I had assumed that when linking it links to the same log crate instance in gkrust (which presumably has a logger registered). However, it's entirely possible that's not happening, so I'm going to try it out.

gabrielesvelto commented 2 months ago

Worst case android_log-sys should allow you to invoke __android_log_print() directly, that will surely end up in the logcat output.

afranchuk commented 2 months ago

android-logger didn't work; its documentation says that if a logger is already installed it will silently do nothing, so I can only guess that's what's going on. I'll try directly printing.

afranchuk commented 2 months ago

Okay, so it looks like android-logger and __android_log_print() are both working, which means android-logger was working before and the taskcluster UI for searching the log is broken (I downloaded the log this time when I still didn't see the logs I expected).

afranchuk commented 2 months ago

The overflow occurs here:

start address=129746557718528, offset=18446738911158816776

This offset occurs when trying to get the build id of /system/lib64/libGLESv2.so from the notes section. Based on the addresses being read according to the log and the crash backtrace, I believe the strtab section offset is being read incorrectly, though I don't see any obvious reasons for this. However we can check this offset against the known size of the module. I'm going to try to get my hands on that file to confirm this.

afranchuk commented 2 months ago

The file works as expected when our code runs on it directly, so something odd is happening when it is loaded in memory. I noticed that the section header offset in the file is 0x16170, however according to the logs the offset during the failing test was 0x1616f, which I find a bit suspicious (though it's possible that's how it's loaded, but even the size in memory is the same as the file size). I can dump the loaded module to see what's up there. I also can't find that value in the file (if the section offset was being misread).

gabrielesvelto commented 2 months ago

Could you try checking if the issue presents itself with elfhack disabled?

afranchuk commented 2 months ago

I've failed to reproduce the test failure locally (which is fairly annoying, I'm not sure what's different), so I'm going to do a terrible thing to get the memory contents of that library on try.

afranchuk commented 2 months ago

It appears as if the section header table is not loaded (but a bunch of junk memory is there). This is not surprising, I've seen this before, though it's uncommon. I'll double-check the program headers to make sure this is expected. So in this case, there's two things we can do: verify the type of section headers we read/use (it's likely to be junk) before doing anything with them, and do checked arithmetic to avoid garbage values (which is already implemented).

afranchuk commented 2 months ago

The program headers confirm that the section header table wasn't expected to be loaded. What's disappointing is that the program header PT_NOTE segment doesn't contain the build id (which is what we're trying to get), only the section has it. The section is loaded into memory: it happens to be right after the PT_NOTE data.

mstange commented 2 months ago

Can you paste the program header table and the section table here? (I mean the output of readelf that displays those tables) Which program header is the section table in? How is it different in the normal case?

afranchuk commented 2 months ago

Can you paste the program header table and the section table here? (I mean the output of readelf that displays those tables) Which program header is the section table in? How is it different in the normal case?

There are 25 section headers, starting at offset 0x16170:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000000238  00000238
       0000000000000015  0000000000000000   A       0     0     1
  [ 2] .note.androi[...] NOTE             000000000000024e  0000024e
       0000000000000018  0000000000000000   A       0     0     2
  [ 3] .note.gnu.bu[...] NOTE             0000000000000268  00000268
       0000000000000020  0000000000000000   A       0     0     4
  [ 4] .dynsym           DYNSYM           0000000000000288  00000288
       00000000000044d0  0000000000000018   A       5     1     8
  [ 5] .dynstr           STRTAB           0000000000004758  00004758
       0000000000003ccb  0000000000000000   A       0     0     1
  [ 6] .gnu.hash         GNU_HASH         0000000000008428  00008428
       0000000000001590  0000000000000000   A       4     0     8
  [ 7] .gnu.version      VERSYM           00000000000099b8  000099b8
       00000000000005bc  0000000000000002   A       4     0     2
  [ 8] .gnu.version_d    VERDEF           0000000000009f74  00009f74
       000000000000001c  0000000000000000   A       5     1     4
  [ 9] .gnu.version_r    VERNEED          0000000000009f90  00009f90
       0000000000000020  0000000000000000   A       5     1     4
  [10] .rela.dyn         RELA             0000000000009fb0  00009fb0
       0000000000000030  0000000000000018   A       4     0     8
  [11] .rela.plt         RELA             0000000000009fe0  00009fe0
       0000000000000090  0000000000000018  AI       4    12     8
  [12] .plt              PROGBITS         000000000000a070  0000a070
       0000000000000070  0000000000000010  AX       0     0     16
  [13] .text             PROGBITS         000000000000a0e0  0000a0e0
       0000000000005c45  0000000000000000  AX       0     0     16
  [14] .eh_frame         PROGBITS         000000000000fd28  0000fd28
       00000000000044f4  0000000000000000   A       0     0     8
  [15] .eh_frame_hdr     PROGBITS         000000000001421c  0001421c
       00000000000016d4  0000000000000000   A       0     0     4
  [16] .fini_array       FINI_ARRAY       0000000000016d60  00015d60
       0000000000000008  0000000000000000  WA       0     0     8
  [17] .dynamic          DYNAMIC          0000000000016d68  00015d68
       0000000000000250  0000000000000010  WA       5     0     8
  [18] .got              PROGBITS         0000000000016fb8  00015fb8
       0000000000000000  0000000000000000  WA       0     0     8
  [19] .got.plt          PROGBITS         0000000000016fb8  00015fb8
       0000000000000048  0000000000000000  WA       0     0     8
  [20] .data             PROGBITS         0000000000017000  00016000
       0000000000000008  0000000000000000  WA       0     0     8
  [21] .comment          PROGBITS         0000000000000000  00016008
       0000000000000026  0000000000000001  MS       0     0     1
  [22] .note.gnu.go[...] NOTE             0000000000000000  00016030
       000000000000001c  0000000000000000           0     0     4
  [23] .gnu_debuglink    PROGBITS         0000000000000000  0001604c
       0000000000000014  0000000000000000           0     0     1
  [24] .shstrtab         STRTAB           0000000000000000  00016060
       0000000000000109  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

Elf file type is DYN (Shared object file)
Entry point 0x0
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000001f8 0x00000000000001f8  R      0x8
  INTERP         0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x0000000000000015 0x0000000000000015  R      0x1
      [Requesting program interpreter: /system/bin/linker64]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000158f0 0x00000000000158f0  R E    0x1000
  LOAD           0x0000000000015d60 0x0000000000016d60 0x0000000000016d60
                 0x00000000000002a8 0x00000000000002a8  RW     0x1000
  DYNAMIC        0x0000000000015d68 0x0000000000016d68 0x0000000000016d68
                 0x0000000000000250 0x0000000000000250  RW     0x8
  NOTE           0x000000000000024e 0x000000000000024e 0x000000000000024e
                 0x0000000000000018 0x0000000000000018  R      0x4
  GNU_EH_FRAME   0x000000000001421c 0x000000000001421c 0x000000000001421c
                 0x00000000000016d4 0x00000000000016d4  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x0
  GNU_RELRO      0x0000000000015d60 0x0000000000016d60 0x0000000000016d60
                 0x00000000000002a0 0x00000000000002a0  RW     0x8

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.android.ident .note.gnu.build-id .dynsym .dynstr .gnu.hash .gnu.version .gnu.version_d .gnu.version_r .rela.dyn .rela.plt .plt .text .eh_frame .eh_frame_hdr 
   03     .fini_array .dynamic .got .got.plt .data 
   04     .dynamic 
   05     .note.android.ident 
   06     .eh_frame_hdr 
   07     
   08     .fini_array .dynamic .got .got.plt 

The section header table is not in a program header (which is the problem). The .note.gnu.build-id section is within the read/execute PT_LOAD header (header 2). Most binaries look like that, with an RX PT_LOAD from 0 to the end of the section contents which should be RX (they seem to be ordered intentionally to keep those sections all together).

It seems like this is the common case, I was wrong earlier to say that it's uncommon (though on linux I definitely recall having seen some binaries loaded wholly into memory, including the section header table). I think we normally succeed in getting the build id from the program headers, so we don't attempt the section headers (or we do but just get an access violation which we handle fine, rather than this overflow).

afranchuk commented 2 months ago

Maybe if we can't get the build id by reading the in-memory binary we could try to read it from the file (if present)? We changed to reading in-memory because in some cases the files are not present, but it's not an unreasonable fallback. It might incur a lot of extra IO though, I'm unsure how commonly we fail to find the build id. Note that we don't have this issue for SONAME because we can get everything we need from the PT_DYNAMIC segment.

The other no-good dirty-rotten hack we could do is just look ahead after the PT_NOTE segment(s) if we don't find the build id in the hopes that the note section is there (I'm willing to bet it's commonly found there).