afranchuk closed this 2 months ago
I'm augmenting this with a bunch of other arithmetic checks throughout the module_reader code.
Do you have the /proc/pid/maps from when you saw the incorrect / corrupted state? I wonder if it's not corruption but something else going on.
I do not have the inputs, though I would like to have them. It was on Android in CI testing: https://treeherder.mozilla.org/logviewer?job_id=469290206&repo=autoland&lineNumber=8170.
This is bizarre and might be hiding a deeper problem. Is the issue reproducible? Can you check what values are involved and where they came from?
I am going to try some runs with debug logging to try to track it down.
Regardless of the underlying issue, we are doing arithmetic with values derived from inputs, so I think we should not panic if overflow occurs.
Yes, I agree.
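The checked arithmetic described above can be sketched with Rust's `checked_*` family, which returns `None` on wrap instead of panicking. A minimal sketch (the helper name is hypothetical, not the actual module_reader code):

```rust
/// Hypothetical helper: compute an absolute address from an untrusted base
/// and offset, reporting overflow instead of panicking in debug builds.
fn checked_address(base: u64, offset: u64) -> Option<u64> {
    base.checked_add(offset)
}

fn main() {
    // Well-formed input succeeds.
    assert_eq!(checked_address(0x1000, 0x238), Some(0x1238));
    // A garbage offset (like one read from junk memory) now yields None
    // instead of an overflow panic.
    assert_eq!(checked_address(0x1000, u64::MAX), None);
    println!("ok");
}
```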
The problem is reproducible, now I'm trying to figure out how to exfiltrate some debug information.
`eprintln!()` FTW

`eprintln!()` did not work! After that I took a shotgun approach with `eprintln!`, `println!` and `log::error!` (on the off chance it could pick up the `log` crate registration in gkrust). Nothing showed up in logs :(
I've still been unable to figure out why this fails, but the fix here does remedy the error. While I'd like to know the root cause, it'd be nice to get this merged.
I just re-checked the failed workload and I think there might be a way to surface the contents of the memory map. You could try using the `android_log-sys` or `android-logger` crate we have vendored to dump the contents of the memory map directly to the logcat. Our automation tasks capture the entire logcat output, so we could try that to exfiltrate the information we need.
I'm not sure whether `android-logger` will work, as I had assumed that when linking it links to the same `log` crate instance in gkrust (which presumably has a logger registered). However, it's entirely possible that's not happening, so I'm going to try it out.
Worst case, `android_log-sys` should allow you to invoke `__android_log_print()` directly; that will surely end up in the logcat output.
`android-logger` didn't work; its documentation says that if a logger is already installed it will silently do nothing, so I can only guess that's what's going on. I'll try directly printing.
Okay, so it looks like `android-logger` and `__android_log_print()` are both working, which means `android-logger` was working before and the taskcluster UI for searching the log is broken (I downloaded the log this time when I still didn't see the logs I expected).
The overflow occurs here:

`start address=129746557718528, offset=18446738911158816776`
This offset occurs when trying to get the build id of `/system/lib64/libGLESv2.so` from the notes section. Based on the addresses being read according to the log and the crash backtrace, I believe the strtab section offset is being read incorrectly, though I don't see any obvious reason for this. However, we can check this offset against the known size of the module. I'm going to try to get my hands on that file to confirm this.
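Plugging the logged values into checked arithmetic shows the wrap directly, and a size sanity check along the lines described above would also catch the bogus offset. A minimal sketch (the `module_size` value here is made up for illustration):

```rust
fn main() {
    // The values from the failing log.
    let start_address: u64 = 129_746_557_718_528;
    let offset: u64 = 18_446_738_911_158_816_776;
    // Their sum exceeds u64::MAX, so checked_add reports the overflow
    // instead of panicking.
    assert_eq!(start_address.checked_add(offset), None);
    // Sanity check against the module's known size (hypothetical value):
    // an offset this large cannot possibly point inside the module.
    let module_size: u64 = 0x17000;
    assert!(offset > module_size);
    println!("ok");
}
```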
The file works as expected when our code runs on it directly, so something odd is happening when it is loaded in memory. I noticed that the section header offset in the file is `0x16170`, but according to the logs the offset during the failing test was `0x1616f`, which I find a bit suspicious (though it's possible that's how it's loaded, even though the size in memory is the same as the file size). I can dump the loaded module to see what's up there. I also can't find that value in the file (in case the section offset was being misread).
Could you try checking if the issue presents itself with elfhack disabled?
I've failed to reproduce the test failure locally (which is fairly annoying; I'm not sure what's different), so I'm going to do a terrible thing to get the memory contents of that library on try.
It appears as if the section header table is not loaded (but a bunch of junk memory is there). This is not surprising; I've seen this before, though it's uncommon. I'll double-check the program headers to make sure this is expected. So in this case, there are two things we can do: verify the type of the section headers we read/use (the data is likely to be junk) before doing anything with them, and do checked arithmetic to avoid garbage values (which is already implemented).
The program headers confirm that the section header table wasn't expected to be loaded. What's disappointing is that the program header PT_NOTE segment doesn't contain the build id (which is what we're trying to get), only the section has it. The section is loaded into memory: it happens to be right after the PT_NOTE data.
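A containment check along those lines can be sketched as follows, using the two `PT_LOAD` entries from the readelf output below (the simplified `Segment` type is hypothetical):

```rust
/// Hypothetical segment descriptor (subset of an ELF program header).
struct Segment {
    vaddr: u64,
    memsz: u64,
}

/// Return true if [addr, addr + len) lies inside some PT_LOAD segment,
/// i.e. the bytes there were actually mapped from the file.
fn range_is_loaded(segments: &[Segment], addr: u64, len: u64) -> bool {
    let Some(end) = addr.checked_add(len) else {
        return false;
    };
    segments.iter().any(|s| {
        s.vaddr <= addr && s.vaddr.checked_add(s.memsz).map_or(false, |e| end <= e)
    })
}

fn main() {
    // The two PT_LOAD segments from the readelf output.
    let loads = [
        Segment { vaddr: 0x0, memsz: 0x158f0 },
        Segment { vaddr: 0x16d60, memsz: 0x2a8 },
    ];
    // The section header table at 0x16170 (25 headers of 64 bytes) is not
    // inside either segment, so reading it there would hit junk memory.
    assert!(!range_is_loaded(&loads, 0x16170, 25 * 64));
    // The .note.gnu.build-id section (address 0x268, size 0x20) is loaded.
    assert!(range_is_loaded(&loads, 0x268, 0x20));
    println!("ok");
}
```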
Can you paste the program header table and the section table here? (I mean the output of readelf that displays those tables) Which program header is the section table in? How is it different in the normal case?
There are 25 section headers, starting at offset 0x16170:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000238 00000238
0000000000000015 0000000000000000 A 0 0 1
[ 2] .note.androi[...] NOTE 000000000000024e 0000024e
0000000000000018 0000000000000000 A 0 0 2
[ 3] .note.gnu.bu[...] NOTE 0000000000000268 00000268
0000000000000020 0000000000000000 A 0 0 4
[ 4] .dynsym DYNSYM 0000000000000288 00000288
00000000000044d0 0000000000000018 A 5 1 8
[ 5] .dynstr STRTAB 0000000000004758 00004758
0000000000003ccb 0000000000000000 A 0 0 1
[ 6] .gnu.hash GNU_HASH 0000000000008428 00008428
0000000000001590 0000000000000000 A 4 0 8
[ 7] .gnu.version VERSYM 00000000000099b8 000099b8
00000000000005bc 0000000000000002 A 4 0 2
[ 8] .gnu.version_d VERDEF 0000000000009f74 00009f74
000000000000001c 0000000000000000 A 5 1 4
[ 9] .gnu.version_r VERNEED 0000000000009f90 00009f90
0000000000000020 0000000000000000 A 5 1 4
[10] .rela.dyn RELA 0000000000009fb0 00009fb0
0000000000000030 0000000000000018 A 4 0 8
[11] .rela.plt RELA 0000000000009fe0 00009fe0
0000000000000090 0000000000000018 AI 4 12 8
[12] .plt PROGBITS 000000000000a070 0000a070
0000000000000070 0000000000000010 AX 0 0 16
[13] .text PROGBITS 000000000000a0e0 0000a0e0
0000000000005c45 0000000000000000 AX 0 0 16
[14] .eh_frame PROGBITS 000000000000fd28 0000fd28
00000000000044f4 0000000000000000 A 0 0 8
[15] .eh_frame_hdr PROGBITS 000000000001421c 0001421c
00000000000016d4 0000000000000000 A 0 0 4
[16] .fini_array FINI_ARRAY 0000000000016d60 00015d60
0000000000000008 0000000000000000 WA 0 0 8
[17] .dynamic DYNAMIC 0000000000016d68 00015d68
0000000000000250 0000000000000010 WA 5 0 8
[18] .got PROGBITS 0000000000016fb8 00015fb8
0000000000000000 0000000000000000 WA 0 0 8
[19] .got.plt PROGBITS 0000000000016fb8 00015fb8
0000000000000048 0000000000000000 WA 0 0 8
[20] .data PROGBITS 0000000000017000 00016000
0000000000000008 0000000000000000 WA 0 0 8
[21] .comment PROGBITS 0000000000000000 00016008
0000000000000026 0000000000000001 MS 0 0 1
[22] .note.gnu.go[...] NOTE 0000000000000000 00016030
000000000000001c 0000000000000000 0 0 4
[23] .gnu_debuglink PROGBITS 0000000000000000 0001604c
0000000000000014 0000000000000000 0 0 1
[24] .shstrtab STRTAB 0000000000000000 00016060
0000000000000109 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
Elf file type is DYN (Shared object file)
Entry point 0x0
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000001f8 0x00000000000001f8 R 0x8
INTERP 0x0000000000000238 0x0000000000000238 0x0000000000000238
0x0000000000000015 0x0000000000000015 R 0x1
[Requesting program interpreter: /system/bin/linker64]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000158f0 0x00000000000158f0 R E 0x1000
LOAD 0x0000000000015d60 0x0000000000016d60 0x0000000000016d60
0x00000000000002a8 0x00000000000002a8 RW 0x1000
DYNAMIC 0x0000000000015d68 0x0000000000016d68 0x0000000000016d68
0x0000000000000250 0x0000000000000250 RW 0x8
NOTE 0x000000000000024e 0x000000000000024e 0x000000000000024e
0x0000000000000018 0x0000000000000018 R 0x4
GNU_EH_FRAME 0x000000000001421c 0x000000000001421c 0x000000000001421c
0x00000000000016d4 0x00000000000016d4 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x0
GNU_RELRO 0x0000000000015d60 0x0000000000016d60 0x0000000000016d60
0x00000000000002a0 0x00000000000002a0 RW 0x8
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.android.ident .note.gnu.build-id .dynsym .dynstr .gnu.hash .gnu.version .gnu.version_d .gnu.version_r .rela.dyn .rela.plt .plt .text .eh_frame .eh_frame_hdr
03 .fini_array .dynamic .got .got.plt .data
04 .dynamic
05 .note.android.ident
06 .eh_frame_hdr
07
08 .fini_array .dynamic .got .got.plt
The section header table is not in a program header (which is the problem). The `.note.gnu.build-id` section is within the read/execute `PT_LOAD` segment (header 2). Most binaries look like that, with an RX `PT_LOAD` from 0 to the end of the section contents that should be RX (the sections seem to be ordered intentionally to keep those all together).
It seems like this is the common case; I was wrong earlier to say that it's uncommon (though on Linux I definitely recall having seen some binaries loaded wholly into memory, including the section header table). I think we normally succeed in getting the build id from the program headers, so we don't attempt the section headers (or we do, but just get an access violation, which we handle fine, rather than this overflow).
Maybe if we can't get the build id by reading the in-memory binary, we could try to read it from the file (if present)? We changed to reading in-memory because in some cases the files are not present, but it's not an unreasonable fallback. It might incur a lot of extra I/O though; I'm unsure how commonly we fail to find the build id. Note that we don't have this issue for SONAME because we can get everything we need from the `PT_DYNAMIC` segment.
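That fallback order can be sketched as a simple chain (hypothetical function, not the actual module_reader API), where the on-disk read only happens when the in-memory lookup fails:

```rust
/// Hypothetical fallback chain: prefer the in-memory image, and consult
/// the on-disk file only when the in-memory lookup yields nothing.
fn build_id_with_fallback(
    from_memory: impl Fn() -> Option<Vec<u8>>,
    from_file: impl Fn() -> Option<Vec<u8>>,
) -> Option<Vec<u8>> {
    // or_else is lazy: from_file only runs if from_memory returned None,
    // so the extra I/O is only paid on failure.
    from_memory().or_else(|| from_file())
}

fn main() {
    // In-memory read succeeds, so its value wins.
    assert_eq!(
        build_id_with_fallback(|| Some(vec![0xaa]), || Some(vec![0xbb])),
        Some(vec![0xaa])
    );
    // In-memory read fails (e.g. unmapped section headers); use the file.
    assert_eq!(
        build_id_with_fallback(|| None, || Some(vec![0xbb])),
        Some(vec![0xbb])
    );
    println!("ok");
}
```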
The other no-good dirty-rotten hack we could do is just look ahead after the `PT_NOTE` segment(s) if we don't find the build id, in the hopes that the note section is there (I'm willing to bet it's commonly found there).
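Such a look-ahead would just walk ELF note records until it finds `NT_GNU_BUILD_ID` (type 3, name `GNU`). A minimal sketch of a note parser along those lines (little-endian assumed, checked arithmetic throughout so junk memory can't cause a panic):

```rust
/// Scan a byte slice for a GNU build-id note record.
/// Sketch only: assumes little-endian 4-byte note header fields.
fn find_build_id(mut bytes: &[u8]) -> Option<&[u8]> {
    // Round up to the 4-byte note alignment, refusing to overflow.
    let align4 = |n: usize| n.checked_add(3).map(|v| v & !3);
    while bytes.len() >= 12 {
        let namesz = u32::from_le_bytes(bytes[0..4].try_into().ok()?) as usize;
        let descsz = u32::from_le_bytes(bytes[4..8].try_into().ok()?) as usize;
        let ntype = u32::from_le_bytes(bytes[8..12].try_into().ok()?);
        let name_end = 12usize.checked_add(align4(namesz)?)?;
        let desc_end = name_end.checked_add(align4(descsz)?)?;
        if desc_end > bytes.len() {
            // Junk sizes: bail out instead of reading out of bounds.
            return None;
        }
        // NT_GNU_BUILD_ID is note type 3 with name "GNU\0".
        if ntype == 3 && namesz == 4 && bytes[12..16] == *b"GNU\0" {
            return Some(&bytes[name_end..name_end + descsz]);
        }
        bytes = &bytes[desc_end..];
    }
    None
}

fn main() {
    // A synthetic .note.gnu.build-id record: namesz=4, descsz=4, type=3,
    // then "GNU\0" and a fake 4-byte build id.
    let mut buf = Vec::new();
    buf.extend_from_slice(&4u32.to_le_bytes()); // namesz
    buf.extend_from_slice(&4u32.to_le_bytes()); // descsz
    buf.extend_from_slice(&3u32.to_le_bytes()); // NT_GNU_BUILD_ID
    buf.extend_from_slice(b"GNU\0");
    buf.extend_from_slice(&[0xde, 0xad, 0xbe, 0xef]);
    assert_eq!(find_build_id(&buf), Some(&[0xde, 0xad, 0xbe, 0xef][..]));
    // Junk memory yields None instead of a panic.
    assert_eq!(find_build_id(&[0xff; 8]), None);
    println!("ok");
}
```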
Without this, integer overflow will cause a panic where we should gracefully fail. It's very possible for the start address or offset to be incorrect or corrupted.