google / bloaty

Bloaty: a size profiler for binaries
Apache License 2.0
4.79k stars 346 forks source link

bloaty shows 'region out-of-bounds' error on ELF files with a 0-sized segment at an offset greater than the ELF file size #378

Closed kjteske closed 1 month ago

kjteske commented 4 months ago

bloaty errors processing an ELF file where a 0-sized segment is not backed by the ELF file's contents:

$ bloaty <binary>
bloaty: region out-of-bounds  

This happens when bloaty is iterating segments and gets to the "TLS" segment in this particular file. readelf shows the .tbss section / TLS segment is at address 0x00690000, with MemSiz of 4 bytes but FileSiz is 0 bytes. With some irrelevant details redacted:

$ readelf -l -S <binary>
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
   ...
  [20] .tbss             NOBITS          00690000 680000 000004 00 WAT  0   0  4
   ...

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  EXIDX          0x648a04 0x00658a04 0x00658a04 0x36da8 0x36da8 R   0x4
  ...
  TLS            0x680000 0x00690000 0x00690000 0x00000 0x00004 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

 Section to Segment mapping:
  Segment Sections...
   00     .ARM.exidx 
   ...
   08     .tbss 
   09  

Since FileSiz is 0 bytes, it doesn't need to be backed by the file. And we see the file indeed is smaller than the 0x00690000 offset:

$ printf "0x%x\n" $(stat --format "%s" <binary>)
0x68aaf0

The crash is within ElfFile::ReadSegment() when header->p_filesz is 0. When p_filesz is 0, we should skip calling GetRegion().

haberman commented 3 months ago

Sorry for the slow reply.

I'm not quite understanding why this made Bloaty crash. Your report seems to be stating that p_offset is 0x00690000, but from my reading of your readelf output, p_offset=0x680000, which is less than your file size 0x68aaf0.

I guess your PR https://github.com/google/bloaty/pull/379 should be harmless: if p_filesz is 0, then the file region is empty anyway, and there's no benefit in crashing because the offset was bad. We could even consider just changing StrictSubstr() to immediately return an empty string_view if the size is zero.

kjteske commented 3 months ago

I'm not quite understanding why this made Bloaty crash.

Sorry - bloaty isn't actually crashing. But it bails out on the error and exits main() , printing just the error and nothing else useful.

Your report seems to be stating that p_offset is 0x00690000, but from my reading of your readelf output, p_offset=0x680000, which is less than your file size 0x68aaf0.

Oops - yes, I didn't report this correctly. It's not the .tbss section / TLS segment that triggers the error; it's the .fusion section in a LOAD segment where we get p_offset = 0x00690000.

$ readelf -l -S <binary>
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
   ...
  [33] .fusion           NOBITS          20000000 690000 8010000 00  WA  0   0  1
   ...

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  EXIDX          0x648a04 0x00658a04 0x00658a04 0x36da8 0x36da8 R   0x4
  ...
  LOAD           0x680000 0x00690000 0x00690000 0x0a3d4 0x18e8c RW  0x10000
  ...

 Section to Segment mapping:
  Segment Sections...
   00     .ARM.exidx 
   ...
   05     .fusion 
   09  

We could even consider just changing StrictSubstr() to immediately return an empty string_view if the size is zero.

Happy to implement this if you want. My thinking, as I'm not familiar with the code and all use cases and callers of StrictSubstr(), is that there could be some use cases where we do want StrictSubstr() to throw an error here. I was only confident enough in the ReadSegment() use case here to just fix that one spot.

haberman commented 3 months ago

Do you know what the .fusion section is? I'd be curious to know why its offset is out-of-bounds compared to the file.

kjteske commented 3 months ago

Realized I showed the wrong LOAD segment in my last comment. This is the .fusion segment, note FileSiz is zero, with a large MemSiz.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x690000 0x20000000 0x20000000 0x00000 0x8010000 RW  0x10000

.fusion is something we add via a linker script to reserve a range of memory in the virtual address space right away when the program is loaded, before anything else can run and potentially take up this address range. It's ugly but necessary for a 3rdParty library/driver we use.

kjteske commented 2 months ago

Ping on this. Thanks for your patience with my confusion about how this binary was created - this is clearly something fairly custom and not something most folks would run into. Nevertheless, it's a valid ELF file and something bloaty should be able to handle. I'm happy to add a simple test to https://github.com/google/bloaty/pull/379 if you're okay with the fix.

haberman commented 2 months ago

Sorry for the delay. I've been waffling on how we should address this problem.

My current thinking is to just change StrictSubstr() to use WARN() instead of THROW() when the region is out of bounds. Overall, Bloaty is an analysis tool, and there is little point in aborting entirely just because we ran into some unexpected data. But we do want to be able to see this information if we're debugging data quality issues, so the warning seems valuable.

How does that sound?

kjteske commented 2 months ago

Sounds good in theory - but turning StrictSubstr() to use WARN() instead of THROW() results in at least one fuzz test segfaulting:

7/7 Test #7: fuzz_test ........................***Exception: SegFault  0.07 sec

This is from fuzz_test tests/testdata/fuzz_corpus/1bfe776624349462cb1d818443af106215021470:

(gdb) bt
#0  Memcpy<Elf64_Ehdr> () at /home/kteske/dev/bloaty/src/elf.cc:214
#1  Read<Elf32_Shdr, Elf64_Shdr, bloaty::(anonymous namespace)::ShdrMunger> () at /home/kteske/dev/bloaty/src/elf.cc:194
#2  ReadStruct<Elf32_Shdr, Elf64_Shdr, bloaty::(anonymous namespace)::ShdrMunger> () at /home/kteske/dev/bloaty/src/elf.cc:171
#3  ReadSection () at /home/kteske/dev/bloaty/src/elf.cc:543
#4  0x00005555555eec2a in Initialize () at /home/kteske/dev/bloaty/src/elf.cc:513
#5  0x00005555555f382f in ElfFile () at /home/kteske/dev/bloaty/src/elf.cc:68
#6  bloaty::TryOpenELFFile () at /home/kteske/dev/bloaty/src/elf.cc:1418
#7  0x00005555555d1c2e in bloaty::Bloaty::GetObjectFile () at /home/kteske/dev/bloaty/src/bloaty.cc:1553
#8  0x00005555555d3339 in bloaty::Bloaty::AddFilename () at /home/kteske/dev/bloaty/src/bloaty.cc:1575
#9  0x00005555555d687d in bloaty::BloatyDoMain () at /home/kteske/dev/bloaty/src/bloaty.cc:2252
#10 0x00005555555d6a8d in bloaty::BloatyMain () at /home/kteske/dev/bloaty/src/bloaty.cc:2290
#11 0x00005555555cc84d in bloaty::RunBloaty () at /home/kteske/dev/bloaty/tests/fuzz_target.cc:57
#12 0x00005555555cca7c in LLVMFuzzerTestOneInput () at /home/kteske/dev/bloaty/tests/fuzz_target.cc:67
#13 0x00005555555cb616 in main () at /home/kteske/dev/bloaty/tests/fuzz_driver.cc:34

It looks like we're relying on StrictSubstr() to throw and hit the catch in BloatyMain(), rather than having all the range checking we'd need otherwise. I haven't looked much into how we could use WARN() and add other checks avoid this - I'm hesitant because this might be a bit of a rabbit hole or have other problems?

The solution that just changes ReadSegment() doesn't have this issue - I'd lean toward this more limited solution, but can look further down the rabbit hole if you prefer.

kjteske commented 2 months ago

Added a test to https://github.com/google/bloaty/pull/379 - confirmed this fails before the fix, passes after the current fix.

haberman commented 2 months ago

Thanks, the test is much appreciated.

Given that I've struggled to come up with a more general or principled fix, let's move forward with your targeted fix.

The fact that you added a test makes me more confident moving forward with the targeted fix, because if we try to do a more general fix later, we can be sure that we don't regress your use case.