pine64 / bl602-re

Reverse engineering of BL602 blobs
Apache License 2.0

Ghidra #3

Open maidenone opened 4 years ago

maidenone commented 4 years ago

previous work on RV32 and RV64 https://delaat.net/rp/2019-2020/p49/report.pdf

https://reverseengineering.stackexchange.com/questions/22558/reversing-a-key-gen-firmware-for-risc-v

The Ghidra release does not support RISC-V, but a build from source does.

gamelaster commented 4 years ago

I already tried a nightly Ghidra build (there are some repos with prebuilt Windows binaries), but sadly I haven't been able to configure Ghidra to disassemble the blobs correctly (yet).

micahswitzer commented 4 years ago

I was able to get a nightly build partially working. The batch import didn't work properly, and it also doesn't support a few kinds of relocations needed, but I was able to load a few object files manually.

There's a fork of Ghidra that has a few more updates to the RISCV module, but I haven't had the chance to check it out yet: https://github.com/mumbel/ghidra/tree/riscv

stschake commented 4 years ago

I've used the out-of-tree version here: https://github.com/mumbel/ghidra_riscv

Seems to work okay, even if it doesn't have the exact ISA. In general, I'd recommend just giving Ghidra an ELF from the build instead of trying to get it to digest the raw object files.

This is the built processor, extract to $GHIDRA_DIR -> Ghidra -> Processors: ghidra_9.1.2_PUBLIC_20201029_ghidra_riscv.zip

micahswitzer commented 4 years ago

Ah okay, I didn't realize that version existed. That's much cleaner.

I guess the reason I'd prefer reversing straight from the objects is so that we can more easily isolate the behavior of each API function. Maybe the goal is not so much to duplicate their API in C as it is to identify how to interact with the radios? I guess I'd like to see a clear goal established when it comes to the RE work.

gamelaster commented 4 years ago

@micahswitzer This is a very good question. Still, I think the easiest way to test whether our RE'ed implementation works is simply to reimplement their API.

maidenone commented 4 years ago

I have 3 BL602 boards on their way to me; I will make them remotely available if there are things we want to execute on real hardware. I also have an SDR that I can hook up to see what happens in the spectrum when we poke at registers.

gamelaster commented 4 years ago

@maidenone Well, how do you want to deal with flashing? AFAIK, the flash tools are closed source.

maidenone commented 4 years ago

Haven't thought about that. But given that it is a SiFive E24 core that uses JTAG, making OpenOCD talk to it should not be that hard?

I have poked around with OpenOCD and BMP code before to add new targets.

WildCryptoFox commented 4 years ago

My understanding of Ghidra's decompiler is that it is written in C++ and doesn't depend on Java at all, but the linked repo appears to use Java? I don't (yet) have Ghidra or Java. Could someone post some decompilation samples for the objects in this repository?

Yesterday, I experimented with adding RISC-V support to r2dec. The output is a naive translation of assembly to a pseudo-c; arguably not much better than the assembly itself. The result could be greatly improved with post-processing but r2dec isn't really designed for data-flow transformations, so this would be limited to trivial cases.

This might be enough but I'd rather invest time in a new decompiler, which can use deep data-flow analysis to simplify the result. I'd start with RVSDG, an optimization-friendly data-flow intermediate representation in SSA-form without the total order of control-flow graphs. Such a decompiler could naturally be repurposed as an optimizing recompiler.

Would anyone be interested in either working on improving an existing decompiler or working towards a new one?

micahswitzer commented 4 years ago

Yes, the decompiler itself is written in C++. However, the processor specifications are written in a DSL called SLEIGH (which describes each instruction in terms of p-code), and the code that tells Ghidra how to load platform-specific relocations and DWARF information is written in Java.

So yes, you can run the decompiler without Java (I believe radare can do that), but it's much more useful if you use it within the context of Ghidra with all of the tools that Ghidra has to offer.

That being said, the RISCV module for Ghidra is not quite production ready. In my incredibly brief testing, I noted that most of the non-trivial relocation types were not implemented. I also read that there were some other issues with the pcode that caused Ghidra to misinterpret the meaning of the assembly (not the disassembly itself). I'm relatively new to the RISC-V ISA, but I'd be willing to see if I could at least implement the missing relocations which would greatly improve the usability of Ghidra for this project.

I'm also willing to lend another set of eyes on such an effort should someone else with more experience want to tackle this issue with another RE platform.

stschake commented 4 years ago

I might be misunderstanding, but since the ELF here is built purely so the code can be pulled out of it for a raw flash image, it won't have any relocations - there isn't any code in the ROM that could load them anyway. So while the Ghidra RISCV processor doesn't support a lot of them, that doesn't matter if you load the ELF into it?

micahswitzer commented 4 years ago

You are correct that there will be no relocations in the final binary loaded into ROM. However, since the library code references other internal functions and data structures, relocations are necessary to allow for flexibility during the final linking step.
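To make the relocation point concrete, here is a minimal sketch of what a loader has to do for one common pair, `R_RISCV_HI20` and `R_RISCV_LO12_I`, when an unlinked object references an absolute address through `lui` plus a load. The thread does not say which relocation types were missing, so the choice of this pair and the helper names (`reloc_hi20`, `reloc_lo12`, `patch_u_type`, `patch_i_type`) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 20 bits for lui. 0x800 is added first so that adding the
   sign-extended low 12 bits afterwards reproduces the full value. */
static uint32_t reloc_hi20(uint32_t value) {
    return (value + 0x800u) >> 12;
}

/* Low 12 bits, sign-extended the same way an I-type immediate is. */
static int32_t reloc_lo12(uint32_t value) {
    int32_t lo = (int32_t)(value & 0xfffu);
    return (lo >= 0x800) ? lo - 0x1000 : lo;
}

/* Patch imm[31:12] of a U-type instruction (lui or auipc opcode). */
static uint32_t patch_u_type(uint32_t insn, uint32_t value) {
    assert((insn & 0x7fu) == 0x37u || (insn & 0x7fu) == 0x17u);
    return (insn & 0x00000fffu) | (reloc_hi20(value) << 12);
}

/* Patch imm[11:0], stored in bits 31:20 of an I-type instruction. */
static uint32_t patch_i_type(uint32_t insn, uint32_t value) {
    return (insn & 0x000fffffu) | ((value & 0xfffu) << 20);
}
```

For the register address 0x44c00820 that comes up later in this thread, the split is 0x44c01 for the upper part and -0x7e0 for the lower part, because the low 12 bits (0x820) are negative once sign-extended.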

I think you may be suggesting that we could simply compile and link a sample application which we could then RE since it would no longer have any relocations. If so, issue #6 suggests the same thing.

Yangff commented 4 years ago

Yes, please use the ELF I pushed to the blobs (or, if you want, compile them as @micahswitzer said). They should also contain all the symbols, and in my case Ghidra with the RISC-V plugin loads them mostly fine. The only problem is floating point, which I think we can handle manually.

However, the decompiler output seems to have some problems. As far as I can tell, some memory reads/writes are missing: I can see them in the assembly, but they disappear in the decompilation.

[Screenshot: Snipaste_2020-10-30_22-38-34]

WildCryptoFox commented 4 years ago

Reko, a capstone-based decompiler, might be a candidate. Unfortunately it doesn't seem to understand at least RISC-V ELF relocations. I don't know how much work is needed.

mumbel commented 4 years ago

If you come across any disassembly/instruction issues (I just fixed a bug in c.fsw and c.fswsp) or any unimplemented ELF relocations (I haven't noticed any yet), feel free to file an issue on either of my repos. If it's a bigger non-RISCV issue, I'd go ahead and file it with Ghidra's issues. There is an open bug being looked at if you come across a subtract popup.

Edit: this was done in my spare time, and I haven't used this module extensively, so there are likely bugs. If anything looks off, comments are welcome. Hopefully 9.2 will be out soon so you don't have to build Ghidra, but I would for sure at least use my RISCV/data/languages/, which Ghidra compiles at runtime if the .sla file's timestamp is out of date or the file doesn't exist yet.

WildCryptoFox commented 4 years ago

@Yangff phy_init as derived using Reko. Do you see any issues with this result? (see also #14)

The if(false) looks suspicious. I wonder whether it is treating a read from memory as static because it doesn't understand that the address is MMIO? (cc @uxmal)

Yangff commented 4 years ago

The first condition should be

  uVar1 = ((__DATA_44c00000 >> 8 & 0xf) - 1 & 0xff) << 4;
  if ((uVar1 & 0xffffff8f) != 0) {
    assert_err("(((uint32_t)rxnssmax << 4) & ~((uint32_t)0x00000070)) == 0","module",0xa09);
  }
  __DATA_44c00820 = uVar1 | __DATA_44c00820 & 0xffffff8f;

as decompiled by ghidra.

This is actually an inlined function, and should have a name like mdm_rxnssmax_setf.

void mdm_rxnssmax_setf(uint8_t rxnssmax) {
  assert_err((((uint32_t)rxnssmax << 4) & ~((uint32_t)0x00000070)) == 0);
  REG_PL_WR(0x44c00820,  ((uint32_t)rxnssmax << 4) | REG_PL_RD(0x44c00820) & ~((uint32_t)0x00000070) );
}

All the other if (false) cases seem to have the same problem.
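As a host-side sketch of the read-modify-write pattern above: the field occupies bits 4..6 (mask 0x70, whose complement is the 0xffffff8f seen in the Ghidra output). The variable `fake_mdm_reg` and the `reg_rd`/`reg_wr` helpers are hypothetical stand-ins for the real MMIO word at 0x44c00820 and the `REG_PL_RD`/`REG_PL_WR` macros, and plain `assert` replaces `assert_err`.

```c
#include <assert.h>
#include <stdint.h>

#define RXNSSMAX_SHIFT 4
#define RXNSSMAX_MASK  0x00000070u   /* bits 4..6 */

/* Hypothetical stand-in for the MMIO word at 0x44c00820. */
static volatile uint32_t fake_mdm_reg;

static uint32_t reg_rd(void)       { return fake_mdm_reg; }
static void     reg_wr(uint32_t v) { fake_mdm_reg = v; }

/* Set only the rxnssmax field and leave every other bit untouched. */
static void mdm_rxnssmax_setf_sketch(uint8_t rxnssmax) {
    uint32_t field = (uint32_t)rxnssmax << RXNSSMAX_SHIFT;
    /* The value must fit inside the field, as in the original assert. */
    assert((field & ~RXNSSMAX_MASK) == 0);
    reg_wr(field | (reg_rd() & ~RXNSSMAX_MASK));
}
```

If the register read is folded to a constant by the decompiler, the assertion condition becomes constant too, which would be consistent with the `if (false)` seen in the Reko output.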

WildCryptoFox commented 4 years ago

@Yangff Could you upload ghidra's decompilation results for the 3 ELFs for all who don't have ghidra setup?

Yangff commented 4 years ago

yes, let me try.

Yangff commented 4 years ago

added #15

stschake commented 4 years ago

@Yangff I think the problem with disappearing writes is that Ghidra doesn't know about the memory-mapped PHY stuff. I fixed that by adding it to the Memory Map with Start 0x44c00000, Size 0xd000 and marking it Read/Write+Volatile. Check mdm_reset and you should then see two distinct writes instead of the previous coalesced one (which wouldn't have worked to reset the thing).
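The same effect exists on the compiler side, which may help explain why the Volatile flag matters. In this sketch the register variable and the values 1 and 0 are hypothetical (the actual values written by mdm_reset are not given in this thread); only the volatile-versus-plain contrast is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for a PHY reset register. */
static volatile uint32_t fake_reset_reg;

/* Through a volatile pointer both stores must be performed, in order,
   so the hardware would see a pulse. */
static void pulse_reset(volatile uint32_t *reg) {
    assert(reg != NULL);
    *reg = 1u;   /* assumed "assert reset" value */
    *reg = 0u;   /* assumed "release reset" value */
}

/* Through a plain pointer the first store is dead: an optimizer (or a
   decompiler that treats the region as ordinary RAM) may merge the two
   stores into one, which is the coalescing described above. */
static void pulse_reset_nonvolatile(uint32_t *reg) {
    assert(reg != NULL);
    *reg = 1u;
    *reg = 0u;
}
```

Both functions leave the same final value behind, so the difference cannot be seen from memory contents alone. It only shows up in how many stores are emitted or displayed.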

I've also attached my notes on the various PHY registers there: phy.txt

micahswitzer commented 4 years ago

Yes, volatile memory regions are key for getting Ghidra to interpret mmio properly.

@stschake could you create a PR with that text file? I think it would be incredibly useful to keep a running list of registers and their functions as we continue to RE the blobs.

stschake commented 4 years ago

I've sent https://github.com/pine64/bl602-docs/pull/18

There is another mmio peripheral at 0x44b00000 (till ~0x44b09000) that has what the firmware calls mm or MAC management.
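A small sketch for tagging addresses while reading decompiler output, using only the two ranges mentioned in this thread. The enum, macro, and function names are made up for illustration, and the MAC size is approximate because its end was only given as "~0x44b09000".

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REGION_OTHER, REGION_MAC, REGION_PHY } mmio_region_t;

#define MAC_BASE 0x44b00000u
#define MAC_SIZE 0x00009000u   /* approximate, see note above */
#define PHY_BASE 0x44c00000u
#define PHY_SIZE 0x0000d000u

/* Classify an address. The unsigned subtraction wraps around for
   addresses below the base, so one comparison covers both bounds. */
static mmio_region_t classify_addr(uint32_t addr) {
    assert(MAC_BASE + MAC_SIZE <= PHY_BASE);  /* regions must not overlap */
    if (addr - MAC_BASE < MAC_SIZE) return REGION_MAC;
    if (addr - PHY_BASE < PHY_SIZE) return REGION_PHY;
    return REGION_OTHER;
}
```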

uxmal commented 4 years ago

The binaries @WildCryptoFox provided have exposed some bugs in Reko's Risc-V disassembler, specifically the decoding of Risc-V compressed instructions. I'm working on fixes and will have something by end of today.
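For context on why compressed instructions are easy to get wrong: RISC-V code mixes 16-bit and 32-bit instructions, and a disassembler has to decide the length from the low two bits of the first 16-bit parcel before it can decode anything else. The sketch below shows only that length rule, not Reko's actual decoder. The function names are hypothetical, and longer-than-32-bit encodings are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low two bits == 0b11 means a 32-bit instruction; anything else is a
   16-bit compressed instruction. Longer encodings are ignored here. */
static size_t rv_insn_length(uint16_t first_parcel) {
    return ((first_parcel & 0x3u) == 0x3u) ? 4 : 2;
}

/* Count whole instructions in a little-endian code buffer. */
static size_t rv_count_insns(const uint8_t *code, size_t len) {
    size_t off = 0, count = 0;
    assert(code != NULL || len == 0);
    while (off + 2 <= len) {
        uint16_t parcel = (uint16_t)(code[off] | (code[off + 1] << 8));
        size_t n = rv_insn_length(parcel);
        if (off + n > len) {
            break;          /* truncated instruction at the end */
        }
        off += n;
        count++;
    }
    return count;
}
```

Getting the length wrong for a single instruction shifts every following decode by two bytes, so errors in the compressed path tend to corrupt large stretches of the listing.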

micahswitzer commented 4 years ago

I've implemented the relocations necessary to load the raw libraries/objects into Ghidra. I'm not 100% sure it works correctly, but everything was looking nice in my testing. I've attached a build of my version of the extension here. If you find any issues with relocations specifically, you can open an issue on my fork here.

Now that I have them working, I can finally start doing some actual RE!

mumbel commented 4 years ago

@micahswitzer nice, that's a decent amount of relocations (not sure why I left all those TODO comments for the ones I implemented, maybe they were untested). Hadn't come across the need to handle unlinked ELFs until now, guessing most of those go away after linking, or did you see some unimplemented in the linked demos as well?

I'd fork NSA's ghidra repo and submit a PR for the new additions, probably wouldn't make it into 9.2 (which should be released soon reading their comments about it), but at least 9.2.1 hopefully. Not sure when you forked mine, but what is currently in my ghidra_riscv is what is in ghidra repo (just in tree).

mumbel commented 4 years ago

9.2 was released today, which includes RISC-V support

micahswitzer commented 4 years ago

@mumbel I just saw that. Great work on that feature, it will be very useful for this project!

I will probably spend some time this weekend cleaning up my code so that I can submit a PR as you suggested.

lorenz commented 4 years ago

FYI: https://github.com/pine64/bl602-docs/tree/main/hardware_notes#rf-ip

I found most of the code as source deep in various SDKs that people posted. As far as I can see most functions are in there and for all functions where I checked the behavior in the blob is the same as in the source.

rpavlik commented 3 years ago

So given the source discovery, I'm not sure if any decompiling is still needed (I want to contribute but I'm having a hard time figuring out what exactly the intermediate goals are), but FWIW, you can add the 3 ROM sections mentioned in the link above to the memory map in Ghidra, and you can load a slightly modified version of the SVD (soc602_reg.svd.txt) with a slightly modified version of https://leveldown.de/blog/svd-loader/ (just comment out the sys.exit() call for non-cortex-m cpus), and it appears to work OK.

gamelaster commented 3 years ago

Hi @rpavlik, at the moment there isn't any target, because we are waiting until Bouffalo officially, it should be at the end of this month according to this post. After that, we will decide and focus on spare blobs 😊

sajattack commented 3 years ago

I cracked this binary open in Ghidra as soon as I found out about it. Eager to contribute if I have time.