Notes on reverse-engineering progress for main MCU

jackhumbert / pinebook-pro-keyboard-updater

A keyboard/touchpad firmware and updater for the Pinebook Pro

Notes on reverse-engineering progress for main MCU #23

Open Stevie-O opened 4 years ago

Stevie-O commented 4 years ago

@jackhumbert So I'd like to help out with the reverse-engineering effort for the keyboard firmware.

My first question is: What tool and process was used to produce main.c ? I see a bunch of problems with it that don't make any sense.

It's not actually valid C code
- stray a3FC0 = on line 20
- missing statement (possibly just a bare ; or {}) on line 30
- missing semicolon on line 44
- line 32 tries to take the addresses of constants
- line 83 tries to assign a value to a constant (#define EA 0x80 in header file)
- in fact, I don't even know what's going on with L0090, which is code outside of a function
L0134 is annotated "startup function of some sort" but there's nothing anywhere to call it
Stuff that's actually important (timing nops, indirect jumps) are simply expressed as comments

Without the original disassembly, I'm concerned that some of the C pseudocode is incorrect as written.

jackhumbert commented 4 years ago

main.c is just pseudo code translated/written by hand from the .a51 files, in an attempt to understand what's going on in the assembly. The .c extension is probably a misnomer, but I wanted C highlighting syntax when writing it. Some parts resemble actual C code because it was easiest to express ideas that way. I don't think it's worth it to try to convert main.c into something that can compile.

Stevie-O commented 4 years ago

Okay, so I want to be looking at the .a51 files. What was used to produce these?

I noticed there's a lot of differences between the fw_iso.a51 and fw_ansi.a51 files, and not just the hand-added comments. Starting right at address 3, fw_iso.a51 has DBs while fw_ansi.a51 has disassembled code. That would suggest that you'd started with ISO file and then refined the process when dealing with the ANSI version; however, addresses 3DBE and 3DC0 are commented differently in both files, so I'm not sure what's going on there.

I'm also noticing some strange one-byte gaps between the routines (0x0012, 0x001A, 0x0022 as examples.) Any idea what's going on there?

jackhumbert commented 4 years ago

This is the command with the addresses needed to decode the entire file. The two files are pretty much the same when it comes to the actual assembly - diff'ing the files isn't really useful since they are offset by a couple bytes. The custom keymap stuff I wrote is based on the ansi one, but with the modifications it can easily work on iso, so I don't think it's worth looking at the iso one at all, since so much work has already gone into decoding ansi.

Do you have a goal with what you'd like to do? I'm happy to answer questions, but I've spent a couple dozen hours disassembling and documenting things in main.c, so it may be helpful to work through that and understand the 8051 assembly instructions used in fw_ansi.a51. You can use the labels to search for the relevant parts.

Stevie-O commented 4 years ago

My concrete goals are:

Understand how the current firmware does what it does
Understand how closely the current firmware approaches the limits of the SH68F83 (and, thus, learn how much more could be done with the chip)
Create a fully FOSS replacement for the firmware, thus moving that much closer to a blob-free future

These are some secondary goals I'd like to investigate:

A better way to handle switching keymaps. It's very likely possible to isolate the key-mapping tables from the core firmware; doing so might make it easier to enhance the core firmware whilst permitting customization of the keymap, without doing any nasty patching of .hex files. This would depend on how efficiently flash space is being used; on the surface it looks like almost all of the 16KB of flash storage is being used. For example, fw_ansi.a51 has nonzero data all the way at address 3FBE. However, summing up the "byte count" values from all records in the .hex file:
```
perl -lne '$size += hex ((unpack("xA2", $_))[0]); END { print $size }' fw_ansi.hex
```
outputs 12279, which indicates that there's probably a lot of space that's not being used efficiently.
More efficient/flexible keymap definition structure
Presenting ourselves as a composite device: the keyboard itself, and a secondary system support device that can provide information about kill switch status (as real-time "push" notifications, or as on-demand inquiries). This would go a long way towards improving the Pinebook user experience.
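
The byte-count sum mentioned above adds up the length field of every record. A minimal Python sketch of the same idea, restricted to data records only, assuming standard Intel HEX lines of the form `:LLAAAATT...CC`. The function name `count_data_bytes` is hypothetical and the sample records are made up for illustration; they are not taken from fw_ansi.hex.

```python
def count_data_bytes(lines):
    """Sum the byte-count field of Intel HEX data records (type 00)."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip blank or malformed lines
        length = int(line[1:3], 16)       # LL: number of data bytes
        record_type = int(line[7:9], 16)  # TT: 00 = data, 01 = end of file
        if record_type == 0x00:
            total += length
    return total


# Illustrative records only (not taken from fw_ansi.hex).
sample = [
    ":03000000023800C3",  # 3 data bytes at 0x0000: LJMP 0x3800
    ":00000001FF",        # end-of-file record, carries no data
]
print(count_data_bytes(sample))  # prints 3
```

To run it against a real file, pass an open file object, for example `count_data_bytes(open("fw_ansi.hex"))`.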

akirakyle commented 4 years ago

I've spent a while staring at the disassembly of the keyboard firmware and documented what I've found here: https://github.com/akirakyle/pinebook-pro-keyboard-updater/tree/master/firmware/disassembly. I lost some steam when I realized there likely wouldn't be a safe way to flash the SH68F83 using the SSP flashing method that the firmware updater uses. Basically if you mess up any usb logic in your firmware you'll likely brick the SH68F83 as you'd be unable to force it into the SSP 'bootloader' flashing program starting at address 0x3800. I've had some rough ideas to make it safer but I'm hesitant to go further and actually try modifying the firmware beyond simple overlays as this pbp is my daily driver. What we really need for reverse-engineering to be truly successful is JTAG but unfortunately it seems that's Sinowealth proprietary IP.

Stevie-O commented 4 years ago

@akirakyle Hah, I saw AKIR and thought of the crazy anime movie with the psychic kid.

Yesterday I spent a few hours analyzing the assembly code in fw_ansi.a51 and annotating it with lots of comments.

Take a look here:

https://github.com/Stevie-O/pinebook-pro-keyboard-updater/blob/reveng-notes/firmware/fw_ansi.a51#L7455

Also, you don't need full JTAG. The device supports ICP (in-circuit programming), and there's a diagram in section 11.2 (page 46) of the SH68F83 datasheet explaining how it works. In theory, all of the necessary data lines are accessible via J6, but I'm not sure how to force the POR required to put it into the right state.

akirakyle commented 4 years ago

Sorry I didn't reply sooner, I might've made it a bit less painful to look over the assembly. I'd suggest you take a look at the disassemblies in the link I posted under *.mcs51. I think they're a lot more helpful as they actually use the SH68F83.h sfr defs. Also take a look at ssp.c, which has been my attempt at starting to translate the SSP logic into C. I found (and I see you probably found too in looking through your link) that there's some pretty tricky conditional flow going on.

I'd be delighted if I'm wrong about the danger of bricking using the ICP (SSP) mode for flashing. I'm not saying it can't be done but as I wrote in the section "SSP over USB" in the linked README.org: "So far I’ve found that the updater utility somehow causes the SH68F83 to enter this SSP “bootloader” portion of the firmware above 0x3800 by doing a usb control transfer REQUEST_SET_CONFIGURATION for a configuration 0x0305 which isn’t part of the Device’s Descriptor. The device is then opened with a different Vendor ID and Product ID which corresponds to the device descriptor starting at 0x3f2c which is in this “bootloader” portion of the firmware."

It's worth taking a look at the updater.c utility as that's how we know from the host side to get the fw into the "right state" for flashing. I actually started writing it in python as I think the existing tool in c written by ayufan is too verbose to easily understand the host side flashing logic (see updater.py in my fork). It's worth noting that the flasher overwrites the first three bytes of the fw with whatever three bytes are at 0x37fb so 0x3800 isn't actually the entry point. As far as I can tell the SSP block above 0x3800 is only entered on a Timer1 interrupt which jumps to 0x3F00 and I haven't figured out what happens there. If this is right it means for flashing to be "safe" we need some mechanism to guarantee that the Timer1 interrupt can be reached from any firmware state (or rather some code to guarantee that we can create a condition on the host side after a complete reset that will always trigger a jump into the SSP section).

PS: it actually does check for the full string "AKIRA" in memory, the last condition to match the final 'A' is checking if r24 == r20 where it already checked that *r20 == 'A'

Stevie-O commented 4 years ago

@akirakyle To be fair, I'm not sure how much I'd have trusted your C translations anyway :) The code is extremely convoluted in many places, and it'd be easy to get it wrong.

Sorry I didn't reply sooner, I might've made it a bit less painful to look over the assembly. I'd suggest you take a look at the disassembles in the link I posted under *.mcs51. I think they're a lot more helpful as they actually use the SH68F83.h sfr defs.

Eh, that's no big deal. I already did those fixups on my own to the fw_ansi.a51 file (if you poke around the repo you'll find the script I used to translate the SFR addresses to names.)

I've avoided looking at the utility side of things, to avoid contaminating myself with possibly-incorrect assumptions and incomplete guesses others have made about the bootloader code.

It's worth noting that the flasher overwrites the first three bytes of the fw with whatever three bytes are at 0x37fb so 0x3800 isn't actually the entry point. As far as I can tell the SSP block above 0x3800 is only entered on a Timer1 interrupt which jumps to 0x3F00

By "flasher", do you mean the utility that pushes firmware updates into the MCU?

I've been working from the assumption that the contents of fw_ansi.a51 correspond to what's actually inside the MCU. If that's not correct, then I need to know what's different. Assuming it is correct, I believe you have misunderstood the nature of 0x37FB.

Here is the beginning of the core firmware flash subroutine:

https://github.com/Stevie-O/pinebook-pro-keyboard-updater/blob/0e37cdfaf7f96878c8cfc79645bfa530b0df8227/firmware/fw_ansi.a51#L8301

This routine is used to (a) erase a single 1024-byte sector, or (b) program a single byte.

Now look at this code:

https://github.com/Stevie-O/pinebook-pro-keyboard-updater/blob/0e37cdfaf7f96878c8cfc79645bfa530b0df8227/firmware/fw_ansi.a51#L8327-L8348

As the comment states, this code is run in case (b) when the destination address is less than or equal to 0x00FF.

Instruction 3D64 is an oddity. I finally figured out what it's for; I'll need to update my notes. 3D64 basically says "set C=1 if and only if A is less than 3". Then, 3D67 does a JNC -- jump if A is greater than or equal to 3 -- to L0040, which actually performs the write operation.

If A is 0, 1, or 2 -- that is, if we're writing to 0x0000-0x0002 -- then the code verifies that the bytes being written to that address are 02 38 00. If they are not, the write request is ignored and nothing is written to flash.
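
The guard described above can be sketched in Python. This is a model of the behaviour as described in this comment, not a verified translation of the firmware; the function names `cjne_carry` and `low_page_write_allowed` are hypothetical. The one hard fact it relies on is that the 8051 `CJNE` instruction sets the carry flag when its first operand is less than the second (unsigned).

```python
RESET_VECTOR = bytes([0x02, 0x38, 0x00])  # LJMP 0x3800


def cjne_carry(a, imm):
    """Carry flag after 8051 `CJNE A, #imm, rel`: set when A < imm (unsigned)."""
    return (a & 0xFF) < (imm & 0xFF)


def low_page_write_allowed(addr_low, value):
    """Model of the guard for destination addresses 0x0000-0x00FF."""
    if not cjne_carry(addr_low, 0x03):
        return True  # JNC taken: address >= 3, go straight to the write
    # Addresses 0-2 hold the reset vector; only the expected byte passes.
    return value == RESET_VECTOR[addr_low]


print(low_page_write_allowed(0x00, 0x02))  # True: matches the LJMP opcode
print(low_page_write_allowed(0x00, 0x12))  # False: write would be ignored
print(low_page_write_allowed(0x03, 0x12))  # True: outside the protected range
```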

37FB

37FB seems to be handled this way: (I'll try to remember to add my annotations later)

https://github.com/Stevie-O/pinebook-pro-keyboard-updater/blob/0e37cdfaf7f96878c8cfc79645bfa530b0df8227/firmware/fw_ansi.a51#L7364-L7373

1. Read the byte at 0x37FB.
2. If it's not 0x02 (the opcode for LJMP), go to L0112, which dumps us back into the bootloader exactly as if the "AKIRA" magic signature had been there.
3. Otherwise:
   - clear B, DPTR, and PSW, and initialize the stack pointer to 0x07 (so the first stack space written is 0x08)
   - jump to 0x37FB.

This is a pretty common pattern; the Maxstream XBee radios do the same thing. It solves two problems fairly elegantly:

A. The bootloader doesn't allow itself to be overwritten; as such, it can't adapt to changes in the entry point for the main application.

B. If something goes wrong midway through a firmware update, you've got only half the firmware on your device (or, possibly worse, half new firmware and half old firmware. At least with a half-blank chip, it's obvious that the culprit was a bad flash job.)

So this is what they do:

Require a thunk at 0x37FB that jumps to the real entry point.
1. The first step in any reflash is to erase sector 0x3400 (0x3400-0x37FF)
2. Then all sectors but sector 0 are erased
3. Then sector 0 is erased
4. Then flash the first 3-8 bytes of the new firmware
5. Then flash the rest of it.

Once step 1 completes, 0x37FB reads as FF FF FF (or possibly 00 00 00, but most flash erases to all 1s.) This means we're still safe (theoretically): if the MCU is rebooted for any reason, the conditional jump at 0x383D will activate and reenter bootloader mode.

In fact, we continue to be safe until we begin step 3, where we erase the code at the reset vector. If the CPU is reset at this point, the chip is bricked (without some soldering.) The danger zone persists until step 4 completes; after that point, the reset vector is done properly again.

Then we're safe until 0x37FB is written with the value 0x02. At that point, we're in the second danger zone: if the device resets before 0x37FC and 0x37FD are written, it'll jump off to nowhere. (This particular hole can be avoided by writing 0x37FC-0x37FD first, and only afterwards writing 0x37FB.)
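
The safe and unsafe windows described above can be tabulated with a small Python model. It is only a sketch of the hypothesized reflash sequence in this comment: the function name `reset_outcome`, the state labels, and the three boolean flags are all hypothetical simplifications, and nothing here has been checked against the real chip.

```python
def reset_outcome(reset_vector_ok, thunk_opcode_ok, thunk_target_ok):
    """Where a reset lands, given which flash locations hold valid data."""
    if not reset_vector_ok:
        return "bricked"      # nothing valid at 0x0000, bootloader unreachable
    if not thunk_opcode_ok:
        return "bootloader"   # 0x37FB != 0x02, so the bootloader stays in control
    if not thunk_target_ok:
        return "lost"         # LJMP at 0x37FB with an unwritten target
    return "firmware"


# (label, reset vector valid, 0x37FB == 0x02, 0x37FC-0x37FD valid)
STATES = [
    ("before update",                    True,  True,  True),
    ("step 1: sector 0x3400 erased",     True,  False, False),
    ("step 2: other sectors erased",     True,  False, False),
    ("step 3: sector 0 erased",          False, False, False),
    ("step 4: first bytes reflashed",    True,  False, False),
    ("step 5: 0x37FB written first",     True,  True,  False),
    ("step 5: 0x37FC-0x37FD written",    True,  True,  True),
]

for label, *flags in STATES:
    print(f"{label:32s} -> {reset_outcome(*flags)}")
```

The two rows that do not end in "bootloader" or "firmware" correspond to the two danger zones named above.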

akirakyle commented 4 years ago

@akirakyle To be fair, I'm not sure how much I'd have trusted your C translations anyway :) The code is extremely convoluted in many places, and it'd be easy to get it wrong.

True, I'm not sure how much to trust my own translations anyways :) After all this is my first time poking around 8051 assembly. It sounds like you have a lot more experience with this and I'm happy someone else is interested in this reverse engineering process since honestly I'm probably not the right person for this job, but I have learned a lot so far.

Eh, that's no big deal. I already did those fixups on my own to the fw_ansi.a51 file (if you poke around the repo you'll find the script I used to translate the SFR addresses to names.)

Even so, I'd suggest, if you aren't already familiar with it, checking out the mcs51-disasm.pl script from sdcc. I think it produces much more readable disassemblies, and I see you're already familiar with Perl so it should be easier for you to hack on it.

I've avoided looking at the utility side of things, to avoid contaminating myself with possibly-incorrect assumptions and incomplete guesses others have made about the bootloader code.

AFAIK the flashing utility was written by Ayufan as a rewrite of the sources of the Qt based windows only flashing tool that pine64 convinced Sinowealth to release to them. If you look through the git history you'll find those sources in the initial commit along with some earlier revisions of the firmware that had some bugs regarding touchpad input not being passed through while typing (which I think motivated this whole user flashing the firmware in the first place so I guess I'm thankful for those bugs as now we actually have the firmware for the keyboard controller and touchpad controller, however no one that I know of has even successfully disassembled that).

By "flasher", do you mean the utility that pushes firmware updates into the MCU?

Yes the usb host side of the firmware update process.

I've been working from the assumption that the contents of fw_ansi.a51 correspond to what's actually inside the MCU. If that's not correct, then I need to know what's different. Assuming it is correct, I believe you have misunderstood the nature of 0x37FB.

Given what you've found regarding what happens to 0x37FB and the system init jump at 0x0000 during the SSP process, I'm starting to have my doubts about what I earlier thought about this. I hadn't gotten around to deeply inspecting the SSP code starting at 0x3D3C other than verifying it goes through the SSP process described in the SH68F83 documentation. The steps the flasher performs to write the firmware using usb control transfers look like this (where a write is host to controller and a read is controller to host):

Read in the hex and apply (where ih is the byte array holding the read in hex file):

if (ih[1] == 0x38 and ih[2] == 0x00):
    print(">>> Fixing hex file")
    ih[0] = ih[0x37FB]
    ih[1] = ih[0x37FC]
    ih[2] = ih[0x37FD]
    ih[0x37FB] = 0x00
    ih[0x37FC] = 0x00
    ih[0x37FD] = 0x00

Issue a write to switch the controller to "boot mode" which causes the controller to present as a new usb device with the device descriptor starting at 0x3F2C
Issue a write which presumably causes the necessary parts of the flash to be erased.
Write some command that contains the length of the firmware that will be sent presumably making the controller ready to start flashing
Write the first 2 kB of ih but with ih[0] = 0
Write the remaining 2 kB blocks of ih
Write the first 2 kB of ih
Repeatedly read blocks of 2 kB. This is compared to what was flashed up to 0x37FB and passes if it is identical.
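
The order of the firmware writes in the steps above can be sketched as follows. This is a simplified model, not the updater's code: the function names `fix_hex` and `plan_write_blocks` are hypothetical, the truncation to `MAX_BINLEN` (0x3800, the limit the updater applies) is an assumption about where the image ends, and the mode-switch, erase, and length commands are left out entirely.

```python
BLOCK_SIZE = 2048      # 2 kB per write, as described above
MAX_BINLEN = 0x3800    # flashable region below the bootloader (assumed cut-off)


def fix_hex(image):
    """Apply the fixup shown above to a copy of the image."""
    ih = bytearray(image)
    if ih[1] == 0x38 and ih[2] == 0x00:
        ih[0:3] = ih[0x37FB:0x37FE]          # real entry jump moves to 0x0000
        ih[0x37FB:0x37FE] = b"\x00\x00\x00"  # and its old location is zeroed
    return ih


def plan_write_blocks(image):
    """Return the 2 kB blocks in the order the steps above send them."""
    ih = fix_hex(image)[:MAX_BINLEN]
    blocks = [bytes(ih[i:i + BLOCK_SIZE]) for i in range(0, len(ih), BLOCK_SIZE)]
    first_patched = b"\x00" + blocks[0][1:]   # first pass: ih[0] = 0
    return [first_patched] + blocks[1:] + [blocks[0]]


# Made-up 16 kB image: LJMP 0x3800 at reset, LJMP 0x0134 as the thunk.
image = bytearray(0x4000)
image[0:3] = b"\x02\x38\x00"
image[0x37FB:0x37FE] = b"\x02\x01\x34"

plan = plan_write_blocks(image)
print(len(plan), plan[0][:3].hex(), plan[-1][:3].hex())
```

With this made-up image the plan contains eight writes: seven blocks covering 0x0000-0x37FF, then the first block again with its real first byte.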

Given that you've identified logic that preserves the system init jump being to (presumably) 0x3800 during flashing, I wonder why the flashing tool moves the 0x37FB jump to 0x0000 then zeros out that address? For me to be convinced the bootloader pattern you described is what is actually happening I'd want to find the logic that makes it mesh with what we know the host side flashing tool does to successfully update the firmware. Namely that means finding the logic that handles:

Writing the real entry point to 0x37fb given that the 2 kB block intended for 0x3000-0x3800 has the jump at 0x37fb zeroed out by the host. Due to the "fixup" of the hex file, the real jump location is sent as the second and third byte of the first and last 2kB block sent by the host.
Sending the real entry point back to the host in the second and third bytes when the host requests the firmware to be read back.

I really hope you're right and I'm inclined to believe that's what's happening if you say that this is a common pattern seen in similar firmware. I suppose in not knowing about this kind of bootloader pattern, I started convincing myself that the host was overriding the system init jump location during flashing because the system init jump was only meant for development and debugging purposes. So in my mind it followed that somehow logic below 0x3800 was responsible for jumping into the bootloader above 0x3800, thus trying to touch any of that logic would be very easily catastrophic. I'm glad to know there is a common pattern in writing firmware that minimizes this "risky surface" when there are so many ways flashing could go wrong and leave you with a bricked chip.

Also I think maybe we can take some comfort in the fact that the JTAG programmer for this chip can be bought here (at least I'm fairly certain this is the right device, although it would help if I understood the Chinese text)

Stevie-O commented 4 years ago

@akirakyle HAH! I think I've found it! (Not 100% sure, read further)

(unrelated note: check out the code at 0x3C7B)

L0556:
  3DAE 7D00         MOV R5, #0h
  3DB0 7C01         MOV R4, #1h
  3DB2 7438         MOV A, #38h     ; write 0x38 to 0x0001
  3DB4 B1BE         ACALL L0562
  3DB6 7400         MOV A, #0h      ; write 0x00 to 0x0000
  3DB8 B1BE         ACALL L0562
  3DBA 7C00         MOV R4, #0h     ; write 0x02 to 0x0000 - LJMP 0x3800
  3DBC 7402         MOV A, #2h

This writes 023800 - the LJMP 0x3800 instruction - to address zero.

Considering the code you posted:

        ih[0x37FB] = 0x00
        ih[0x37FC] = 0x00
        ih[0x37FD] = 0x00

Based on the logic I found in the firmware, this patch might not actually be necessary; it looks like the code is prevented from being written to 0x37FB and above:

L0545:
  3DF8 B43800       CJNE A, #38h, L0546
L0546:
  3DFB 501C         JNC L0539
  3DFD B43708       CJNE A, #37h, L0532
  3E00 BCFA00       CJNE R4, #0FAh, L0547
L0547:
  3E03 4003         JC L0532
  3E05 BCFA11       CJNE R4, #0FAh, L0539

3DF8-3DFB prevent writes to >= 0x3800. 3E00-3E05 prevent writes to >= 0x37FB (code is checking for > 0x37FA) I'm thinking the second one was bolted on later, since I'm pretty sure it makes the first check redundant. (I'm also thinking that this might have originally been C code, because by changing 3E00 to check for #0FBh, 3E03 could be changed to JNC L0539 and 3E05 is unneeded.)
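
The branch sequence above can be modelled in Python and checked against a single comparison over the whole 16-bit address space. This sketch assumes that A holds the destination high byte on entry to L0545, that L0532 is the "proceed with the write" path, and that L0539 is the "drop the write" path; the function name `write_allowed` is hypothetical.

```python
def write_allowed(addr):
    """Model of 3DF8-3E05; True means the write proceeds (L0532)."""
    high, low = (addr >> 8) & 0xFF, addr & 0xFF
    if high >= 0x38:      # 3DF8/3DFB: CJNE A,#38h then JNC L0539
        return False
    if high != 0x37:      # 3DFD: CJNE A,#37h,L0532
        return True
    if low < 0xFA:        # 3E00/3E03: CJNE R4,#0FAh then JC L0532
        return True
    return low == 0xFA    # 3E05: CJNE R4,#0FAh,L0539


# The whole sequence collapses to one comparison: addr < 0x37FB.
equivalent = all(write_allowed(a) == (a < 0x37FB) for a in range(0x10000))
print(equivalent)  # prints True
```

Under those assumptions the exhaustive check agrees with the reading that everything from 0x37FB upward is rejected.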

I haven't spotted it yet, but I'm betting I'll find code that diverts requests to write to address 0x0000 coming from the updater over to address 0x37FB.

EDIT: I posted too soon. Here it is!

L0541:
  3DDA ED           MOV A, R5               ; R5 = destination address high byte
  3DDB 700F         JNZ L0543
  3DDD EC           MOV A, R4               ; R4 = destination address low byte
  3DDE B40300       CJNE A, #3h, L0544
L0544:
  3DE1 5025         JNC L0532
  3DE3 75F737       MOV 0F7h, #37h           ; 0F7h = XPAGE
  3DE6 24FB         ADD A, #0FBh
  3DE8 F5BE         MOV 0BEh, A              ; 0BEh = IB_OFFSET
  3DEA C112         AJMP L0542

This does exactly what I said above: if the destination address is 0x0000-0x0002, it remaps it to 0x37FB.
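
A minimal Python sketch of that remapping, assuming XPAGE and IB_OFFSET hold the high and low bytes of the flash destination (as the annotations suggest). The function name `remap_destination` is hypothetical.

```python
def remap_destination(addr):
    """Model of L0541: redirect 0x0000-0x0002 to 0x37FB-0x37FD."""
    high, low = (addr >> 8) & 0xFF, addr & 0xFF
    if high == 0 and low < 3:            # JNZ not taken, CJNE leaves C set
        xpage = 0x37                     # MOV XPAGE, #37h
        ib_offset = (low + 0xFB) & 0xFF  # ADD A, #0FBh
        return (xpage << 8) | ib_offset
    return addr                          # everything else is left alone


for a in (0x0000, 0x0001, 0x0002, 0x0003):
    print(f"{a:#06x} -> {remap_destination(a):#06x}")
```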

I haven't worked out where these routines are hooked up in the overall flow, but their very presence is telling.

(also: I've never actually worked with 8051s before. But I worked on a lot of embedded microcontrollers, including a PIC16 that powered a low-speed USB device, and the SH2 for the XBee radios.)

Stevie-O commented 4 years ago

@jackhumbert @akirakyle Okay, so if my understanding so far is correct, I have an explanation for this weirdness:

  // HACK: overwrite first byte (as in original sources)
  unsigned char first_byte = data[0];
  data[0] = 0;

  <write everything>

  data[0] = first_byte;

  <do another write>

If we assume that the "erased" state of the MCU's flash is 0x00 (not 0xFF, which is much more common), then "writing" 0x00 to a memory location is a no-op. Thus, patching the first byte with 0x00 for the initial write prevents 0x37FB from being written to.

Address 0x37FB serves two purposes:

It is the entry vector to the main firmware. After the bootloader does its thing, it does an LJMP to 0x37FB.
It serves as an indicator as to whether or not there is firmware. At startup, the bootloader checks for 0x02 (opcode for LJMP); if it is not, the bootloader instead enters the firmware-loading state.

When the 'erase flash' routine at 3F1B (L0554) runs, the first thing it does is erase sector 0x3400 (which contains 0x3400-0x37FF). This will reset 0x37FB to 0x00, causing the bootloader's "do we have firmware" check to return FALSE.

However, the USB starts writing at 0x0000 (which the firmware remaps to 0x37FB, see 3DDA/L0541). That means the first thing we would naturally do is program 0x37FB with 0x02, making the bootloader think we have firmware, when we don't.

By patching the buffer in-memory so we write 0x00 here, we basically prevent a write from being done at all.

Then, after we've written all of the firmware (except that one byte), we do one last write, writing the real value (0x02) to 0x0000/0x37FB, marking the firmware as completed internally.
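
The two-pass trick can be illustrated with a toy flash model in Python. Both the erased value of 0x00 and the "programming can only set bits" rule are assumptions taken from the hypothesis above, not confirmed chip behaviour; the names `program_byte` and `bootloader_sees_firmware` are hypothetical.

```python
ERASED = 0x00  # assumed erased state, per the hypothesis above


def program_byte(current, value):
    """Assumed flash model: programming can only set bits, never clear them."""
    return current | value


def bootloader_sees_firmware(flash):
    return flash[0x37FB] == 0x02  # 0x02 is the LJMP opcode


flash = {0x37FB: ERASED}

# First pass: the host sends 0x00 in place of the real first byte.
flash[0x37FB] = program_byte(flash[0x37FB], 0x00)
mid_update = bootloader_sees_firmware(flash)

# Final pass: the real first byte (0x02) is written last.
flash[0x37FB] = program_byte(flash[0x37FB], 0x02)
after_update = bootloader_sees_firmware(flash)

print(mid_update, after_update)  # prints False True
```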

Assuming this is all correct, it leaves some interesting consequences:

Writes to 0x37FB and above are dropped by the bootloader firmware and do nothing but slow down the process unnecessarily.
You should be able to start the write at 0x0001 instead of 0x0000, and avoid having to patch the buffer
You only have to write 1 byte at the end, not an entire 2048 (you might need to write at least 8 bytes due to certain considerations; you might want to write at least 3 bytes due to other considerations.)

akirakyle commented 4 years ago

@Stevie-O

(unrelated note: check out the code at 0x3C7B)

Well you found me out :) Either the person who wrote this was also named Akira or a huge fan of the anime. First seeing my name at 0x3FBA definitely motivated me to dig into this more than I otherwise would have.

3DB6 7400 MOV A, #0h ; write 0x00 to 0x0000 ... This writes 023800 - the LJMP 0x3800 instruction - to address zero.

Good find! Although I assume that should be "write 0x00 to 0x0002".

Based on the logic I found in the firmware, this patch might not actually be necessary; it looks like the code is prevented from being written to 0x37FB and above:

Looking at that section I agree. Although I'm not sure that 0x3E00 and 0x3E05 are redundant as written if that's what you mean? Based on everything I've seen so far I do think this was mostly originally C code with maybe some short sections in assembly. It looks to me like 0x3E00-0x3E03 checks for the 'less than' condition while 0x3E05 checks for the equal condition to ensure the address being written is <= 0x37FA, so yes, this could be done in one step but the compiler wasn't smart enough to do that (or they didn't turn on that optimization).

This does exactly what I said above: if the destination address is 0x0000-0x0002, it remaps it to 0x37FB.

Indeed it does! The fact this is just before the check to prevent writes to >= 0x37FB makes me all the more convinced they're following the update protocol you initially suspected.

If we assume that the "erased" state of the MCU's flash is 0x00 (not 0xFF, which is much more common), then "writing" 0x00 to a memory location is a no-op. Thus, patching the first byte with 0x00 for the initial write prevents 0x37FB from being written to.

Not sure if it's relevant to this but when the updater converts the hex to binary it is zero filled.

After the bootloader does its thing, it does an LJMP to 0x37FB.

So the next thing to consider for the safety of flashing potentially bad code to the controller without bricking it is to know exactly what the bootloader does before doing an LJMP to 0x37FB. We need to know we can always put it into flashing mode even if we mess up new firmware by, for example, having a bad jump location flashed to 0x37FB. Currently it seems the flasher logic assumes that the controller will be presented as a valid usb endpoint that it can issue a REQUEST_SET_CONFIGURATION to that will ultimately cause the bootloader to present its usb endpoint and it seems this involves the POR and 'AKIRA' magic data.

By patching the buffer in-memory so we write 0x00 here, we basically prevent a write from being done at all.

That certainly explains why Ayufan thought of that patch as a hack and left that comment in usb_write.c as he wouldn't have known how the controller side handles that write to 0x0000

You should be able to start the write at 0x0001 instead of 0x0000, and avoid having to patch the buffer

I think the usb flashing protocol isn't so sophisticated as to allow starting writes at arbitrary offsets. The only information the host gives the controller during a flash write other than the raw firmware bytes is the min of (highest address in the hex file, MAX_BINLEN=0x3800). I'm not sure that this length info is actually used on the controller since upon starting the last write, which we've found is for writing the real system jump to 0x37FB, the host still sends the controller that same length value while only sending 2048 bytes. Thus patching 0x0000 to zero on the first go at flashing will probably continue to be necessary.

You only have to write 1 byte at the end, not an entire 2048 (you might need to write at least 8 bytes due to certain considerations; you might want to write at least 3 bytes due to other considerations.)

My guess is that 8 bytes is the minimum you should write given that's the packet size of the usb 1.1 low speed control transfer and so anything less will probably be zero padded by the usb driver. Since the flasher doesn't send the actual length of data to be flashed, the controller would have no way to differentiate padding from data intended to be flashed. Same goes for trying to write firmware to a non 8-byte aligned address.

Stevie-O commented 4 years ago

@Stevie-O

(unrelated note: check out the code at 0x3C7B)

Well you found me out :) Either the person who wrote this was also named Akira or a huge fan of the anime. First seeing my name at 0x3FBA definitely motivated me to dig into this more than I otherwise would have.

Yeah. I still haven't figured out what it's doing with this stuff, either.

3DB6 7400 MOV A, #0h ; write 0x00 to 0x0000 ... This writes 023800 - the LJMP 0x3800 instruction - to address zero.

Good find! Although I assume that should be "write 0x00 to 0x0002".

Ahh, good catch! Got a little mixed up with the copy-and-paste there.

Based on the logic I found in the firmware, this patch might not actually be necessary; it looks like the code is prevented from being written to 0x37FB and above:

Looking at that section I agree. Although I'm not sure that 0x3E00 and 0x3E05 are redundant as written if that's what you mean? Based on everything I've seen so far I do think this was mostly originally C code with maybe some short sections in assembly. It looks to me like 0x3E00-0x3E03 checks for the 'less than' condition while 0x3E05 checks for the equal condition to ensure the address being written is <= 0x37FA, so yes, this could be done in one step but the compiler wasn't smart enough to do that (or they didn't turn on that optimization).

Right, I didn't say that 3E00 and 3E05 were redundant, I said that 3E05 was unnecessary, because it could have been replaced by checking against 0xFB rather than 0xFA at 3E00.

I did say that the first check was made redundant by the second check. The first check tests for addr >= 0x3800 and the second tests for addr > 0x37FA. Since 0x3800 > 0x37FA, the first check is covered by the second. (And they both branch to the same place, L0539).

If we assume that the "erased" state of the MCU's flash is 0x00 (not 0xFF, which is much more common), then "writing" 0x00 to a memory location is a no-op. Thus, patching the first byte with 0x00 for the initial write prevents 0x37FB from being written to.

Not sure if it's relevant to this but when the updater converts the hex to binary it is zero filled.

That on its own doesn't mean anything -- for memory locations that are not read by the firmware, the value written doesn't matter -- but some other things I've found support the "erased to 0x00" thing.

After the bootloader does its thing, it does an LJMP to 0x37FB.

So the next thing to consider for the safety of flashing potentially bad code to the controller without bricking it is to know exactly what the bootloader does before doing an LJMP to 0x37FB. We need to know we can always put it into flashing mode even if we mess up new firmware by, for example, having a bad jump location flashed to 0x37FB. Currently it seems the flasher logic assumes that the controller will be presented as a valid usb endpoint that it can issue a REQUEST_SET_CONFIGURATION to that will ultimately cause the bootloader to present its usb endpoint and it seems this involves the POR and 'AKIRA' magic data.

You have hit upon one of the problem spots: we're in pretty good shape to prevent bricking the device during a firmware update, but if we put bad firmware on the device, it could become "trapped" there without a way to escape it.

What I've been thinking about is a way to put some code into the main firmware that, early during startup, checks whether a certain combination of keys is pressed. Holding those keys down would perform the same process as that of the USB REQUEST_SET_CONFIGURATION command. Then, if some prototype firmware was so defective you couldn't activate the flash function, you could just hold the keys down, reset the keyboard controller (either via some sort of USB RESET command, or by rebooting the machine), and then let the bootloader take over.

You should be able to start the write at 0x0001 instead of 0x0000, and avoid having to patch the buffer

I think the usb flashing protocol isn't so sophisticated as to allow starting writes at arbitrary offsets. The only information the host gives the controller during a flash write other than the raw firmware bytes is the min of (highest address in the hex file, MAX_BINLEN=0x3800). I'm not sure that this length info is actually used on the controller since upon starting the last write, which we've found is for writing the real system jump to 0x37FB, the host still sends the controller that same length value while only sending 2048 bytes. Thus patching 0x0000 to zero on the first go at flashing will probably continue to be necessary.

Sure it does. Look at this:

  transfer = bytearray([
      0x05, # report id
      0x57,
      0x00,
      0x00,
      length & 0xFF,
      (length >> 8) & 0xFF])

That's your write_block_start function.
0x57 - 'W' write command
0x00, 0x00 - start address in little-endian order: (L0522 / 3C2A) stores these into R4:R5 (and R5:R4 is the write address)

That's a pretty standard pattern. Command/operation ('W' = write), offset, count. Write bytes starting at 0x0000.

I'm 99% certain that this command is being handled by L0517 (3C0D). That routine checks for a report ID of 0x05, and supports three different operations/commands: 'W' (0x57), 'R' (0x52), and 'V' (0x56). Since pretty much the same code is used for all three commands ('W' actually branches into the 'R' handler), you should be able to test it by entering the bootloader and issuing an 'R' (read) command. The good stuff is at L0522/3C2A. Note that the byte ordering is little-endian: (0x01, 0x00) reads from address 0x0001, not address 0x0100.
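To make the byte layout concrete, here is a minimal sketch of a header builder that follows the pattern described above (report ID, command letter, little-endian start address, little-endian length). The function name `build_command` and the range checks are my own for illustration and are not part of the updater; whether the bootloader honors every field is exactly what is still being worked out in this thread.

```python
REPORT_ID = 0x05              # report ID that L0517 is described as checking for
COMMANDS = ("W", "R", "V")    # 0x57, 0x52, 0x56


def build_command(op, address, length):
    """Return the 6-byte header: report ID, opcode, address (LE), length (LE).

    Hypothetical helper: it mirrors the layout of the bytearray quoted above,
    but takes the start address as a parameter instead of hard-coding 0x0000.
    """
    if op not in COMMANDS:
        raise ValueError("unknown command: %r" % (op,))
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if not 0 <= length <= 0xFFFF:
        raise ValueError("length must fit in 16 bits")
    return bytes([
        REPORT_ID,
        ord(op),
        address & 0xFF,          # low byte first (little-endian)
        (address >> 8) & 0xFF,
        length & 0xFF,
        (length >> 8) & 0xFF,
    ])


# A read starting at 0x0001 puts 0x01 before 0x00 in the header.
read_header = build_command("R", 0x0001, 0x3800)
```

With this layout, (0x01, 0x00) in bytes 2 and 3 means address 0x0001, matching the note about byte ordering above.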

What I can't find is any code that tries to honor the value specified for the length in the setup packet! I'm not sure what the story is there (though I have some guesses.)

You only have to write 1 byte at the end, not an entire 2048 (you might need to write at least 8 bytes due to certain considerations; you might want to write at least 3 bytes due to other considerations.)

My guess is that 8 bytes is the minimum you should write, given that's the packet size of the USB 1.1 low-speed control transfer, and so anything less will probably be zero-padded by the USB driver. Since the flasher doesn't send the actual length of data to be flashed, the controller would have no way to differentiate padding from data intended to be flashed. Same goes for trying to write firmware to a non-8-byte-aligned address.

The actual length is communicated. It's just handled at the USB protocol level (layer 4), not inside the commands sent to the bootloader (layer 7).

The MCU can only transfer 8 bytes at a time; the "2048-byte packet" is actually broken into 256 8-byte packets. (In fact, for writes, it's actually 2050 bytes, because the first two bytes are the report ID and the 'W' command.)
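The packet arithmetic can be checked with a short sketch. This only models the splitting of a data stage into 8-byte packets; the helper name `split_into_packets` is hypothetical, and the real splitting is done by the USB host controller and driver, not by the flasher script.

```python
PACKET_SIZE = 8  # maximum packet size for a low-speed control endpoint


def split_into_packets(payload, packet_size=PACKET_SIZE):
    """Split a data stage into consecutive packets of at most packet_size bytes."""
    return [payload[i:i + packet_size]
            for i in range(0, len(payload), packet_size)]


firmware_block = bytes(2048)                      # placeholder data, all zeros
packets = split_into_packets(firmware_block)      # 2048 / 8 = 256 full packets

# Two extra leading bytes (report ID and 'W') make the stage 2050 bytes long,
# so the final packet is a short one.
with_header = split_into_packets(bytes([0x05, 0x57]) + firmware_block)
```

A short final packet is one of the ways a receiver can tell that the data stage has ended, which fits the point that the length is carried by the transfer itself rather than by the command bytes.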

I couldn't figure out how the heck it could have worked at all until I found this:

http://www.jungo.com/st/support/documentation/windriver/811/wdusb_man_mhtml/node55.html#SECTION001212000000000000000

Between this and the assembly source, I now understand a lot more.

The diagram in the page I just linked shows that all control transfers involve a SETUP packet, zero or more data packets (either IN, or OUT), followed by a status packet.

Here's the bootloader routine that processes the setup packets:

L0028:
  3C00 AC0C         MOV R4, 0Ch             ; R4 = rxpacket[4] (wIndexL)
  3C02 AD0D         MOV R5, 0Dh             ; R5 = rxpacket[5] (wIndexH)
  3C04 AE0E         MOV R6, 0Eh             ; R6 = rxpacket[6] (wLengthL)
  3C06 AF0F         MOV R7, 0Fh             ; R7 = rxpacket[7] (wLengthH)
  3C08 E50A         MOV A, 0Ah              ; A = rxpacket[2] (wValueL)
  3C0A F512         MOV 12h, A              ; mem[0x12] = rxpacket[2] (not sure why they didn't use MOV iram, iram for this)
                                            ; however, at least one caller uses the value of A

wIndex is used to drive R5:R4, which is where the bootloader keeps track of the target address for memory read/write. HOWEVER, it gets overwritten by other code (I suspect they had problems getting things to let them set wIndex to whatever they wanted).
wLength is used to drive R7:R6, which is where the bootloader (apparently) keeps track of the total number of bytes left to send/receive.
wValue is used to put the bootloader in the correct state for processing commands.
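As a cross-check of the field offsets in the listing, here is a small sketch that decodes a standard 8-byte USB SETUP packet and reports what L0028 would leave in each register. The field layout is the standard USB one; the function names and the sample bytes are made up for illustration and are not captured traffic.

```python
import struct
from collections import namedtuple

SetupPacket = namedtuple(
    "SetupPacket", "bmRequestType bRequest wValue wIndex wLength")


def parse_setup(packet):
    """Decode the 8-byte SETUP packet; all 16-bit fields are little-endian."""
    if len(packet) != 8:
        raise ValueError("a SETUP packet is exactly 8 bytes")
    return SetupPacket(*struct.unpack("<BBHHH", packet))


def registers_after_l0028(packet):
    """Model of the register loads at 3C00..3C0A (an interpretation of the listing)."""
    setup = parse_setup(packet)
    return {
        "R4": setup.wIndex & 0xFF,    # rxpacket[4], wIndexL
        "R5": setup.wIndex >> 8,      # rxpacket[5], wIndexH
        "R6": setup.wLength & 0xFF,   # rxpacket[6], wLengthL
        "R7": setup.wLength >> 8,     # rxpacket[7], wLengthH
        "A": setup.wValue & 0xFF,     # rxpacket[2], wValueL (also copied to 12h)
    }


# Illustrative bytes only: wValue=0x0305, wIndex=0x0001, wLength=0x3800.
sample = bytes([0x21, 0x09, 0x05, 0x03, 0x01, 0x00, 0x00, 0x38])
regs = registers_after_l0028(sample)
```

For the sample bytes this gives R5:R4 = 0x0001 and R7:R6 = 0x3800, which is the pairing described above.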

akirakyle commented 4 years ago

Right, I didn't say that 3E00 and 3E05 were redundant, I said that 3E05 was unnecessary, because it could have been replaced by checking against 0xFB rather than 0xFA at 3E00.

I did say that the first check was made redundant by the second check. The first check tests for addr >= 0x3800 and the second tests for addr > 0x37FA. Since 0x3800 > 0x37FA, the first check is covered by the second. (And they both branch to the same place, L0539).
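The redundancy can be confirmed by brute force over the 16-bit address space. This sketch models the two checks as plain integer comparisons, which is an abstraction of the byte-wise compares in the assembly; the function names are mine.

```python
FIRST_LIMIT = 0x3800    # first check rejects addr >= 0x3800
SECOND_LIMIT = 0x37FA   # second check rejects addr > 0x37FA


def rejected_by_either(addr):
    """Both checks in sequence, each branching to the same reject path."""
    return addr >= FIRST_LIMIT or addr > SECOND_LIMIT


def rejected_by_second_only(addr):
    """Only the second check."""
    return addr > SECOND_LIMIT


# Any address where dropping the first check would change the outcome.
mismatches = [addr for addr in range(0x10000)
              if rejected_by_either(addr) != rejected_by_second_only(addr)]
```

The list comes out empty, so under this model every address rejected by the first check is also rejected by the second.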

Sorry I misunderstood what you were getting at there.

You have hit upon one of the problem spots: we're in pretty good shape to prevent bricking the device during a firmware update, but if we put bad firmware on the device, it could become "trapped" there without a way to escape it.

What I've been thinking about is a way to put some code into the main firmware that, early on, checks for a certain combination of keys to be pressed during early firmware startup. Holding those keys down would perform the same process as that of the USB RESET_CONFIGURATION command. Then, if some prototype firmware was so defective you couldn't activate the flash function, you could just hold the keys down, reset the keyboard controller (either via some sort of USB RESET command, or by rebooting the machine), and then let the bootloader take over.

That still seems like it would be risky to try out, as you'll have to be really confident that you get that code right the first time. I feel like there has to be some mechanism in the bootloader that allows one to force it into flashing mode even if there's a valid jump at 0x37FB. There's all this logic right at the start of 0x3800 around testing for the nature of the reset (somehow involving testing and/or placing the "AKIRA" string at 0x20 in memory) and a lot of waiting logic around the state of P4_5 and P4_6 (the USB D+ and D- pins). It might just be a matter of bit-banging those USB pins on the host side to force a certain state which always enters the bootloader's flashing code.

Sure it does. Look at this: ...

You're probably right about the protocol for the USB control transfer exchange. I didn't dig very deep into the firmware side of it, which obviously left me guessing what the magic bytes in the USB packets might mean, so I figured the zeros were just padding. Actually, before starting this I knew nothing about USB transfer protocols, and I read Jan Axelson's USB Complete to try to understand the USB logic going on. I guess I should go back and review the control transfer protocol, as it seems I may have some misunderstandings. Unfortunately I don't have much time to dig into this anymore, as it was a bit of a quarantine project for me, but I'm happy to help out when I get the chance! If we eventually get to the point of writing some firmware, it might be worth digging into system76's ec firmware, since their laptops use an 8051 chip for keyboard control whereas I think @jackhumbert's QMK only targets AVR.

Stevie-O commented 4 years ago

@akirakyle

What I've been thinking about is a way to put some code into the main firmware that, early on, checks for a certain combination of keys to be pressed during early firmware startup.

That still seems like it would be risky to try out, as you'll have to be really confident that you get that code right the first time. I feel like there has to be some mechanism in the bootloader that allows one to force it into flashing mode even if there's a valid jump at 0x37FB.

What I had laid out was a plan that was basically of the form:

1. Add a test routine to verify that you can detect the right key presses (maybe by blinking the lights)
2. Add a special key combination, such as Pine+F1, that branches to that routine
3. Press Pine+F1 and see if the routine works

If it doesn't work, simply reboot the machine and try again. If it DOES work, then you should be able to splice a similar routine into the boot process.

There's all this logic right at the start of 0x3800 around testing for the nature of the reset (somehow involving testing and/or placing the "AKIRA" string at 0x20 in memory) and a lot of waiting logic around the state of P4_5 and P4_6 (the USB D+ and D- pins). It might just be a matter of bit-banging those USB pins on the host side to force a certain state which always enters the bootloader's flashing code.

I actually spent an evening deciphering all of that logic. A secret combination of pulses on P4_5 and P4_6, executed with careful timing, will put the bootloader into a special mode where it will accept, using the I2C protocol over those pins, various commands. One of those commands is "erase firmware".

I never brought it up here because I didn't see how it could be useful inside an actual Pinebook. The pins have to be held a certain way immediately after power-on reset (the code checks POF in the reset-condition register). You need the USB port (USB0 on the Rockchip) to which it's directly wired to not drive D+ or D- at all -- they need to be in high-impedance mode. I don't see any way to make it happen, really, especially not without severe risk of frying a chip.

You'd be better off trying to do ICSP, since all of those pins seem to be exposed via the keyboard connector.

swiftgeek commented 3 years ago

If somebody has a JET51, it would be great if they sniffed the JTAG traffic during programming and posted it somewhere.

An FX2LP could definitely be used for sniffing, at really low cost.

gashtaan commented 1 year ago

@swiftgeek Hi, after some googling I found myself here. The JET51 firmware is included in the Keil Driver Install Package, which is free to download at: https://en.sinowealth.com/seach?type_id=68&a_v_type=1 ...so maybe there is no need to have access to a physical programmer device; one could just reverse engineer the JTAG protocol from it. It seems quite complex, but I think it's doable. Unfortunately I don't have a spare device with such an MCU to experiment with.

gashtaan commented 1 year ago

I bought a Genesis Thor 300 keyboard with an SH68F881W MCU inside just to do experiments. I was able to reverse engineer the protocol and then, using this knowledge, I successfully downloaded the firmware from it.

swiftgeek commented 1 year ago

@gashtaan is that effort described anywhere? I have an SH68F88-based keyboard with a dedicated JTAG connector

Datasheets mention some kind of JTAG pin-stimulation sequence on reset that is required to enable JTAG, and that's probably the worst part about it

gashtaan commented 1 year ago

@swiftgeek Not yet, I'll try to publish some tool on my github page.

Yes, to establish a connection, a lot of various pulses need to be generated on the TMS, TCK, and TDI pins within the first 20ms after powering the chip on. Then the connection can be switched between two modes (Boot-ROM specific and JTAG), each of which has its own protocol... reverse engineering these is the worst part about it.

gashtaan commented 1 year ago

@swiftgeek Here: https://github.com/gashtaan/sinowealth-8051-dumper

Hamza-beta commented 3 months ago

I bought a Genesis Thor 300 keyboard with an SH68F881W MCU inside just to do experiments. I was able to reverse engineer the protocol and then, using this knowledge, I successfully downloaded the firmware from it.

Hello, do you have a datasheet for the MCU? I couldn't find one on the internet. Thank you.

gashtaan commented 3 months ago

Hello, do you have a datasheet for the MCU? I couldn't find one on the internet. Thank you.

Unfortunately no, I didn't find it either. Nevertheless, I think that all relevant info can be extrapolated from other datasheets and Keil include files.

swiftgeek commented 3 months ago

@Hamza-beta I think SH79F6489 was the closest one, check https://github.com/swiftgeek/hykker-re/issues/5 for details