unicorn-engine / unicorn

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
http://www.unicorn-engine.org
GNU General Public License v2.0
7.53k stars 1.33k forks source link

Add a uc_reg_size API #1831

Open nneonneo opened 1 year ago

nneonneo commented 1 year ago

Currently, there's no way to know how much memory to allocate for uc_reg_read or uc_reg_write. For writing bindings or writing more "generic" code, it would be very useful to know how big a particular register is before performing a read/write operation.

This solves a few issues that I can see:

Of course, the downside is maintaining a new reg_size for each arch. But, since we already have to maintain reg_read and reg_write, this should hopefully be an acceptable tradeoff.

I'm happy to take this and work on it; I'm putting this issue out to see if it would be worth doing in the first place (i.e. would be accepted into Unicorn).

wtdcode commented 1 year ago

This improves binding safety. Bindings can verify that they are using (or being passed) a large-enough buffer to read any given register. This reduces bugs when bindings don't support new registers. Bindings are not always updated to match the latest Unicorn code, and many bindings (e.g. Java, Python) fall back to reading 64-bit values on unknown registers. https://github.com/unicorn-engine/unicorn/issues/1539 is an example of a bug caused by using a 64-bit read on a 128-bit register unknown to the binding: the read silently overflows the stack buffer, causing a mysterious crash. Having something like uc_reg_size would allow a binding to detect that the register was too large for the requested read.

I agree with this point because uc_reg_read/write currently is on a big mess on typing and it's a design flaw.

On big-endian hosts, doing something like uint64_t value; (uint16_t )value = ...; return value; will put the value in the most-significant bytes of the value. Ideally, Unicorn works equally well on big-endian as little-endian hosts.

This will also introduce some break changes but I think not too many users write code to handle this or actually run on big endian platforms. Maybe this is fine.

Bindings can adopt a more generic approach to reading or writing registers. Future larger SSE registers, for example, can be handled transparently by allocating a buffer of the appropriate size. This reduces the amount of special-casing in the bindings and therefore reduces maintenance burden.

This depends because it could only happen if we update QEMU version and that should always mean a big step and we won't do this too often.

Regarding the API itself, I personally don't think it's the right design because it has nothing to do with the emulation itself. Instead, I will propose adding an alternative API (same applies to writing) like uc_reg_read2(uc, reg, buffer, size) to uc_reg_read(uc, reg, buffer), though it looks ugly. It should be the best choice with regard to compatibility.

But note that, users could always maintain such register size by themselves. In other words, such ability is not really essential here in my opinion but I understand the motivation to make the API easier and more importantly, safer to use.

@aquynh Are you okay with an alternative uc_reg_read/write API? If so, I can also help on working on this.

aquynh commented 1 year ago

sounds ok to me, thanks for spending your time improving this.

nneonneo commented 1 year ago

ok, I’m working on this. I’ve finished a prototype implementation, and am now working on performance optimization.