unicorn-engine / unicorn

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
http://www.unicorn-engine.org
GNU General Public License v2.0
7.33k stars 1.31k forks source link

Re-align Unicorn register enum values with updated Capstone register enum values #1925

Closed Montg0mery closed 4 months ago

Montg0mery commented 4 months ago

Capstone 5 changed the value of various registers compared to Capstone 4. The commit history in Capstone suggests that the changes are to bring in updated from LLVM.

E.g. with Capstone 4, the Aarch64 register X20 for example had the value 219: https://github.com/capstone-engine/capstone/blob/46a74e53b7c1eefbd71857bfe6874f8437647fdd/arch/AArch64/AArch64GenRegisterInfo.inc#L236

Unicorn 2 has the same value when evaluating the enum: https://github.com/unicorn-engine/unicorn/blob/d4b92485b1a228fb003e1218e42f6c778c655809/include/unicorn/arm64.h#L261

But now with Capstone 5, the value of X20 is 239: https://github.com/capstone-engine/capstone/blob/6a55ef3bdadf606b79a954adad80a90913e6b519/arch/AArch64/AArch64GenRegisterInfo.inc#L257

This means if you have a project which uses both Capstone 4 and Unicorn 2, and passes or compares values between them, it used to work, but doesn't if you use the newest version of Capstone.

As such, as well as the values not aligning, new register information is missing from Unicorn.

Please can Unicorn be brought back in sync with Capstone in this regard?

gerph commented 4 months ago

I'm not convinced that aligning these makes sense; if Unicorn were brought in line with Capstone 5 then users of Unicorn that require Capstone 4 (because Capstone 5 was incompatible/had breakages/couldn't be upgraded for other reasons), who assumed that Unicorn and Capstone 4 had aligned registers would find that things broke with the upgrade of Unicorn.

Whilst it might be nice to have the two systems use the same value, since they have already diverged it is already clear that the behaviour of relying on two independent projects having the same constants was unreliable at best. The problem you mention is really showing the fact that you can't rely on the fact that the two projects happen to have used the same constant values to mean the same thing is a guaranteed behaviour. If it were a guaranteed behaviour, then the change in Capstone would have been a bug. Both the case that I suggest (a Unicorn user relying on the direct mapping to Capstone 4 constant values), and the case that you suggest (a Unicorn user relying on the direct mapping to Capstone 5 constant values) cannot both be true - and since Unicorn does not require that you use a specific version of Capstone, a program relying on such behaviour is not using the libraries properly.

And if the behaviour of Capstone is tied to that of LLVM then this means that Capstone's register mapping is subsequently at the whim of the LLVM developers and their definition of the register mappings. So if LLVM changed its mappings, Capstone would change, and thus force a major version increment to ensure binary compatibility, which then would mean that the Unicorn project would need to change its mappings to match the newly updated Capstone mappings.

What I'm trying to say is that by tying the constant values of each project together and assuming that you can pass Unicorn::UC_ARM64_REG_X20 where a Capstone register is expected is wrong. And trying to enforce that it always be true means that the version of LLVM that builds the Capstone register bindings must be kept in lockstep with the version of Unicorn if you want to use that version of Capstone. That is, Unicorn X.. will only work with Capstone X.. and nothing else if you rely on passing those values interchangeably. Which isn't the way I expect the libraries to be used. I expect that I can use Unicorn X with Capstone Y, so long as I convert between the types required.

Sorry, that might be a lot of words just to say "I don't think this can work".

wtdcode commented 4 months ago

Closing this thanks to the excellent explanation from @gerph . I also had a few comments on it.