unicorn-engine / unicorn

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
http://www.unicorn-engine.org
GNU General Public License v2.0
7.67k stars 1.35k forks source link

riscv: Invalid 32-bit instruction should not decrement pc #1991

Closed apparentlymart closed 2 months ago

apparentlymart commented 3 months ago

Note: I'm not sure if this is QEMU's bug or Unicorn's bug. I wasn't able to find a QEMU commit which had the line I removed here and so I suspect this was added in Unicorn, but I'm not sure which version of QEMU the current Unicorn was derived from and history shows this line being added as part of the Unicorn2 import mega-commit aaaea14214ed46ac60cf7ef5766374f660b05777.

If this is QEMU's bug then it it appears to have been fixed upstream already but the relevant part of this code has been changed significantly to support dynamic registration of ISA extension decoders. QEMU code from just before that redesign already lacks the line of code I removed, and I can't find any other earlier commit that contains it before I reach code that is clearly earlier than the snapshot Unicorn2 adopted.

I'm not sure what the policy is for patching the QEMU code directly instead of importing a new version from upstream; I'll understand if this patch is unwanted due to making future QEMU backports harder, but unfortunately this problem is blocking for me because my program relies on being able to detect invalid instruction exceptions but currently Unicorn is misreporting an instruction page fault at an unrelated address instead.


The line I've removed appears to be trying to undo the effect of adding 4 to pc above, but does so incorrectly and so ends up returning with next_pc earlier than it was prior to decoding.

This causes the translator to malfunction because it does not expect pc_next to decrease during decoding: this is effectively reporting that the invalid instruction has a negative size, which is impossible. The decoder uses the increase in next_pc to decide the translation block size, but converts it to uint16_t thereby causing a block containing only an invalid instruction to be treated as having size 65532 (reinterpreted -4) and therefore the translation loop tries to find the next translation block at 65532 bytes after the invalid instruction, which can cause a spurious instruction access/page fault if the page containing that address is not mapped as executable.

In practice we don't need to readjust the pc at all here because it is correct to report that the invalid instruction is four bytes long. This allows the translation loop to correctly find the next instruction, and to avoid producing spurious TLB fills that might cause incorrect exceptions.


The following is the assignment that calculates the translation block size from the difference between pc_next and pc_first after the translator reports the end of a block:

https://github.com/unicorn-engine/unicorn/blob/d4b92485b1a228fb003e1218e42f6c778c655809/qemu/accel/tcg/translator.c#L147

The size field is uint16_t so this causes unsigned integer overflow if pc_next is less than pc_first, which is the situation caused by the line I removed in this PR.

The spurious exception occurs in the caller where it tries to find the physical address of the next instruction. The incorrect size value causes the incorrect conclusion that the next instruction is in a different page and causes a spurious TLB probe which an then raise a spurious instruction access or page fault:

https://github.com/unicorn-engine/unicorn/blob/d4b92485b1a228fb003e1218e42f6c778c655809/qemu/accel/tcg/translate-all.c#L1706-L1711

After my change size is 4 and so this logic then probes the correct address, which in my case belongs to the same page as the invalid instruction and so avoids calling get_page_addr_code at all.

If my 4-byte invalid instruction were straddling a page boundary then the end would be in a different page than the start, but in that case any access/page fault for the upper half would already have been detected when loading it in the riscv-specific instruction decoder, and so the get_page_addr_code call in the snippet above would still not be reachable:

https://github.com/unicorn-engine/unicorn/blob/d4b92485b1a228fb003e1218e42f6c778c655809/qemu/target/riscv/translate.c#L768-L769

(The minimum valid alignment for a RISC-V instruction is two bytes, so all other situations would cause "instruction address misaligned" to have occurred long before this code is running.)

apparentlymart commented 3 months ago

From the changelog I learned that Unicorn2 was based on QEMU v5.0.1, and indeed the line I removed here is not present in the tag for that release: https://github.com/qemu/qemu/blob/v5.0.1/target/riscv/translate.c#L737-L739

Therefore I guess this was added as part of the modifications to adapt QEMU to work for Unicorn2, and so perhaps something else in Unicorn is somehow relying on this behavior but I'm not sure how to prove or disprove that. :thinking:

I don't think returning a next_pc that's less than before decoding can possibly be correct because (as described above) the translator code treats the translation block size as an unsigned integer, but if this line was added to Unicorn to solve a problem somewhere else then I'd be happy to try to find some other way to solve that problem, if maintainers would prefer (and can describe what that problem is).

For what it's worth, all of the RISC-V unit tests still seem to work after my change, although I don't think any of those tests actually exercise the invalid instruction handling. I am now getting the behavior I expect in my larger project where I originally discovered this problem: I have an interrupt hook but no invalid instruction hook and the interrupt hook gets called to report an illegal instruction exception (cause 2).

wtdcode commented 3 months ago

From the changelog I learned that Unicorn2 was based on QEMU v5.0.1, and indeed the line I removed here is not present in the tag for that release: https://github.com/qemu/qemu/blob/v5.0.1/target/riscv/translate.c#L737-L739

Therefore I guess this was added as part of the modifications to adapt QEMU to work for Unicorn2, and so perhaps something else in Unicorn is somehow relying on this behavior but I'm not sure how to prove or disprove that. 🤔

I don't think returning a next_pc that's less than before decoding can possibly be correct because (as described above) the translator code treats the translation block size as an unsigned integer, but if this line was added to Unicorn to solve a problem somewhere else then I'd be happy to try to find some other way to solve that problem, if maintainers would prefer (and can describe what that problem is).

For what it's worth, all of the RISC-V unit tests still seem to work after my change, although I don't think any of those tests actually exercise the invalid instruction handling. I am now getting the behavior I expect in my larger project where I originally discovered this problem: I have an interrupt hook but no invalid instruction hook and the interrupt hook gets called to report an illegal instruction exception (cause 2).

I agree with you. I need to check the git logs before the unicorn merge.

wtdcode commented 2 months ago

Thanks!