vladignatyev / GBA4iOS

GBA4iOS mirror that works on iOS 8. It's a Game Boy Advance emulator for the Apple iPhone.
http://builds.io/apps/gba4ios-21/

Feature request: Dynamic binary translation #2

Open · ryao opened this issue 8 years ago

ryao commented 8 years ago

My iPhone 6 Plus gets about 4 hours of battery life when running GBA4iOS, while the original GBA hardware was rated for 15 hours. The iPhone 6 Plus lasts over 10 hours at just about any other task. I suspect GBA4iOS could be improved to reach that level of battery life too.

I have not read the codebase in detail, but it looks like a wrapper around the emu-ex-plus-alpha emulator, which appears to emulate the GBA's ARM CPU by interpreting its instructions:

https://github.com/vladignatyev/GBA4iOS/blob/master/emu-ex-plus-alpha/GBA.emu/src/vbam/gba/GBA.cpp#L3232

Implementing dynamic binary translation might be enough to bring battery life in line with other tasks on the iPhone.
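To make the interpreter-versus-translation difference concrete, here is a minimal C++ sketch using a hypothetical three-instruction toy ISA (`Op`, `Insn`, `Cpu`, `translate` and `run_translated` are illustrative names, not code from emu-ex-plus-alpha). The interpreter decodes every instruction each time it runs; the translator decodes each basic block once and caches the result by guest PC. Real dynamic binary translation would emit host ARM machine code into executable memory; this sketch substitutes C++ closures for that step, so it shows the caching structure rather than the full speedup.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical toy guest ISA for illustration only; the GBA runs ARM/Thumb code.
enum class Op : uint8_t { AddImm, DecJnz, Halt };
struct Insn { Op op; int32_t arg; };  // AddImm: immediate; DecJnz: branch target
struct Cpu { uint32_t pc = 0; int32_t acc = 0; int32_t ctr = 0; };

// Interpreter: every executed instruction is fetched and decoded again.
void interpret(Cpu& cpu, const std::vector<Insn>& prog) {
    for (;;) {
        const Insn& i = prog[cpu.pc];
        switch (i.op) {
            case Op::AddImm: cpu.acc += i.arg; ++cpu.pc; break;
            case Op::DecJnz: cpu.pc = (--cpu.ctr != 0) ? static_cast<uint32_t>(i.arg) : cpu.pc + 1; break;
            case Op::Halt: return;
        }
    }
}

// A translated block runs its guest code and returns false when the guest halts.
using Block = std::function<bool(Cpu&)>;

// Decode one basic block, starting at `pc`, into a chain of closures (done once).
Block translate(const std::vector<Insn>& prog, uint32_t pc) {
    std::vector<std::function<void(Cpu&)>> body;
    for (;; ++pc) {
        const Insn i = prog[pc];
        if (i.op == Op::AddImm) {
            body.push_back([imm = i.arg](Cpu& c) { c.acc += imm; });
            continue;
        }
        if (i.op == Op::DecJnz) {  // block ends at the branch
            return [body, target = static_cast<uint32_t>(i.arg), next = pc + 1](Cpu& c) {
                for (const auto& step : body) step(c);
                c.pc = (--c.ctr != 0) ? target : next;
                return true;
            };
        }
        return [body, here = pc](Cpu& c) {  // Halt: stop at this instruction
            for (const auto& step : body) step(c);
            c.pc = here;
            return false;
        };
    }
}

// Dispatch loop: look up the guest PC in a translation cache, translating on a miss.
void run_translated(Cpu& cpu, const std::vector<Insn>& prog) {
    std::unordered_map<uint32_t, Block> cache;
    for (;;) {
        auto it = cache.find(cpu.pc);
        if (it == cache.end()) it = cache.emplace(cpu.pc, translate(prog, cpu.pc)).first;
        if (!it->second(cpu)) return;
    }
}
```

Both paths give the same result for a program like `{AddImm 3, AddImm 1, DecJnz 0, Halt}` with `ctr = 5` (the accumulator ends at 20), but the translated path decodes the loop body once instead of five times. A real GBA translator would also have to handle Thumb mode, cycle counting and self-modifying code, all of which this sketch ignores.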

While iOS normally does not permit execution of unsigned code, I have read that it allows the first mmap() call to request execute permissions so that the Nitro JavaScript engine used by Mobile Safari can JIT-compile JavaScript into native code. The same should apply to WKWebView, which also uses Nitro. That should make dynamic binary translation possible.
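Whatever iOS allows, a translator needs memory it can first write and then execute. A minimal POSIX sketch of that step follows: map a page read/write, copy in a few bytes of host machine code, flip the page to read/execute with `mprotect()`, and call it. It handles only x86-64 and AArch64 hosts, and `run_generated_code` is a hypothetical name. We assume, without having verified it on iOS 8, that an ordinary App Store app would see the `mprotect()` to `PROT_EXEC` fail, since iOS reserves dynamic code signing for entitled processes such as Safari's JIT.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Returns 42 if the generated code ran, or -1 if mapping/protection failed
// or the host architecture is not handled by this sketch.
int run_generated_code() {
#if defined(__x86_64__)
    const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};  // mov eax, 42 ; ret
#elif defined(__aarch64__)
    const unsigned char code[] = {0x40, 0x05, 0x80, 0x52,   // movz w0, #42
                                  0xC0, 0x03, 0x5F, 0xD6};  // ret
#else
    return -1;
#endif
#if defined(__x86_64__) || defined(__aarch64__)
    const size_t len = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    // 1. Map writable, non-executable memory and copy the code in.
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    std::memcpy(mem, code, sizeof code);
    // 2. Flip the page to read+execute (W^X). Assumed to fail on iOS without a JIT entitlement.
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }
    // 3. Make sure the instruction cache sees the new bytes (a no-op on x86-64).
    __builtin___clear_cache(static_cast<char*>(mem), static_cast<char*>(mem) + sizeof code);
    auto fn = reinterpret_cast<int (*)()>(mem);
    const int result = fn();
    munmap(mem, len);
    return result;
#endif
}
```

Keeping the page writable and executable at different times (W^X), rather than mapping it `PROT_WRITE | PROT_EXEC` at once, is the safer default, and some platforms refuse the combined mapping outright.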

ryao commented 8 years ago

After some light reading, it seems that GBA game code often busy-waits on conditions such as vblank, which is energy-inefficient:

http://patater.com/gbaguy/gba/ch5.htm

That was not a big deal on the GBA because its embedded CPU used so little power, but it is a bigger problem on the iPhone, whose CPU draws far more power and for which a busy-wait is extremely wasteful. Even with dynamic binary translation, busy-waiting would eat into whatever CPU time the translation saved, because the emulator would still have to execute every iteration of the spin loop.
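A back-of-the-envelope sketch of how much work one vblank spin costs the emulator: the GBA CPU runs 1232 cycles per scanline and VBlank starts at line 160, so a game that finishes its frame work early keeps polling for the rest of the visible lines. The function name and the cost of one poll-compare-branch iteration (`cycles_per_iteration`) are hypothetical inputs, not measurements from any game.

```cpp
#include <cstdint>

// GBA display timing: 1232 CPU cycles per scanline; VBlank begins at line 160.
constexpr uint32_t kCyclesPerScanline = 1232;
constexpr uint32_t kFirstVBlankLine = 160;

// Guest loop iterations spent polling VCOUNT when the game finishes its frame
// work at `finish_line` and then spins until VBlank. `cycles_per_iteration`
// is an assumed cost for one poll-compare-branch iteration.
constexpr uint32_t spin_iterations(uint32_t finish_line, uint32_t cycles_per_iteration) {
    return finish_line >= kFirstVBlankLine
               ? 0
               : (kFirstVBlankLine - finish_line) * kCyclesPerScanline / cycles_per_iteration;
}

static_assert(spin_iterations(40, 8) == 18480, "120 lines * 1232 cycles / 8 cycles");
```

For example, a game that finishes at line 40 and spends 8 cycles per iteration runs 18,480 iterations before VBlank; at three guest instructions per iteration, that is 55,440 instructions the emulator executes every frame just to wait.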

It would probably be better to profile the emulated CPU to collect real-world examples of busy-wait loops in game code, then add code to the emulator that detects them and blocks until a change occurs. That way, busy-wait loops on things like vblank block instead of iterating as fast as the CPU can go. It would be even better to block until the loop condition is actually true, but that might be a pain to implement.
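As a sketch of the profiling step, an interpreter could report every guest store and every taken branch to a small profiler like the one below, which treats a backward branch as the end of one loop iteration and flags hot loops whose iterations never write memory. `BusyWaitProfiler` and its hooks are hypothetical, and the heuristic is deliberately crude: a register-only delay loop would be flagged too, and it does not check whether the loop actually reads a hardware register.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical hooks an interpreter could call while profiling a game.
// A backward branch (target <= source) marks the end of one loop iteration.
class BusyWaitProfiler {
public:
    using Loop = std::pair<uint32_t, uint32_t>;  // (branch address, branch target)

    // Call on every guest store to memory.
    void on_store() { stored_ = true; }

    // Call on every taken guest branch.
    void on_branch(uint32_t from, uint32_t to) {
        if (to > from) return;  // forward branch: still inside the current iteration
        Stats& s = stats_[{from, to}];
        ++s.hits;
        if (!stored_) ++s.store_free;
        stored_ = false;        // start a new iteration window
    }

    // Loops that ran at least `min_hits` times without storing anything.
    // One dirty iteration is allowed: the first window can include stores
    // made before the loop was entered.
    std::vector<Loop> candidates(uint64_t min_hits) const {
        std::vector<Loop> out;
        for (const auto& [loop, s] : stats_)
            if (s.hits >= min_hits && s.hits - s.store_free <= 1) out.push_back(loop);
        return out;
    }

private:
    struct Stats { uint64_t hits = 0; uint64_t store_free = 0; };
    std::map<Loop, Stats> stats_;
    bool stored_ = false;
};
```

Dumping `candidates()` after a few seconds of gameplay would give a list of (branch, target) addresses to inspect by hand before deciding which patterns the emulator should treat as idle loops.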

Assuming emulated devices run as independent threads, something like this might work:

  1. Add two global flag variables, a last-jump-address variable, a global counter, a last-seen counter variable, a mutex and a condition variable. The flags record whether the CPU has written to memory since the last jump and whether it has read device memory since the last jump. The counter records how many times a device has written to its memory, and the mutex protects it. The last-seen variable holds the counter value observed at the most recent device-memory read. Lastly, the condition variable allows the CPU thread to block.
  2. On each change of memory state by a CPU instruction, set the memory-write flag to 1.
  3. On each read of device memory, lock the mutex, perform the read, copy the counter into the last-seen variable and unlock. Also set the device-read flag to 1.
  4. On each update of device memory, lock the mutex, perform the update, increment the counter, unlock, and then signal/broadcast the condition variable.
  5. On each jump, compare the target with the target of the last jump. If they are equal, the memory-write flag is 0 and the device-read flag is 1, lock the mutex and check whether the counter has changed since the last-seen value. If it has not, wait on the condition variable until it does, re-checking after each wakeup because condition variables can wake spuriously. In either case, unlock the mutex when continuing (returning from a condition variable wait re-locks it), reset both flags to 0 before processing the instruction after the jump, and store the new target address.
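The steps above can be sketched in C++ roughly as follows, under the same assumption that emulated devices run on their own threads. `IdleLoopBlocker` and its hook names are hypothetical; an emulator would call `on_cpu_write`, `on_device_read` and `on_jump` from the CPU thread and `on_device_write` from device threads. The device-write hook increments the counter, since a blocked CPU can only notice a change if the counter moves, and the wait uses a predicate so spurious wakeups simply re-check the condition.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical hooks implementing the five steps; names are illustrative only.
class IdleLoopBlocker {
public:
    // Step 2: the guest CPU wrote to memory (CPU thread).
    void on_cpu_write() { wrote_memory_ = true; }

    // Step 3: the guest CPU reads device memory (CPU thread).
    template <typename ReadFn>
    auto on_device_read(ReadFn read) {
        std::lock_guard<std::mutex> lock(mu_);
        auto value = read();
        last_seen_ = device_writes_;  // remember the device state we observed
        read_device_ = true;
        return value;
    }

    // Step 4: a device thread updates device memory (e.g. advancing VCOUNT).
    template <typename WriteFn>
    void on_device_write(WriteFn write) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            write();
            ++device_writes_;  // without this, a blocked CPU could never wake
        }
        cv_.notify_all();
    }

    // Step 5: the guest CPU jumps to `target` (CPU thread).
    void on_jump(uint32_t target) {
        if (target == last_target_ && !wrote_memory_ && read_device_) {
            std::unique_lock<std::mutex> lock(mu_);
            // The predicate form re-checks after every wakeup, including spurious ones.
            cv_.wait(lock, [this] { return device_writes_ != last_seen_; });
        }  // lock released here
        wrote_memory_ = false;
        read_device_ = false;
        last_target_ = target;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    uint64_t device_writes_ = 0;         // step 1: counter, guarded by mu_
    uint64_t last_seen_ = 0;             // counter value at the last device read
    uint32_t last_target_ = UINT32_MAX;  // sentinel: no jump seen yet
    bool wrote_memory_ = false;          // only touched by the CPU thread
    bool read_device_ = false;           // only touched by the CPU thread
};
```

If the devices are instead stepped from the CPU thread according to elapsed emulated cycles, as many emulators do, blocking the CPU this way would deadlock, so the independent-threads assumption matters.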

That should detect loops that busy-wait on hardware conditions and make them block instead. It would come at the expense of making memory operations and jumps more expensive on the emulated CPU, but that should not be much of a problem because memory operations were already expensive on the original hardware. The time saved by not spinning should make up for it, unless the emulated CPU is so busy that it barely spins at all (in which case we would probably have delayed frames anyway).

vladignatyev commented 8 years ago

@ryao Thanks for your suggestions and proposal for changes! At this time, this repo is my own experimental repo, so I recommend adding this to the official @rileytestut repo on Bitbucket: https://bitbucket.org/rileytestut/gba4ios/