zku / SuperHaxagon

Super Hexagon Bot
7 stars 0 forks source link

Python Bot Question #1

Closed ghost closed 10 years ago

ghost commented 10 years ago

First I'd like to apologize for creating an issue to get in contact with you, but I couldn't find another way to do so.

I've been learning python and I've also been playing Super Hexagon recently, so I figured I would try my hand at creating a bot for it.

I started out trying to use SimpleCV to parse the shapes and control the player position based on that, but SimpleCV just can't do it fast enough. It's only able to screenshot the game and parse the positions of everything at 6 FPS which is not fast enough to keep up with the speed of Super Hexagon.

I've decided to try a similar technique to what you've done by reading the memory but I've never done something like this with python.

I've found this about reading memory in python: http://stackoverflow.com/questions/1794579/how-can-i-read-the-memory-of-another-process-in-python-in-windows

I was wondering if you have any insight as to whether this would work or not. If so, I would assume that on the stackoverflow page that "pid" would be the superhexagon process and would the "address" location be the offsets that I've seen in your bot?

ghost commented 10 years ago

So that stack overflow code does work, but it seems that either I'm trying to put the wrong memory addresses in or the base pointer and offsets in your code don't work for me.

For instance, the base pointer in your code is: 0x694B00 and the offset for number of slots is: 0x1BC which would give a location of: 0x694CBC

However, when I went through a memory scanning process to find the number of walls, it was located at the address: 0x3A69574

So now I'm a bit lost as to how to find the memory locations of the other pieces of information I would need such as PlayerAngle, WorldAngle, FirstWall, etc...

ghost commented 10 years ago

I did some research on memory and pointers since I had not worked with them before and was finally able to figure out what I was doing wrong.

zku commented 10 years ago

Hey,

Sorry I didn't reply earlier; I totally missed it. The number of walls is in several memory locations, so it is very likely that you will find it at a different address. Unless the game received a patch, these addresses should still work though.

As I read it, it seems to be working for you now? Otherwise, feel free to reopen this issue (or create a new one) to ask away.

ghost commented 10 years ago

Yes, they are working for me now, I was using them in an incorrect way at first. Do you happen to know a way to get the base address of a process in python? I currently have to open Super Hexagon, open the process in cheat engine, and put the base pointer (694B00) in to get the base address which I manually put into my python program before I run it.

I found this stackoverlfow page, but it seems GetModuleHandle only works if the calling process also loaded the module in question. http://stackoverflow.com/questions/13045864/python-how-to-get-the-start-base-address-of-a-process

zku commented 10 years ago

What do you need the base address of the Super Hexagon process for? The superhexagon.exe binary I get from Steam has ASLR disabled, so it should always be loaded at the same address.

In any case, you can use the Toolhelp32 functions to get that information. Functions of interest are CreateToolhelp32Snapshot, Process32First, Process32Next, Module32First and Module32Next.

With these Windows API functions, you can iterate over all processes to find the target process, and then iterate over all modules in that process to find the superhexagon.exe module.

ghost commented 10 years ago

Well I suppose I'm still a bit confused about the subject then. When I close and reopen Super Hexagon sometimes the memory addresses change. For instance, when I was searching for NumSlots, the first address I got was like 03A69574 but then a day or so later I was working on my python bot and the address wasn't working, when I went to scan for it again I discovered it was no longer at that address.

zku commented 10 years ago

Oh I see. Indeed, these things are usually dynamically allocated and may be at different addresses in successive executions.

However, ASLR (address space layout randomization) is disabled and the game stores a reference to the 'app' at a known location that will never change.

You can do something along these lines (pseudocode):

appBase := ReadInt32(0x694B00) # the value of 'appBase' might be different after every execution, but we don't care; we know that at appBase + offset we can find what we want numSlots := ReadInt32(appBase + 0x1BC) print(numSlots)

You will always need to perform the read from address 0x694B00 to find where the 'app' has been allocated, and then use the other offsets relative to that read out value (appBase).

Instead of attaching Cheat Engine and reading out the value that you get from 0x694B00, you can do that in your bot, and then use this as the app base to get the information you need.

If this doesn't clear things up I'll try to whip up some python code to illustrate this better.

ghost commented 10 years ago

So is reading the base pointer (0x694B00) supposed to be done with the ReadProcessMemory function? Because I was under the impression that the pointer was only accessible inside Super Hexagon's address space. (I've never worked with memory directly before, or lower level programming languages.)

zku commented 10 years ago

Yes - ReadProcessMemory can be used to read memory from other processes.

If our code were to be executed inside Super Hexagon process, we could simply dereference the pointer to read out the value it points to. Since we are outside of the process though, there is no way for us to directly interact with a different process' virtual memory - that would be catastrophic. Windows provides us with an interface to do this though: Read/WriteProcessMemory (assuming we have the proper privilege levels).

Maybe this will help you:

from ctypes import *
from ctypes.wintypes import *
from time import sleep

OpenProcess = windll.kernel32.OpenProcess
ReadProcessMemory = windll.kernel32.ReadProcessMemory
FindWindowA = windll.user32.FindWindowA
GetWindowThreadProcessId = windll.user32.GetWindowThreadProcessId

#
#  Warning: absolutely no error checking, leaking handles, ..
#

# Find the Super Hexagon window by title (any window with that title might be returned):
windowHandle = FindWindowA(0, 'Super Hexagon')

# Get the process id from the window handle:
processId = c_int()
GetWindowThreadProcessId(windowHandle, byref(processId))

# Open process for reading & writing:
processHandle = OpenProcess(0x0010 | 0x0020 | 0x0008, 0, processId.value)

# Read out the app base (this is not the program / module base, the 'app' is just a huge object):
appBase = c_int()
numRead = c_int()
ReadProcessMemory(processHandle, 0x694B00, byref(appBase), 4, byref(numRead))

# Read out player angle:
playerAngle = c_int()
while True:
    ReadProcessMemory(processHandle, appBase.value + 0x2958, byref(playerAngle), 4, byref(numRead))
    print 'Player angle: %3d' % playerAngle.value
    sleep(1.0)

As a reference, in my original code I read out the appBase with ReadProcessMemory (well, a function that wraps around it) as well: https://github.com/zku/SuperHaxagon/blob/master/SuperHaxagon.cpp#L146

ghost commented 10 years ago

Ah I see now, it makes sense now. Thank you very much for your help, I think I should be good to go now.

zku commented 10 years ago

Wonderful, good luck!

(and keep me posted on your bot progress if you don't mind; I'd love to see one that doesn't cheat / teleport around like mine does)

ghost commented 10 years ago

Will do. I really wish I could have gotten SimpleCV to work, it seemed like a harder challenge than reading memory values. It was recognizing the center polygon, player, and lanes, but was only doing it at 6 FPS.

ghost commented 10 years ago

I'm a bit confused as to what information numWalls and firstWall give, I was watching the memory values as I played the game and I couldn't understand what they mean. For instance, a shape would appear and numWalls would change to something like 20.

zku commented 10 years ago

firstWall is the first loaded wall that can be read out from memory; it says nothing about the walls position or distance, and numWalls tells you how many walls are currently loaded in memory. Note that the actual game field is much larger than what is visible, and more walls are loaded than you can see. I hope this picture clears some things up: https://www.dropbox.com/s/b3arjxy41hbtnea/vis.png

I've added some comments in the source file about the width of a wall. Also note that the width of a wall starts out at some fixed size (per wall), and once the wall reaches the center, it slowly loses width until it has width 0 (or a negative value). Same goes for the distance: the wall spawns at some distance from the center and slowly goes towards 0 (or negative values). A wall with a distance <= 25 (or so, don't remember exactly) will not cause any collisions anymore, and neither will a wall with a width of <= 0.

I wrote a small script in MATLAB to illustrate this by taking a snapshot of the entire game state every couple of milliseconds that could further illustrate this, but I can't provide a recording of this right now / this weekend, so if you are still stuck I'll try to upload it sometime Sunday evening.

ghost commented 10 years ago

I think most of my confusion is just coming from what data type I should be using to read/store the walls.

In your code you use struct Wall and then you make it into a vector on line 141 with std::vector<Wall> walls;. Would the analogous python type be a list?

The only other thing I don't fully understand is line 241: std::vector<DWORD> minDistances(numSlots, -1);

I understand it's creating an array of DWORD types, but what is minDistances? A function of some sort?

I apologize for all my questions, this is my first time reading C++ code. I'm in no hurry for a response, so please take your time if you're busy.

zku commented 10 years ago

In python, you could define your own data structure for the walls and just perform large enough reads and manually convert the bytes to the respective fields (mostly just ints). You could also make use of the ctypes library and define and define the structure layout accordingly: http://docs.python.org/2/library/ctypes.html#ctypes.Structure

Note that I don't make it into a vector on line 141; I declare a vector (think array for now) of multiple walls.

The minDistances is the name of the vector, similar to the construct you saw above. I simply provide the object constructor with more information: I tell it to create a vector of size numSlots and have them all initialized to -1. I use that vector to find out how close the nearest wall is in every slot.

Here is the MATLAB illustration I mentioned: https://www.dropbox.com/s/pcrij4unir1rscz/illustration.mp4

ghost commented 10 years ago

I've just gotten back around to working on my project and I was going to implement the portion that would read/store the walls. Like you suggested above, I was going to make a data structure for the walls and convert the bytes to their respective fields, but I'm not sure how I should determine the size of the read to use.

In your code on line 165 you use sizeof(Wall) * numWalls) but I don't know if python has a function similar to sizeof.

I assume since you say that the size of Wall should be 0x14 bytes total that I could just multiply the decimal number 20 by the value from numWalls()?

The last question I have is what the purpose of some of the fields in your Wall struct are for. Your Wall struct is defined as:

struct Wall
{
    DWORD slot;
    DWORD distance; // Should be signed integer as well!
    BYTE enabled; // Actually, these 4 bytes make up a signed integer containing the width of a wall!
    BYTE fill1[3]; // Actually, these 4 bytes make up a signed integer containing the width of a wall!
    DWORD unk2;
    DWORD unk3;
};

You showed me in an illustration that explained some of it, but why are there two fields that both seem to hold the width? Is it that you only use the enabled field and fill1[3] is just to make it into 4 bytes?

I assume unk2, unk3 stand for 'unknown' and that you don't use them except for the purpose of getting the correct size from the Wall struct?

zku commented 10 years ago

Yes, you could read it all out in one big read of size numWalls * 0x14. The reason for the unk2 and unk3 fields is exactly that: I didn't bother finding out what they are as they were irrelevant for my purposes - I simply know the full size of the wall structure because that was evident from reversing the code.

About enabled/fill1 and width: I misinterpreted the data in the struct. There is no such thing as an enabled flag; it's actually the width (which is 4 bytes long, a signed integer). So instead of having a 1-byte enabled flag and 3 filler-bytes, we just have the 4 bytes long width.

ghost commented 10 years ago

I'm having some trouble with reading the walls data with the large enough read we were talking about. My current code is here: https://github.com/Darkman802/super-hexagon-bot/blob/master/bot.py

I call the read_walls() function on line 82. It calculates the read length by doing 0x14 * self.get_num_walls() and passes that length with the memory address to the read_bytes function on line 36.

The problem I'm having is that ReadProcessMemory is only returning the current value of first_wall.

I have a function called show_data on line 161 that I was using to try and show some information that might be helpful for me to debug the problem. Each output has three lines:

  1. Game information
  2. memory address, read length
  3. data returned by read_walls, the list that would have the large read split into separate walls

    It was outputting the following while I played the game for a few seconds:

{'num_slots': 6, 'world_angle': 112, 'player_slot': 0.7, 'player_angle': 44, 'num_walls': 9, 'first_wall': 5}
60845432 180
b'\x05' [b'\x05']

{'num_slots': 6, 'world_angle': 80, 'player_slot': 0.7, 'player_angle': 44, 'num_walls': 13, 'first_wall': 5}
60845432 260
b'\x05' [b'\x05']

{'num_slots': 6, 'world_angle': 49, 'player_slot': 0.7, 'player_angle': 44, 'num_walls': 14, 'first_wall': 5}
60845432 280
b'\x05' [b'\x05']

{'num_slots': 6, 'world_angle': 18, 'player_slot': 0.7, 'player_angle': 44, 'num_walls': 14, 'first_wall': 5}
60845432 280
b'\x05' [b'\x05']

{'num_slots': 6, 'world_angle': 346, 'player_slot': 0.7, 'player_angle': 44, 'num_walls': 14, 'first_wall': 0}
60845432 280
b'' []

{'num_slots': 6, 'world_angle': 318, 'player_slot': 1.9, 'player_angle': 114, 'num_walls': 14, 'first_wall': 0}
60845432 280
b'' []

{'num_slots': 6, 'world_angle': 284, 'player_slot': 2.8, 'player_angle': 170, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 253, 'player_slot': 3.3, 'player_angle': 198, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 222, 'player_slot': 3.5, 'player_angle': 212, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 190, 'player_slot': 3.5, 'player_angle': 212, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 159, 'player_slot': 4.8, 'player_angle': 289, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 128, 'player_slot': 5.9, 'player_angle': 352, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 149, 'player_slot': 1.2, 'player_angle': 69, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 181, 'player_slot': 2.2, 'player_angle': 132, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 212, 'player_slot': 5.8, 'player_angle': 345, 'num_walls': 25, 'first_wall': 0}
60845432 500
b'' []

{'num_slots': 6, 'world_angle': 243, 'player_slot': 5.5, 'player_angle': 331, 'num_walls': 13, 'first_wall': 0}
60845432 260
b'' []
zku commented 10 years ago

Hmhm, do you ever adjust the buffer length? self.buffer = c_char_p(b"data buffer")

I don't have too much experience with the ctypes module, but I'm assuming this buffer won't automagically resize, in which case it will only be able to hold a couple of bytes (len("data buffer")). Also, c_char_p is probably the wrong data type for this:

class ctypes.c_char_p
Represents the C char * datatype when it points to a zero-terminated string. For a general character pointer that may also point to binary data, POINTER(c_char) must be used. The constructor accepts an integer address, or a string.
ghost commented 10 years ago

Do you have any suggestions for the correct data type, or a different approach to use? I'm not sure where to go from here.

zku commented 10 years ago

This seems to work. Disclaimer: I don't really know how to use the ctypes module. No idea if these ctypes data types leak if they aren't freed explicitly.

#!/usr/bin/env python

import ctypes
from ctypes import windll
from struct import pack, unpack
from time import sleep

OpenProcess = windll.kernel32.OpenProcess
ReadProcessMemory = windll.kernel32.ReadProcessMemory

def ReadBytes(processHandle, address, size):
    localBuffer = (ctypes.c_ubyte * size)()
    numRead = ctypes.c_uint(0)
    ReadProcessMemory(processHandle, address, ctypes.byref(localBuffer), size, ctypes.byref(numRead))
    return ''.join(pack('<B', localBuffer[i]) for i in xrange(size))

def main():
    processId = 6868
    processHandle = OpenProcess(0x1f0fff, False, processId)
    appBase = unpack('<I', ReadBytes(processHandle, 0x694B00, 4))[0]

    while True:
        sleep(0.5)
        numWalls = unpack('<I', ReadBytes(processHandle, appBase + 0x2930, 4))[0]
        for i in xrange(numWalls):
            # Read wall for wall. Could also read the entire thing in one big chunk
            # and then get the wallData from chunk[i*0x14:i*0x14+0x14].
            wallData = ReadBytes(processHandle, appBase + 0x220 + i * 0x14, 0x14)
            slot     = unpack('<i', wallData[0: 4])[0]
            distance = unpack('<i', wallData[4: 8])[0]
            width    = unpack('<i', wallData[8:12])[0]
            print('[wall %2d] slot: %d, distance: %d, width: %d' % (i, slot, distance, width))

if __name__ == '__main__':
    main()
ghost commented 10 years ago

I have it working to some extent, but it seems to make the wrong choice sometimes (usually on the same type of wall pattern).

My logic that determines the target slot is here: https://github.com/Darkman802/super-hexagon-bot/blob/master/bot.py#L162

The last check of the if block on line 168: -1 < wall['slot'] < (num_slots - 1) was because some of the walls were saying they were in slot number 22, which is a bit weird.

ghost commented 10 years ago

I noticed something extra about the walls that the bot hits. On the first level (hexagon) it usually hits the walls that have a brighter color to them. If it's about to hit a wall and the wall changes color to be a bit darker like the other walls, the bot will immediately change slots to avoid it.

I don't have a clue what could be causing the bot to hit walls that are brighter, or how that ties in to the wall data that is read.

ghost commented 10 years ago

It seems I was mistaken, the walls are only brighter because the player is in the lane.

ghost commented 10 years ago

Just wondering if you were still around. I left the project alone for a while, but I'm wanting to finish it now and I'm still having the same problem from the comment right after your last one.

zku commented 10 years ago

You might have to take the slot modulo the number of slots; not sure why you are getting a number 22 there sometimes.

I can't really help you with the bot logic, I've never finished this in my bot either. As long as you keep jumping to empty slots, this should work with instantaneous movements (even though that's cheating).

ghost commented 10 years ago

When I started having trouble I tried to recreate your logic using modulo and such as you have now suggested, but I kept having the same problem. It's strange, it will choose the correct lane 80% of the time, but will almost always choose incorrectly on a certain wall pattern.

I don't know if this matters or not, but when I recreated your logic I still didn't understand the use of modulo. Perhaps I didn't recreate it properly if I didn't understand why you used it? hmm, not sure.