Reko is under the progress of obtaining the capability to search for a given pattern in O(log n) time. This is done using suffix arrays. These are somewhat expensive to compute -- about 1000 msec for a 500 kb binary -- but once they are available they never change. It makes sense to cache the suffix arrays so that repeated
analyses of the same binary may benefit.
When loading the MemoryAreas of a binary foo.exe, Reko should look for an existing suffix array from a previous run; a file called foo.exe.rekosuffix or similar. If it doesn't exist or if the binary has a "younger" modified date than the suffix file, the suffix file is created by iterating through all memory areas and creating a suffix array for each, then writing a file whose format is:
All integers stored in big-endian format
+-----------+
| magic num | uint32 Magic number 'r' 'e' 'k' 'o' 's' 'f' 'x' 0x1A
+-----------+
| count | uint32 Number of suffix arrays, following immediately
+-----------+
| linaddr 0 | uint64 Linear address of beginning of memory area
| offset 0 |------. uint32 file offset to suffix array
| length 0 | | unit32 suffix array
+-----------+ |
| linaddr 1 | |
| offset 1 |----. |
| length 1 | | |
------------+ | |
: : | |
+-----------+ | |
| linaddr n | | |
| offset n |--. | |
| length n | | | |
------------+ | | |
: : | | |
+-----------+ | | |
| suffix |<-+-+-' uint32[] Integers of suffix arrays stored in
| array 0 | | | big-endian format
+-----------+ | |
: : | |
+-----------+ | |
| suffix |<-+-'
| array 1 | |
+-----------+ |
: : |
+-----------+ |
| suffix |<-'
| array n |
+-----------+
Not bothering with 64-bit suffix arrays since I've yet to see an executable whose image size is > 2 GiB.
The suffix arrays for a program are useful at load time, when the image may be packed, and after loading, when the image is unpacked and relocated. Reko therefore needs two suffix files; foo.exe.raw-suffix and foo.exe.loaded-suffix.
Reko is under the progress of obtaining the capability to search for a given pattern in O(log n) time. This is done using suffix arrays. These are somewhat expensive to compute -- about 1000 msec for a 500 kb binary -- but once they are available they never change. It makes sense to cache the suffix arrays so that repeated analyses of the same binary may benefit.
When loading the
MemoryArea
s of a binaryfoo.exe
, Reko should look for an existing suffix array from a previous run; a file calledfoo.exe.rekosuffix
or similar. If it doesn't exist or if the binary has a "younger" modified date than the suffix file, the suffix file is created by iterating through all memory areas and creating a suffix array for each, then writing a file whose format is:Not bothering with 64-bit suffix arrays since I've yet to see an executable whose image size is > 2 GiB.
The suffix arrays for a program are useful at load time, when the image may be packed, and after loading, when the image is unpacked and relocated. Reko therefore needs two suffix files; foo.exe.raw-suffix and foo.exe.loaded-suffix.