pj64team / Project64-Legacy

Finishing what we started.

Fix many bugs in cheat search #41

Closed · parasyte closed 1 year ago

parasyte commented 1 year ago

See also #19, which was caused by the same heap fragmentation issue. That PR only fixed one particular case.


TBD: Storing the list of addresses is still very wasteful. The list is necessary for O(1) time lookups when interacting with the result listbox. The listbox APIs use item indices for most operations, and the results listbox only contains addresses with "hits" in the cheat search. The naive solution is storing all addresses (32 bits each) in an array; a fully populated search over the 8 MB of N64 RAM needs 8,388,608 entries × 4 bytes, a single 32 MB allocation block. This is the data structure used in both the original code and in this PR.
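For reference, a minimal sketch of that naive layout (the names `hit_addresses` and `hit_count` are hypothetical, not taken from the codebase):

```c
#include <stdint.h>

/* One 32-bit entry per address with a hit, kept in ascending order.
 * A fully populated 8-bit search over 8 MB of N64 RAM hits every byte:
 * 8,388,608 entries * 4 bytes = 32 MB in a single allocation block. */
static uint32_t *hit_addresses;
static uint32_t hit_count;
```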

It is possible to reduce the memory requirement without degrading the lookup time terribly. First, observe that the address list is always sorted: addresses are arranged in ascending order. Second, note that this sorted list contains a lot of redundancy. In the worst case, with a fully populated list from an 8-bit search, the first 65,536 addresses all share the same upper half: 0x0000. The next 65,536 addresses also share the same upper half: 0x0001. This pattern repeats to the end of the list, with upper half 0x007f.

Remove this redundancy by storing multiple arrays, let's call them "buckets", of 16-bit values (i.e., only storing the lower half of each address). Each bucket holds up to 65,536 entries, working out to 128 KB at most. And we only need 128 buckets in total, for a maximum of 16 MB. That's a 50% reduction in the worst case. Even better, these smaller 128 KB blocks will be easier to allocate within the fragmented address space!
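Here is a minimal sketch of the bucket layout, using hypothetical names (`Bucket`, `entries`, `count`) rather than anything from the existing code:

```c
#include <stdint.h>

#define NUM_BUCKETS 128 /* upper halves 0x0000..0x007f cover all 8 MB */

/* One bucket per upper half of the address space; each entry stores
 * only the lower half of an address with a search hit. */
typedef struct {
    uint16_t *entries; /* lower 16 bits of each hit, ascending order */
    uint32_t count;    /* number of hits in this bucket */
} Bucket;

static Bucket buckets[NUM_BUCKETS]; /* at most 128 * 128 KB = 16 MB */
```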

If it isn't clear by now, the index within the 128 buckets tells you the upper half of the address. Combine it with the lower half that is actually stored in the bucket, and you can recover the full address with half of the memory needed.
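Continuing the sketch above, recovering the full address is a shift and an OR:

```c
/* Full address of entry i in bucket b: the bucket index supplies the
 * upper 16 bits, the stored entry supplies the lower 16 bits. */
uint32_t address = ((uint32_t)b << 16) | buckets[b].entries[i];
```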

Lookups (find the Nth address in the list) can be made O(log(n)) with a prefix sum tree over the 128 buckets. Constant-time (O(1)) lookups are not possible because each bucket holds a variable number of entries: it only stores addresses with search hits. (This is true even if each bucket's allocation is fixed, and in practice the allocations can be made much smaller than 128 KB.) The naive lookup is linear (O(n)): visit the buckets in order, counting how many addresses each contains, until the count reaches N; in the worst case, it visits all 128 buckets.
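Sketched against the hypothetical `buckets` above, the linear lookup looks like this:

```c
/* Find the Nth hit (0-based) by scanning bucket counts in order.
 * Linear in the number of buckets: up to 128 visits in the worst case. */
static uint32_t nth_address_linear(uint32_t n) {
    for (uint32_t b = 0; b < NUM_BUCKETS; b++) {
        if (n < buckets[b].count) {
            return (b << 16) | buckets[b].entries[n];
        }
        n -= buckets[b].count;
    }
    return UINT32_MAX; /* n is out of range */
}
```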

The prefix sum tree instead sums the bucket counts in a tree that can be binary searched. For 128 buckets, the log-time search reduces to at most log2(128) = 7 bucket visits.

One prefix sum tree that works here is the Fenwick tree (also known as a binary indexed tree). Its storage requirement is only the 128 ints making up the partial sums of the bucket item counts, plus an extra int for the total sum.
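As an illustration only (not a drop-in for the existing code), a Fenwick tree over the 128 bucket counts could look like the sketch below: `fen_update` maintains the partial sums as hits are added or removed, and `fen_find` locates the bucket holding the Nth hit by binary descent.

```c
/* fen[1..NUM_BUCKETS] holds the Fenwick partial sums of the bucket
 * counts; since 128 is a power of two, fen[NUM_BUCKETS] is the total. */
static uint32_t fen[NUM_BUCKETS + 1];

/* Adjust one bucket's count by delta (positive or negative). */
static void fen_update(int bucket, int32_t delta) {
    for (int i = bucket + 1; i <= NUM_BUCKETS; i += i & -i) {
        fen[i] += (uint32_t)delta;
    }
}

/* Find the bucket containing the Nth hit (0-based) and that hit's rank
 * within the bucket. Exactly log2(128) = 7 steps. The caller must
 * ensure n is less than the total hit count. */
static int fen_find(uint32_t n, uint32_t *rank_in_bucket) {
    int pos = 0;
    for (int step = NUM_BUCKETS; step > 0; step >>= 1) {
        if (pos + step <= NUM_BUCKETS && fen[pos + step] <= n) {
            pos += step;
            n -= fen[pos];
        }
    }
    *rank_in_bucket = n; /* 0-based offset inside the found bucket */
    return pos;          /* 0-based bucket index */
}
```

The full listbox lookup is then `fen_find` to pick the bucket, followed by an index into that bucket's `entries` array; recombining the halves as shown earlier yields the address.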

The only downside to this approach is the additional code complexity. There isn't a lot of code to write, but it is easy to mess up if you don't know why the data structure is needed (or how it works). It is only marginally slower than the naive constant-time array lookups, and more than fast enough for the listbox drawing and find operations.

The upsides:

- About half the memory requirement in the worst case (an unknown 8-bit search across the full 8 MB of N64 RAM).
- Much smaller allocations, which are easier for a fragmented heap to satisfy.

I am not planning to implement the prefix sum tree in this PR. But I've decided to write my thoughts here just in case the 32 MB allocations in the cheat search ever become problematic. We'll have something to look back on as a proposed solution.