rizinorg / rizin

UNIX-like reverse engineering framework and command-line toolset.
https://rizin.re
GNU Lesser General Public License v3.0
2.66k stars 357 forks source link

Arbitrarily high hit counts / high allocation counts hurts search performance, refactor search hits to use rz_vector? #1909

Open Andersama opened 2 years ago

Andersama commented 2 years ago

Is your feature request related to a problem? Please describe. May be more related to cutter's gui because it looks like the api has a maximum search limit / count and perhaps that'd solve the issue in commandline. However, without knowing that in advance you can hang the application by running a search which may have too many results. The issue stems from usage of a list structure which allocates for every saved result in rz_search_t.

The problem function in question would be rz_search_hit_new.

Edit: on second look it appears the api is being used with a callback which terminates early and skips the allocation I thought was the original issue. This might impact uses of this part of the api without a provided callback, they're likely slower than it should be. Whatever's causing the application hang eludes me.

Describe the solution you'd like Key part here would be to refactor the search functionality to use a vector like (or probably better rope like) structure. This should improve search timings considerably since time won't be wasted waiting on the system for memory. There appears to be a header for vectors in the library, it probably could be put to use here. A well designed rope structure would probably be better.

EG: instead of

typedef struct rz_search_t {
    int n_kws; // hit${n_kws}_${count}
    int mode;
    ut32 pattern_size;
    ut32 string_min; // max length of strings for RZ_SEARCH_STRING
    ut32 string_max; // min length of strings for RZ_SEARCH_STRING
    void *data; // data used by search algorithm
    void *user; // user data passed to callback
    RzSearchCallback callback;
    ut64 nhits;
    ut64 maxhits; // search.maxhits
    RzList *hits;
    int distance;
    int inverse;
    bool overlap; // whether two matches can overlap
    int contiguous;
    int align;
    int (*update)(struct rz_search_t *s, ut64 from, const ut8 *buf, int len);
    RzList *kws; // TODO: Use rz_search_kw_new ()
    RzIOBind iob;
    char bckwrds;
} RzSearch;

something like:

typedef struct rz_search_t {
    int n_kws; // hit${n_kws}_${count}
    int mode;
    ut32 pattern_size;
    ut32 string_min; // max length of strings for RZ_SEARCH_STRING
    ut32 string_max; // min length of strings for RZ_SEARCH_STRING
    void *data; // data used by search algorithm
    void *user; // user data passed to callback
    RzSearchCallback callback;
    ut64 nhits;
    ut64 maxhits; // search.maxhits
    RzVector *hits; // this probably doesn't need to be a pointer? also nhits would be hits->len
    int distance;
    int inverse;
    bool overlap; // whether two matches can overlap
    int contiguous;
    int align;
    int (*update)(struct rz_search_t *s, ut64 from, const ut8 *buf, int len);
    RzList *kws; // TODO: Use rz_search_kw_new ()
    RzIOBind iob;
    char bckwrds;
} RzSearch;

Since results are addresses there's no need to allocate for every 8 byte address, and since any information related to pointers is auxiliary to the pointer there's and pointers are by definition unique there's plenty of opportunity to use hashtables to grab any interesting auxiliary information about the pointer later. It seems like in addition to the matched address currently a pointer to the search used is saved. There's probably a good amount of memory to save by not storing that pointer for each hit.

Describe alternatives you've considered Another idea would be a custom allocator, seems like that would considerably more difficult, and I suspect a better data structure is the better approach.

Additional context A fairly easy way to test this is to do what I did by accident, which is to debug a program and search for 0 effectively everywhere.

It also looks as if the maxhits variable can be used to reserve space in rz_search_begin. This in theory should reduce allocations wherever this api is used.

ret2libc commented 2 years ago

Hi! Thanks for your report.

Could you be a bit clearer on what is the problem before discussing possible solutions? What are you doing? What is the version of Rizin you are using? Which OS? Which binary (if possible to share)?

From the additional context section I'm trying doing (Fedora 34, x86_64, rizin v0.3.0)

$ rizin -d /bin/ls
> /x 00
Searching 1 byte in [0x55f9c8333000-0x55f9c8347000]
hits: 15413
Do you want to print 15413 lines? (y/N)
[...]

And this happens very quicky. Did I understand it right? Are you doing anything different? Thanks again

Andersama commented 2 years ago

I built off of whatever's included by default for building cutter on it's default branch. I'm on windows x64 bit. From memory I ran the command to search for 0's by accident across all memory maps when I was attached to inscryption.exe, there's a demo on steam which I can only imagine would do the same thing. I don't have complete context for what rizin would've done in command line, I essentially froze cutter.

ret2libc commented 2 years ago

We would need few more details to start looking at this. I feel this is more of a "bug report" rather than a "feature request"

We need to understand, at least:

Without the above info there is not much we can do to help you, sorry.

Andersama commented 2 years ago

Sure ok, I guess you're asking to be more specific: Windows 10 Home: Version 10.0.19042 Build 19042 (x64) image image

Here's what I did, or I believe there might be an extra option when debugging a program for "All memory maps", if there's a command which goes byte by byte for 0s you might want to try that instead. image I wouldn't necessarily call it a bug, it was more or less just that the results took so long the application went unresponsive, and from memory although it's been a few weeks now it might be the work done to output to console for cutter that takes so long.

The binary I was analyzing was the inscryption game available on steam. There's been an update since, so I don't think I could get you an exact binary to look at, but I'm fairly confident the demo would do a similar thing if you really need it. I'd try making an executable which just loads a lot of 0's into memory and overall check and/or stress test the performance of the search widget when there's a lot of matches.