lpereira / lwan

Experimental, scalable, high performance HTTP server
https://lwan.ws
GNU General Public License v2.0
5.94k stars 548 forks source link

Enhance `value_lookup` bitwise binary search #367

Closed brianwitte closed 3 months ago

brianwitte commented 4 months ago

Improved index tracking logic and added explicit handling of zero cases with n | 1 to avoid undefined behavior in __builtin_clz* functions.

I ran into an issue here when trying out lua handlers and then I stepped through (and added debug logging) through the various query parameter handling functions and found that value_lookup was unable to retrieve values from they kv structure correctly.

🦉 lwan-mod-lua.c:227 lua_handle_request() Error from Lua script: 
[string "function handle_get_root(req)..."]:5: attempt to perform arithmetic on 
local 'param_a' (a string value)
...
🐰 lwan-lua.c:138 request_param_getter() Key string: a
🐰 lwan-request.c:506 parse_query_string() Parsing query string: a=2&b=3
🐰 lwan-request.c:432 parse_key_values() Starting to parse key-values from: a=2&b=3
🐰 lwan-request.c:471 parse_key_values() Parsed key-value pair: a = 2
🐰 lwan-request.c:471 parse_key_values() Parsed key-value pair: b = 3
🐰 lwan-request.c:511 parse_query_string() Parsed query parameters:
🐰 lwan-request.c:514 parse_query_string() Key: a, Value: 2
🐰 lwan-request.c:514 parse_query_string() Key: b, Value: 3
🐰 lwan-lua.c:148 request_param_getter() Value for key a is a
🐰 lwan-lua.c:138 request_param_getter() Key string: b
🐰 lwan-lua.c:148 request_param_getter() Value for key b is b

I refactored using the same guidance in orlp's article that was in value_lookup as a comment.

/*
 * Bitwise search for the key using a modified lower bound approach.
 * The loop uses a bitmask to converge on the index of the correct key.
 */
for (; bit != 0; bit >>= 1) {
    size_t i = b | bit;
    // Check boundary and compare
    if (i <= n && key_value_compare(&k, &base[i - 1]) >= 0)
        b = i; // Update index
}
// Final match check
if (b <= n && key_value_compare(&k, &base[b - 1]) == 0)
            return base[b - 1].value; // Return value

The primary difference in this refactoring lies in how the index is computed. The C++ template in the article uses std::bit_floor(n) to find the highest power of two less than or equal to n, which we mimic using __builtin_clz operations.

The calculation b | bit directly correlates with how the bitwise OR operation is used in the C++ template to conditionally update the index b.

Instead of iteratively adjusting the index as in the original, this approach directly influences b by setting bits when the condition key_value_compare(&k, &base[i - 1]) >= 0 holds true.

I tried my best to ensure that the semantics are preserved by maintaining the key-value comparison logic and the final check to confirm the key match before returning the value.

lpereira commented 4 months ago

Good catch! I'll have to take a closer look at this because this is really tricky code. We maybe need to add some SELF_TEST to this too to ensure it doesn't break in the future.

brianwitte commented 4 months ago

@lpereira -- sounds good, agreed.

I will to add some SELF_TEST for the greater good 😄

Will keep this up and add them here!

brianwitte commented 3 months ago

closing for now since we are going back to another approach for this lookup. we can revisit this later if it makes sense. thanks for looking into it! :sunglasses: