privacysandbox / protected-auction-key-value-service

Protected Auction Key/Value Service
Apache License 2.0

Inline large data structures for caching #62

Open fhoering opened 4 months ago

fhoering commented 4 months ago

Caching is an important mechanism for improving performance. I tried to inline a large data structure so it could be reused in read-only mode without querying the KV storage again, but with this approach performance degrades badly.

Can you have a look at where the slowdown is coming from? Could it come from the large file size, or from the array being reinitialized every time?

const space = 10000000;
const w = [0.004996137771039466,0.7571896910549433,0.7937806398622418,0.9647932057841067,0.0022866805189704076,0.34038776806035753,0.4547363853940317,0.5426783773175982, .. ]; // length of space

function HandleRequest(executionMetadata, metadataKeys, signal) {
  let result = 0;
  for (let i = 0; i < 100; i++) {
    const index = Math.floor(Math.random() * space);
    result += w[index];
  }
  return result;
}
lx3-g commented 4 months ago

Hi Fabian,

To be explicit, w will be reinitialized per request. No state may be shared between requests.

What would be the baseline you're comparing this to? Are you saying that for this line: result += w[index]; it is faster if you query the kv cache with getValues? Or maybe if you query outside of the loop once?

Additionally, by large file sizes, do you mean that inlining the data structure here (const w = ...) makes the UDF file large, and you suspect that this impacts performance?

Alexander

fhoering commented 4 months ago

I updated the initial example because I actually tried with floats. With floats the array w is around 300 MB in size, so in a normal JS environment querying it with getValues, whether once or 10 million times per operation, can never be faster than the inlined code.

I know the storage is not optimal, but I'm more interested in why this is not working. I guess you already answered, though: if w is reinitialized per request, this mechanism will obviously not work.

In theory I really see no privacy issue with doing something like this, since JavaScript supports both immutable arrays and file operations. One could load this big array once and then reuse it across all requests until it is reloaded.
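The reuse pattern proposed above could be sketched like this (a hypothetical sketch; per the earlier comment, the sandbox currently reinitializes module state for every request, so this does NOT give cross-request reuse today):

```javascript
// Hypothetical pattern: build the array once at module scope and freeze
// it, so handlers can only read it. In a plain JS environment the literal
// (or a file load) would run once at code load; in the KV sandbox, module
// state is reinitialized per request, defeating the reuse.
const space = 10000000;
const w = Object.freeze(new Array(space).fill(0).map(() => Math.random()));

function HandleRequest(executionMetadata, metadataKeys, signal) {
  let result = 0;
  for (let i = 0; i < 100; i++) {
    // Read-only access to the shared, frozen array.
    result += w[Math.floor(Math.random() * space)];
  }
  return result;
}
```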

bjschnei commented 3 months ago

Hi Fabian,

We've added a benchmark to measure the performance of globals, initializing them in two different ways: 1) module scoped, 2) inlined.

Though inlined initialization does impact code load time, the impact on execution time seems limited. Does this match your findings?

https://github.com/privacysandbox/data-plane-shared-libraries/commit/261bb3f111b62011c3350da0e66d2c630cf4413a
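The two initialization styles being compared can be sketched as follows (illustrative names, not the benchmark's actual code):

```javascript
// 1) Module scoped: the array literal is evaluated once when the code
// is loaded; the handler only reads it.
const moduleScoped = [0.1, 0.2, 0.3 /* , ... */];
function HandleModuleScoped() {
  return moduleScoped[0];
}

// 2) Inlined: the array literal sits inside the handler body, so it is
// re-evaluated on every call.
function HandleInlined() {
  const inlined = [0.1, 0.2, 0.3 /* , ... */];
  return inlined[0];
}
```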

fhoering commented 3 months ago

Thanks @bjschnei for the benchmarks.

Those tests are encouraging, in particular for the inlined integer arrays, but they don't seem to match our findings from web load tests. Our tests were actually done with a much larger array size of 16 million, using float arrays instead of ints (with the methodology and setup described here).

I also ran the int array benchmark with a float constant and with a 10 times bigger array size, and I see a slowdown in performance similar to what is visible for structures.

On global_inline_int_array_, changing val = str(list(range(i))) to val = str([4.5 for a in range(i)]):

BM_ExecuteGlobalInlineIntArray/4096               0.962 ms        0.042 ms        10000
BM_ExecuteGlobalInlineIntArray/32768               1.17 ms        0.041 ms        10000
BM_ExecuteGlobalInlineIntArray/262144              2.85 ms        0.050 ms         1000
BM_ExecuteGlobalInlineIntArray/1000000             6.08 ms        0.061 ms         1000
BM_ExecuteGlobalInlineStructureArray/4096          1.38 ms        0.046 ms        10000
BM_ExecuteGlobalInlineStructureArray/32768         5.45 ms        0.057 ms         1000
BM_ExecuteGlobalInlineStructureArray/262144        37.0 ms        0.068 ms          100
BM_ExecuteGlobalInlineStructureArray/1000000       77.7 ms        0.076 ms          100

(BM_ExecuteGlobalInlineStructureArray has not been changed)

The effect is more visible when you increase the array size to 10 million.

On global_inline_int_array_, changing val = str(list(range(i))) to val = str([4.5 for a in range(i * 10)]):

BM_ExecuteGlobalInlineIntArray/4096                1.15 ms        0.043 ms        10000
BM_ExecuteGlobalInlineIntArray/32768               2.58 ms        0.045 ms         1000
BM_ExecuteGlobalInlineIntArray/262144              11.3 ms        0.058 ms         1000
BM_ExecuteGlobalInlineIntArray/1000000             38.5 ms        0.064 ms          100
BM_ExecuteGlobalInlineStructureArray/4096          1.34 ms        0.046 ms        10000
BM_ExecuteGlobalInlineStructureArray/32768         3.13 ms        0.046 ms         1000
BM_ExecuteGlobalInlineStructureArray/262144        21.6 ms        0.059 ms         1000
BM_ExecuteGlobalInlineStructureArray/1000000       76.4 ms        0.068 ms          100

(BM_ExecuteGlobalInlineStructureArray has not been changed)

76 ms and 38 ms look clearly too high for a single call, and the fact that array size impacts execution time looks unexpected. The associated JS file sizes still look reasonable for server-side processing (15 MB for the struct array and 40 MB for the 10x larger array). How do you explain the difference between integer arrays, float arrays, and structures?

bjschnei commented 3 months ago

Fabian,

I'm not sure exactly why V8 handles floats and objects differently. A workaround that looks like it might work for you is inlining ArrayBuffers. I've added another benchmark to illustrate inlining floats in two ways (directly and via an ArrayBuffer).

https://github.com/privacysandbox/data-plane-shared-libraries/commit/0a9080a090ce0148c79ee150d06e55aab5e9da23
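The ArrayBuffer workaround could look roughly like this (an assumed shape, not the linked benchmark's actual code): backing the data with a typed array lets V8 store raw 8-byte floats in a contiguous buffer instead of generic tagged heap values.

```javascript
// Hypothetical sketch: inline the floats through a Float64Array, which
// wraps an ArrayBuffer of raw doubles. The values and length here are
// illustrative placeholders.
const floats = new Float64Array([0.00499, 0.75718, 0.79378 /* , ... */]);

function HandleRequest(executionMetadata, metadataKeys, signal) {
  let result = 0;
  for (let i = 0; i < 100; i++) {
    // Indexed reads on a typed array hit the raw buffer directly.
    result += floats[Math.floor(Math.random() * floats.length)];
  }
  return result;
}
```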