privacysandbox / protected-auction-key-value-service

Protected Auction Key/Value Service
Apache License 2.0

Inline large data structures for caching #62

Open fhoering opened 1 week ago

fhoering commented 1 week ago

Caching is an important mechanism for improving performance. I tried to inline a large data structure so that it can be reused in read-only mode without querying the KV storage again. But with this approach the performance becomes very bad.

Can you have a look at where the slowdown is coming from? Could it come from the large file size or from reinitializing the array all the time?

const space = 10000000;
// w holds `space` entries; the literal is truncated here.
const w = [0.004996137771039466,0.7571896910549433,0.7937806398622418,0.9647932057841067,0.0022866805189704076,0.34038776806035753,0.4547363853940317,0.5426783773175982, .. ]; // length of space

function HandleRequest(executionMetadata, metadataKeys, signal) {
    let result = 0;
    // Sum 100 randomly chosen entries of w.
    for (let i = 0; i < 100; i++) {
        const index = Math.floor(Math.random() * space);
        result += w[index];
    }
    return result;
}
lx3-g commented 1 week ago

Hi Fabian,

To be explicit, the w will be reinitialized per request. No state is allowed to be shared between requests.
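The per-request reinitialization described above can be illustrated with a minimal sketch in plain JavaScript. It is a simulation and not the KV service runtime: `buildTable`, `sumRandomEntries`, and `handleWithFreshState` are hypothetical names, and the table is kept small so that the snippet runs quickly. The only point is that the table is materialized once per request.

```javascript
// Hypothetical simulation in plain JavaScript; this is NOT the KV service runtime.
const SPACE = 1000; // small stand-in for the 10,000,000 entries in the issue

let buildCount = 0; // counts how often the table is materialized

// Stand-in for evaluating the inlined `const w = [...]` literal.
function buildTable() {
  buildCount += 1;
  const table = new Array(SPACE);
  for (let i = 0; i < SPACE; i++) {
    table[i] = i / SPACE;
  }
  return table;
}

// Sums 100 randomly chosen entries, like the loop in HandleRequest.
function sumRandomEntries(table) {
  let result = 0;
  for (let i = 0; i < 100; i++) {
    const index = Math.floor(Math.random() * table.length);
    result += table[index];
  }
  return result;
}

// Models a runtime that shares no state: every request rebuilds the table.
function handleWithFreshState() {
  const w = buildTable();
  return sumRandomEntries(w);
}

for (let r = 0; r < 5; r++) {
  handleWithFreshState();
}
const buildsWithFreshState = buildCount; // one build per simulated request
```

With a 10,000,000-entry table, the build step would dominate the 100 lookups that each request actually performs.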

What baseline are you comparing this to? Are you saying that for this line: result += w[index]; it is faster if you query the KV cache with getValues? Or maybe if you query once outside of the loop?

Additionally, by large file sizes, do you mean that inlining the data structure here (const w = ...) makes the UDF file large, and you suspect that this impacts the performance?

Alexander

fhoering commented 6 days ago

I updated the initial example because I actually tried with floats. With floats, the array w is around 300 MB. In a normal JS environment, querying this with getValues, whether once or 10 million times for each operation, can never be faster than the inlined code.

I know the storage is not optimal, but I'm more interested in why this is not working. I guess you already answered that: if w is reinitialized per request, this mechanism will obviously not work.

In theory I really see no privacy issue with doing something like this, as JavaScript also supports immutable arrays and file operations. So one could load this big array once and then reuse it across all requests until it is reloaded.
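As a rough sketch of the load-once idea in an ordinary JavaScript runtime (not something the KV service supports, given that no state is shared between requests): `loadWeights`, `WEIGHTS`, and `handleRequest` are hypothetical names, the values are generated instead of read from a file, and the array is kept small for illustration.

```javascript
// Sketch for an ordinary JavaScript runtime; NOT how the KV service behaves.
const SIZE = 1000; // small stand-in for the 10,000,000 entries in the example

let loadCount = 0; // counts how often the data is loaded

// Hypothetical loader; a real one might read the values from a file.
function loadWeights(size) {
  loadCount += 1;
  const values = new Array(size);
  for (let i = 0; i < size; i++) {
    values[i] = i / size;
  }
  // Object.freeze makes the array read-only for all later callers.
  return Object.freeze(values);
}

// Loaded once, when the script is first evaluated.
const WEIGHTS = loadWeights(SIZE);

// Every request reads from the same frozen array.
function handleRequest() {
  let result = 0;
  for (let i = 0; i < 100; i++) {
    const index = Math.floor(Math.random() * WEIGHTS.length);
    result += WEIGHTS[index];
  }
  return result;
}

const results = [];
for (let r = 0; r < 5; r++) {
  results.push(handleRequest());
}
```

A plain array is used here because Object.freeze throws a TypeError on a typed array that has elements, so a Float64Array would need a different read-only wrapper.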