Open fhoering opened 4 months ago
Hi Fabian,
To be explicit, the w
will be reinitialized per request. No state is allowed to be shared between requests.
What would be the baseline you're comparing this too?
Are you saying that for this line:
result += w[index];
if you query the kv cache with getValues
it is faster? Or mb if you query outside of the loop once?
Additionally, by large file sizes -- do you mean that by inlining the data structure here --
const w = ...
it makes the UDF file large and you suspect that it impacts the perf?
Alexander
I updated the initial example because I actually tried with floats. When you try this the array w
is around 300 MB big. So in a normal JS environment querying this with getValues
once or 10 mio times for each operation can never be faster than the inlined code.
I know the storage is not optimal but I'm rather interested in why this is not working. But I guess you already answered. If w
is reinitialized per request this mechanism will obviously not work.
In theory I really see no privacy issue to do something like this, as Javascript supports immutable arrays and file operations also. So one could load this big array once and then reuse it across all requests until reloaded.
Hi Fabian,
We've added a benchmark to measure performance of globals and initialized them in 2 different ways. 1) module scoped 2) inlined
Though inlined initialization does impact code load time, it seems that execution time impact is limited. Does this match your findings?
Thanks @bjschnei for the benchmarks.
Those tests are encouraging in particular for the inlined integer arrays but yet they don't seem to match with our findings done with web load tests. Our tests have actually been done with a much larger larger array size of 16 mio and using float arrays instead of ints (with the methodology and setup described here)
I also tried to execute the int array benchmark with a constant float and with a 10 times bigger array size and I see a slowdown in performance similar to what is also visible for structures.
On global_inline_int_array_
changing val = str([4.5 for a in range(i)])
instead of val = str(list(range(i)))
:
BM_ExecuteGlobalInlineIntArray/4096 0.962 ms 0.042 ms 10000
BM_ExecuteGlobalInlineIntArray/32768 1.17 ms 0.041 ms 10000
BM_ExecuteGlobalInlineIntArray/262144 2.85 ms 0.050 ms 1000
BM_ExecuteGlobalInlineIntArray/1000000 6.08 ms 0.061 ms 1000
BM_ExecuteGlobalInlineStructureArray/4096 1.38 ms 0.046 ms 10000
BM_ExecuteGlobalInlineStructureArray/32768 5.45 ms 0.057 ms 1000
BM_ExecuteGlobalInlineStructureArray/262144 37.0 ms 0.068 ms 100
BM_ExecuteGlobalInlineStructureArray/1000000 77.7 ms 0.076 ms 100
(BM_ExecuteGlobalInlineStructureArray has not been unchanged)
It looks more visible when you increase the array size to 10 mio.
On global_inline_int_array_
changing val = str([4.5 for a in range(i * 10)])
instead of val = str(list(range(i)))
:
BM_ExecuteGlobalInlineIntArray/4096 1.15 ms 0.043 ms 10000
BM_ExecuteGlobalInlineIntArray/32768 2.58 ms 0.045 ms 1000
BM_ExecuteGlobalInlineIntArray/262144 11.3 ms 0.058 ms 1000
BM_ExecuteGlobalInlineIntArray/1000000 38.5 ms 0.064 ms 100
BM_ExecuteGlobalInlineStructureArray/4096 1.34 ms 0.046 ms 10000
BM_ExecuteGlobalInlineStructureArray/32768 3.13 ms 0.046 ms 1000
BM_ExecuteGlobalInlineStructureArray/262144 21.6 ms 0.059 ms 1000
BM_ExecuteGlobalInlineStructureArray/1000000 76.4 ms 0.068 ms 100
(BM_ExecuteGlobalInlineStructureArray has not been unchanged)
76 ms and 38ms looks clearly too high for one single call and the array size seems to have an impact on execution time which looks unexpected. The associated JS file sizes also still look reasonable for server side processing (15 MB for the struct and 40 MB for the 10x increased array) How do you explain the difference between integer arrays, float arrays and structures ?
Fabian,
I'm not sure exactly why V8 handles floats and objects differently. A workaround that looks like it might work for you is inlining ArrayBuffer
s. I've added another benchmark to illustrate inline floats in the 2 ways (directly and with an array buffer).
Caching is an important mechanism for improving performance. I tried to inline a large datastructure to be able to reuse it in a readonly mode without querying the KV storage again. But using this the performance is getting very bad.
Can you have a look where the downlift is coming from ? Can it come from the large file sizes or from reinitializing the array all the time ?