launchdarkly / go-sdk-common

Basic types for LaunchDarkly Go SDK components

feat: reduce memory allocation for common slice and map types #30

Closed dyolcekaj closed 1 month ago

dyolcekaj commented 2 months ago

Requirements

Describe the solution you've provided

Reduce memory allocations for some collection types by avoiding the slow and expensive JSON marshal->unmarshal path. Extend the list of type assertions to slices and maps of some primitive types.

Additional context

While memory profiling a production application, we noticed that more than 50% of our memory allocations were happening in ldvalue.CopyArbitraryValue. We use feature flags heavily, and a typical HTTP request evaluates 2-8 feature flags. The cause of this high memory usage was a string slice with several hundred items being passed through ldvalue.CopyArbitraryValue on most ldcontext.Builder instantiations. This was occurring in a general-use function like

func IsEnabled(ctx context.Context, flag string, attributes map[string]any) bool {
  client := ctx.Value("ld-client").(*ldclient.LDClient)

  builder := ldcontextBuilderFromCtx(ctx) // get some HTTP request type info, user info, etc. 

  // note ldvalue.CopyArbitraryValueMap has the same issue 
  for key, value := range attributes {
    builder.SetValue(key, ldvalue.CopyArbitraryValue(value))
  }

  // errors are ignored here to keep the example short
  enabled, _ := client.BoolVariation(flag, builder.Build(), false)

  return enabled 
}

If any of the map[string]any values is itself a []string or similar, we hit the FromJSONMarshal slow path. To partially solve the problem, we have written a wrapper around this SDK with more or less the same code as I am submitting here. The reduction in memory allocation is especially noticeable when copying []string, but in all cases there is a fairly significant improvement in latency and total memory usage.
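To make the fast path versus slow path distinction concrete, here is a minimal, stdlib-only sketch. The `value` type and the functions `copyArbitrary`, `copyViaJSON`, and `fromDecoded` are hypothetical stand-ins written for this illustration; they are not `ldvalue.Value` or the SDK's actual code, and object and boolean kinds are left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// value is a hypothetical stand-in for an SDK value type. It is NOT
// ldvalue.Value; object and boolean kinds are omitted for brevity.
type value struct {
	kind  byte // 0 = null, 's' = string, 'n' = number, 'a' = array
	str   string
	num   float64
	items []value
}

// copyArbitrary converts v to a value. Slice types named in the switch are
// copied directly; every other type takes the JSON round trip.
func copyArbitrary(v any) value {
	switch t := v.(type) {
	case string:
		return value{kind: 's', str: t}
	case float64:
		return value{kind: 'n', num: t}
	case []string: // fast path: one slice allocation, no JSON
		items := make([]value, len(t))
		for i, s := range t {
			items[i] = value{kind: 's', str: s}
		}
		return value{kind: 'a', items: items}
	case []int: // fast path
		items := make([]value, len(t))
		for i, n := range t {
			items[i] = value{kind: 'n', num: float64(n)}
		}
		return value{kind: 'a', items: items}
	default:
		return copyViaJSON(v)
	}
}

// copyViaJSON is the slow path: marshal to JSON, decode into generic Go
// values, then convert those.
func copyViaJSON(v any) value {
	data, err := json.Marshal(v)
	if err != nil {
		return value{}
	}
	var decoded any
	if err := json.Unmarshal(data, &decoded); err != nil {
		return value{}
	}
	return fromDecoded(decoded)
}

// fromDecoded converts the types json.Unmarshal produces for strings,
// numbers, and arrays; anything else becomes null in this sketch.
func fromDecoded(d any) value {
	switch t := d.(type) {
	case string:
		return value{kind: 's', str: t}
	case float64:
		return value{kind: 'n', num: t}
	case []any:
		items := make([]value, len(t))
		for i, e := range t {
			items[i] = fromDecoded(e)
		}
		return value{kind: 'a', items: items}
	default:
		return value{}
	}
}

func main() {
	fast := copyArbitrary([]string{"a", "b"}) // handled by the type switch
	slow := copyArbitrary([]float32{1.5, 2})  // not listed, so JSON round trip
	fmt.Println(len(fast.items), len(slow.items))
}
```

Both calls produce the same shape of result; the difference is that the `[]float32` input is serialized and re-parsed because no case in the switch matches it.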

Although this is verbose, alternatives that use generic functions like

func copyArbitraryType[T comparable](data []T) Value { ... }

don't save on allocations. Using reflection to determine the slice or map element types does have some benefit, especially for slices, but still has a high number of allocations for large maps. You can see an implementation of this change using reflection here
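For comparison, a reflection-based slice copy might look roughly like the sketch below. This is an illustration only and not the linked implementation: `value` and `copySliceReflect` are hypothetical names, and maps are not covered.

```go
package main

import (
	"fmt"
	"reflect"
)

// value is a hypothetical stand-in for an SDK value type, not ldvalue.Value.
type value struct {
	kind  byte // 0 = null, 's' = string, 'n' = number, 'a' = array
	str   string
	num   float64
	items []value
}

// copySliceReflect copies any slice whose elements are strings, signed
// integers, or floats. It reports false for every other input so the caller
// can fall back to another strategy.
func copySliceReflect(v any) (value, bool) {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Slice {
		return value{}, false
	}
	items := make([]value, rv.Len())
	for i := range items {
		elem := rv.Index(i)
		switch elem.Kind() {
		case reflect.String:
			items[i] = value{kind: 's', str: elem.String()}
		case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
			items[i] = value{kind: 'n', num: float64(elem.Int())}
		case reflect.Float32, reflect.Float64:
			items[i] = value{kind: 'n', num: elem.Float()}
		default:
			return value{}, false
		}
	}
	return value{kind: 'a', items: items}, true
}

// userID shows that reflection also matches named element types, which a
// type switch on []string would not.
type userID string

func main() {
	v, ok := copySliceReflect([]userID{"u1", "u2"})
	fmt.Println(ok, len(v.items))
}
```

One design trade-off: the `Kind` check covers named types such as `[]userID` without a dedicated case, while an explicit type switch only matches the exact types it lists.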

Here are benchmarks from my machine showing the improvement for these specific use cases. The command for all of them is go test -benchmem '-run=^$$' -bench="CollectionCopy*" ./ldvalue

Before:

BenchmarkCollectionCopyMapStringSmall-16                  197137          5597 ns/op        4730 B/op         47 allocs/op
BenchmarkCollectionCopyMapStringLarge-16                    1776        632766 ns/op      676664 B/op       4043 allocs/op
BenchmarkCollectionCopySliceStringSmall-16                415045          2731 ns/op        3931 B/op         19 allocs/op
BenchmarkCollectionCopySliceStringMedium-16                57158         20780 ns/op       35357 B/op        112 allocs/op
BenchmarkCollectionCopySliceStringLarge-16                  6002        186840 ns/op      267965 B/op       1016 allocs/op
BenchmarkCollectionCopySliceIntSmall-16                   428552          2633 ns/op        3835 B/op          9 allocs/op
BenchmarkCollectionCopySliceIntMedium-16                   58148         20332 ns/op       34397 B/op         12 allocs/op
BenchmarkCollectionCopySliceIntLarge-16                     6182        180290 ns/op      260010 B/op         15 allocs/op

After:

BenchmarkCollectionCopyMapStringSmall-16         1568584           769.4 ns/op      2117 B/op          2 allocs/op
BenchmarkCollectionCopyMapStringLarge-16           15642         76328 ns/op      254033 B/op          3 allocs/op
BenchmarkCollectionCopySliceStringSmall-16       5143736           226.0 ns/op      1048 B/op          2 allocs/op
BenchmarkCollectionCopySliceStringMedium-16       825129          1439 ns/op        9752 B/op          2 allocs/op
BenchmarkCollectionCopySliceStringLarge-16         78724         15250 ns/op       98328 B/op          2 allocs/op
BenchmarkCollectionCopySliceIntSmall-16          5327779           224.1 ns/op      1048 B/op          2 allocs/op
BenchmarkCollectionCopySliceIntMedium-16          832033          1419 ns/op        9752 B/op          2 allocs/op
BenchmarkCollectionCopySliceIntLarge-16            86389         13898 ns/op       98328 B/op          2 allocs/op

Reflection-based benchmark:

BenchmarkReflectCopyMapStringSmall-16                 516852          2229 ns/op        2678 B/op         23 allocs/op
BenchmarkReflectCopyMapStringLarge-16                   5341        197526 ns/op      310622 B/op       2004 allocs/op
BenchmarkReflectCopySliceStringSmall-16              2635124           447.3 ns/op      1048 B/op          2 allocs/op
BenchmarkReflectCopySliceStringMedium-16              346424          3233 ns/op        9752 B/op          2 allocs/op
BenchmarkReflectCopySliceStringLarge-16                38649         31042 ns/op       98329 B/op          2 allocs/op
BenchmarkReflectCopySliceIntSmall-16                 2634298           448.7 ns/op      1048 B/op          2 allocs/op
BenchmarkReflectCopySliceIntMedium-16                 340862          3293 ns/op        9752 B/op          2 allocs/op
BenchmarkReflectCopySliceIntLarge-16                   39921         30468 ns/op       98328 B/op          2 allocs/op
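
The gap in the allocs/op columns above can be reproduced in miniature with `testing.AllocsPerRun` from the standard library. The sketch below is an assumption-laden illustration: it copies plain `[]string` values rather than SDK values, and `copyDirect`, `copyViaJSON`, `makeStrings`, and `measure` are hypothetical helpers, so the absolute numbers will not match the tables.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"testing"
)

// sink keeps results reachable so the copies are not optimized away.
var sink []string

// copyDirect copies the slice with a single allocation.
func copyDirect(in []string) []string {
	out := make([]string, len(in))
	copy(out, in)
	return out
}

// copyViaJSON copies the slice through a JSON marshal/unmarshal round trip.
func copyViaJSON(in []string) []string {
	data, err := json.Marshal(in)
	if err != nil {
		return nil
	}
	var out []string
	if err := json.Unmarshal(data, &out); err != nil {
		return nil
	}
	return out
}

// makeStrings builds n distinct strings to use as benchmark input.
func makeStrings(n int) []string {
	out := make([]string, n)
	for i := range out {
		out[i] = "item-" + strconv.Itoa(i)
	}
	return out
}

// measure returns the average allocations per call for both strategies.
func measure(n int) (direct, viaJSON float64) {
	input := makeStrings(n)
	direct = testing.AllocsPerRun(100, func() { sink = copyDirect(input) })
	viaJSON = testing.AllocsPerRun(100, func() { sink = copyViaJSON(input) })
	return direct, viaJSON
}

func main() {
	direct, viaJSON := measure(500)
	fmt.Printf("direct: %.0f allocs/op, via JSON: %.0f allocs/op\n", direct, viaJSON)
}
```

The JSON path allocates for the encoded bytes and for each decoded string, so its count grows with the slice length, while the direct copy stays constant.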

Benchstat (the updated.txt columns match the reflection-based run above, not the After run)

goos: darwin
goarch: amd64
pkg: github.com/launchdarkly/go-sdk-common/v3/ldvalue
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
                                │ original.txt  │              updated.txt              │
                                │    sec/op     │    sec/op     vs base                 │
CollectionCopyMapStringSmall-16       5.534µ ± ∞ ¹   2.103µ ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopyMapStringLarge-16       680.4µ ± ∞ ¹   193.5µ ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringSmall-16    2818.0n ± ∞ ¹   431.1n ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringMedium-16   20.874µ ± ∞ ¹   3.299µ ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringLarge-16    188.00µ ± ∞ ¹   30.97µ ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntSmall-16       2672.0n ± ∞ ¹   441.5n ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntMedium-16      20.542µ ± ∞ ¹   3.196µ ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntLarge-16       181.99µ ± ∞ ¹   30.73µ ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                            28.34µ         5.449µ        -80.77%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                │  original.txt  │              updated.txt               │
                                │      B/op      │     B/op       vs base                 │
CollectionCopyMapStringSmall-16       4.618Ki ± ∞ ¹   2.615Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopyMapStringLarge-16       661.2Ki ± ∞ ¹   303.3Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringSmall-16     3.839Ki ± ∞ ¹   1.023Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringMedium-16   34.526Ki ± ∞ ¹   9.523Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringLarge-16    262.07Ki ± ∞ ¹   96.02Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntSmall-16        3.745Ki ± ∞ ¹   1.023Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntMedium-16      33.590Ki ± ∞ ¹   9.523Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntLarge-16       253.86Ki ± ∞ ¹   96.02Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                            36.83Ki         12.74Ki        -65.41%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                │  original.txt  │              updated.txt              │
                                │   allocs/op    │  allocs/op    vs base                 │
CollectionCopyMapStringSmall-16         47.00 ± ∞ ¹    23.00 ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopyMapStringLarge-16        4.043k ± ∞ ¹   2.004k ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringSmall-16      19.000 ± ∞ ¹    2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringMedium-16    112.000 ± ∞ ¹    2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceStringLarge-16    1016.000 ± ∞ ¹    2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntSmall-16          9.000 ± ∞ ¹    2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntMedium-16        12.000 ± ∞ ¹    2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
CollectionCopySliceIntLarge-16         15.000 ± ∞ ¹    2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                              71.27          6.438        -90.97%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

Regardless of your decision to accept this change, I appreciate your time.

cwaldren-ld commented 2 months ago

Hi @dyolcekaj, thank you for the detailed description and patch. It's appreciated.

At first glance, the added verbosity seems to be worth the improvement. I'll give it a more thorough review as soon as I can.

Filed internally as 255748.

dyolcekaj commented 1 month ago

Hi @cwaldren-ld,

Thanks again for considering this change. I'm just following up on any internal discussions and am curious whether a decision to accept or reject this change has been made.

Thanks!

cwaldren-ld commented 1 month ago

Hi @dyolcekaj, this looks good to me. I need to make some (unrelated) changes in this repo first, but then I'll work on getting this merged in.

cwaldren-ld commented 1 month ago

(I rebased the branch, just FYI in case you pull.)

cwaldren-ld commented 1 month ago

Hi @dyolcekaj, thanks again for this. It went out as 3.2.0. This will be pulled into the Go SDK shortly (hopefully).