
Fixing bad allocation pattern in `float` builtin #438

Closed marco6 closed 1 year ago

marco6 commented 1 year ago

Hi, I am submitting a patch that came out of a memory problem I am investigating.

At the moment I cannot share the code that exhibits the problem, but I was able to isolate it in a separate program:

package main

import (
    "flag"
    "fmt"
    "math/big"
    "os"
    "runtime"
    "runtime/pprof"
    "time"

    "go.starlark.net/starlark"
)

var (
    testcase = flag.Int("testcase", 0, "testcase")
)

func main() {
    flag.Parse()
    // Heap profile output; error handling elided for brevity.
    f, _ := os.Create("profile")

    result := make([]starlark.Value, 0, 1024*1024*10)

    // On exit, force a GC so only retained memory remains, then
    // write the heap profile.
    defer func() {
        runtime.GC()
        pprof.Lookup("heap").WriteTo(f, 0)
        f.Close()
        runtime.KeepAlive(result)
    }()

    var integer starlark.Int

    fmt.Printf("Testcase %d\n", *testcase)

    switch *testcase {
    case 0:
        // Small integer
        integer = starlark.MakeInt64(1 << 16)
    case 1:
        // Big integer (>100 bits)
        integer = starlark.MakeBigInt(new(big.Int).MulRange(2, 30))
    default:
        // int64 in the [2^32, 2^64) range
        integer = starlark.MakeInt64(1 << 32)
    }

    for i := 0; i < 1024*1024*10; i++ {
        value, _ := starlark.Call(&starlark.Thread{}, starlark.Universe["float"], starlark.Tuple{integer}, nil)
        result = append(result, value)
    }
}

This code exercises three use-cases that are representative of the float builtin.

In the first case, it converts a small integer (less than 32 bits) to float; in the second, a big one (~100 bits); and in the third, a 64-bit number above 32 bits.

I am running it and obtaining the results through the go pprof tool, like `go run . -testcase <n> && go tool pprof -top profile && rm profile` (from the cmd/starlark directory).

Below are the results showing the problem.

Case 0:

Type: inuse_space
Time: Nov 23, 2022 at 2:01pm (CET)
Showing nodes accounting for 247.50MB, 99.40% of 249MB total
Dropped 18 nodes (cum <= 1.25MB)
      flat  flat%   sum%        cum   cum%
     160MB 64.26% 64.26%   247.50MB 99.40%  main.main
   87.50MB 35.14% 99.40%    87.50MB 35.14%  go.starlark.net/starlark.float
         0     0% 99.40%    87.50MB 35.14%  go.starlark.net/starlark.(*Builtin).CallInternal
         0     0% 99.40%    87.50MB 35.14%  go.starlark.net/starlark.Call
         0     0% 99.40%   247.50MB 99.40%  runtime.main

As expected, it uses some space in main.main for the result slice and around 88 MB for the floats.

Case 1:

Type: inuse_space
Time: Nov 23, 2022 at 2:02pm (CET)
Showing nodes accounting for 242.50MB, 99.38% of 244MB total
Dropped 14 nodes (cum <= 1.22MB)
      flat  flat%   sum%        cum   cum%
     160MB 65.57% 65.57%   242.50MB 99.38%  main.main
   82.50MB 33.81% 99.38%    82.50MB 33.81%  go.starlark.net/starlark.float
         0     0% 99.38%    82.50MB 33.81%  go.starlark.net/starlark.(*Builtin).CallInternal
         0     0% 99.38%    82.50MB 33.81%  go.starlark.net/starlark.Call
         0     0% 99.38%   242.50MB 99.38%  runtime.main

As expected, this case takes much longer to run (because of the big.Int), but in the end the memory use is quite similar to the previous one (the difference can be attributed to my very rough way of measuring and varies between runs).

Case 2:

Type: inuse_space
Time: Nov 23, 2022 at 2:02pm (CET)
Showing nodes accounting for 312.50MB, 99.36% of 314.50MB total
Dropped 16 nodes (cum <= 1.57MB)
      flat  flat%   sum%        cum   cum%
     160MB 50.87% 50.87%   312.50MB 99.36%  main.main
   83.50MB 26.55% 77.42%    83.50MB 26.55%  math/big.nat.make (inline) <---- What is this??
      69MB 21.94% 99.36%   152.50MB 48.49%  go.starlark.net/starlark.float
         0     0% 99.36%   152.50MB 48.49%  go.starlark.net/starlark.(*Builtin).CallInternal
         0     0% 99.36%   152.50MB 48.49%  go.starlark.net/starlark.Call
         0     0% 99.36%    83.50MB 26.55%  go.starlark.net/starlark.Int.Float
         0     0% 99.36%    83.50MB 26.55%  go.starlark.net/starlark.Int.finiteFloat
         0     0% 99.36%    83.50MB 26.55%  math/big.(*Float).Float64
         0     0% 99.36%    83.50MB 26.55%  math/big.(*Float).Set
         0     0% 99.36%    83.50MB 26.55%  math/big.nat.set (inline)
         0     0% 99.36%      313MB 99.52%  runtime.main

This case is the problem I'm facing: ~80 MB remain allocated for the transient big.Float used in the computation.

My explanation (hopefully correct) is that the float builtin hits a very specific worst case of Go's tiny allocator (which has been around for quite a while) when the big.Int allocates at most 8 bytes; in Starlark this is exactly the case where the number is in the [2^32, 2^64) range.

In this case, the float function first allocates 8 bytes in a tiny-allocator chunk (which is 16 bytes). Immediately afterwards it allocates another 8 bytes in the same chunk (not always, since it depends on how much memory is left in the chunk, but often enough to be noticeable). The first allocation could be collected, but since the tiny allocator reclaims memory only when all the objects in a chunk become unreachable, the entire chunk effectively gets tied to the lifetime of the starlark.Float (as a starlark.Value), in practice doubling its "real" size (and keeping the nat part of the big.Int alive).
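
To make the pattern concrete, here is a minimal standalone sketch (simplified; this is not the actual starlark-go code path): a transient 8-byte allocation from a big.Float conversion, immediately followed by a long-lived 8-byte allocation from boxing the result into an interface, which can land in the same 16-byte tiny chunk:

package main

import (
    "fmt"
    "math/big"
    "runtime"
)

// convert mimics the float builtin's allocation pattern (simplified).
func convert(i *big.Int) interface{} {
    // Transient allocation: the big.Float conversion allocates a
    // one-word (8-byte) mantissa through the tiny allocator.
    f, _ := new(big.Float).SetInt(i).Float64()
    // Long-lived allocation: boxing the float64 into an interface is
    // another 8-byte allocation; if it lands in the same 16-byte tiny
    // chunk, it pins the transient word for as long as the result lives.
    return f
}

func main() {
    x := big.NewInt(1 << 40) // one word, i.e. in the [2^32, 2^64) range
    results := make([]interface{}, 0, 1<<20)
    for i := 0; i < 1<<20; i++ {
        results = append(results, convert(x))
    }
    runtime.GC()
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    // With the pinning effect, the boxed floats retain roughly twice
    // their nominal 8 bytes each.
    fmt.Printf("HeapAlloc after GC: %d MiB\n", ms.HeapAlloc >> 20)
    runtime.KeepAlive(results)
}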

This patch adds a "fast" path for the case where the big.Int is small enough to fit in 64 bits, as sketched below.
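
A sketch of the idea (illustrative: bigIntToFloat is a hypothetical helper here; per the profile above, the real conversion happens in Int.Float/Int.finiteFloat, and the actual patch may differ in detail):

package starlarkfix // illustrative

import "math/big"

// bigIntToFloat converts via a hardware integer when possible, skipping
// the transient big.Float (and its tiny 8-byte nat allocation) entirely.
func bigIntToFloat(b *big.Int) float64 {
    if b.IsInt64() {
        return float64(b.Int64()) // direct conversion, no heap allocation
    }
    if b.IsUint64() {
        return float64(b.Uint64())
    }
    // Slow path for integers wider than 64 bits.
    f, _ := new(big.Float).SetInt(b).Float64()
    return f
}

Both the direct int64-to-float64 conversion and big.Float's default ToNearestEven mode round to nearest even, so the fast path should produce the same results as the slow one.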

As a side note, I couldn't think of a representative test case/benchmark that shows the problem in a normal go test; if you could provide some guidance on that, it would be great.
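
For what it's worth, the closest I got is a rough sketch along these lines (illustrative; b.ReportAllocs surfaces the extra transient allocation as allocs/op, but it cannot show the retained-chunk effect directly):

package starlark_test

import (
    "testing"

    "go.starlark.net/starlark"
)

func BenchmarkFloatBuiltin(b *testing.B) {
    thread := new(starlark.Thread)
    args := starlark.Tuple{starlark.MakeInt64(1 << 40)} // problematic [2^32, 2^64) range
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        if _, err := starlark.Call(thread, starlark.Universe["float"], args, nil); err != nil {
            b.Fatal(err)
        }
    }
}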

adonovan commented 1 year ago

Thanks again for this fix.