Improved generic int impl

In this PR I propose a new generic representation for Int(s) which would work for all non-posix non-x64 targets.

Moreover, I propose to add a build tag to completely remove the integer optimizations even for targets like linux x64 since being able to allocate 4GB of ram is not enough of a reason to believe that consuming 4GB of virtual memory is irrelevant.

For example:

# Running the test suite on a linux x64 box, running in the folder <repo>/starlark

# 2 GB, it works
( ulimit -v $(( 2 * 1 << 20 )) && go test . -count=1 ) 

# 4 GB, it works
( ulimit -v $(( 4 * 1 << 20 )) && go test . -count=1 ) 

# 5 GB, it crashes with OOM
( ulimit -v $(( 5 * 1 << 20 )) && go test . -count=1 )

IMHO, it should never happen that adding more than twice the memory causes an "Out Of Memory" for an otherwise working program.

Benchmarking

Here follows some benchmarking, performed on an Ubuntu 22.10, with an Intel Core i5.

x86

Current master:

goos: linux
goarch: 386
pkg: go.starlark.net/starlark
cpu: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
BenchmarkStarlark/bench_bigint-4                1719        914096 ns/op       96120 B/op       7008 allocs/op
BenchmarkStarlark/bench_gauss-4                   25      48124428 ns/op     4637752 B/op     316324 allocs/op
BenchmarkStarlark/bench_int-4                   2826        516878 ns/op       32040 B/op       2002 allocs/op
BenchmarkStarlark/bench_mix-4                   2570        466473 ns/op       53404 B/op       1961 allocs/op
PASS
ok      go.starlark.net/starlark    9.426s

This PR:

goos: linux
goarch: 386
pkg: go.starlark.net/starlark
cpu: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
BenchmarkStarlark/bench_bigint-4                1869        647316 ns/op       70031 B/op       5749 allocs/op
BenchmarkStarlark/bench_gauss-4                   27      43207239 ns/op     2951792 B/op     289580 allocs/op
BenchmarkStarlark/bench_int-4                   2424        506344 ns/op       11943 B/op       1490 allocs/op
BenchmarkStarlark/bench_mix-4                   3367        373223 ns/op       32215 B/op        636 allocs/op
PASS
ok      go.starlark.net/starlark    6.826s

Both faster and consumes less memory.

x64

With the optimization on (no memory limits)

Current master:

goos: linux
goarch: amd64
pkg: go.starlark.net/starlark
cpu: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
BenchmarkStarlark/bench_bigint-4                3208        436642 ns/op      128176 B/op       5006 allocs/op
BenchmarkStarlark/bench_gauss-4                   80      18628541 ns/op     3387445 B/op     132323 allocs/op
BenchmarkStarlark/bench_int-4                   9541        119273 ns/op          48 B/op          1 allocs/op
BenchmarkStarlark/bench_mix-4                   6494        167812 ns/op       63608 B/op        636 allocs/op
PASS
ok      go.starlark.net/starlark    7.744s

This PR:

goos: linux
goarch: amd64
pkg: go.starlark.net/starlark
cpu: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
BenchmarkStarlark/bench_bigint-4                2724        422222 ns/op      128176 B/op       5006 allocs/op
BenchmarkStarlark/bench_gauss-4                   75      19292476 ns/op     3387464 B/op     132323 allocs/op
BenchmarkStarlark/bench_int-4                   9775        130073 ns/op          48 B/op          1 allocs/op
BenchmarkStarlark/bench_mix-4                   6882        179801 ns/op       63608 B/op        636 allocs/op
PASS
ok      go.starlark.net/starlark    6.337s

The difference in the metrics seems varying from run to run, the allocation pattern is the same.

With memory limits (ulimit -v 2000000)

Current master:

2023/01/20 16:53:56 Starlark failed to allocate 4GB address space: cannot allocate memory. Integer performance may suffer.
goos: linux
goarch: amd64
pkg: go.starlark.net/starlark
cpu: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
BenchmarkStarlark/bench_bigint-4                1654        640482 ns/op      168210 B/op       7007 allocs/op
BenchmarkStarlark/bench_gauss-4                   36      40755274 ns/op     9688947 B/op     447395 allocs/op
BenchmarkStarlark/bench_int-4                   3549        355176 ns/op       80080 B/op       4002 allocs/op
BenchmarkStarlark/bench_mix-4                   3476        387892 ns/op      116208 B/op       3236 allocs/op
PASS
ok      go.starlark.net/starlark    7.836s

This PR:

2023/01/20 16:54:14 Starlark failed to allocate 4GB address space: cannot allocate memory. Integer performance may suffer.
goos: linux
goarch: amd64
pkg: go.starlark.net/starlark
cpu: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
BenchmarkStarlark/bench_bigint-4                2896        401010 ns/op      134104 B/op       5749 allocs/op
BenchmarkStarlark/bench_gauss-4                   52      21606800 ns/op     4645504 B/op     289580 allocs/op
BenchmarkStarlark/bench_int-4                   7246        160877 ns/op       11967 B/op       1490 allocs/op
BenchmarkStarlark/bench_mix-4                   6639        213752 ns/op       63616 B/op        636 allocs/op
PASS
ok      go.starlark.net/starlark    7.837s

Both faster and consumes less memory. Small ints suffer more than big ones as it must allocate values above 255.

Thanks for reporting this issue. The underlying problem here is that TestProfile allocates too much (around 2GB), so that if the virtual address space has been artificially limited to 5GB, then there isn't enough left after the optimization to represent the actual memory allocated by the program. I've reported this at #455.

We can't change the public API of Int even if we wanted to, but fortunately it isn't necessary. The appropriate fix is to improve TestProfile, and perhaps also to add an option (e.g. a flag or env var) to allow users to disable the int optimization entirely.

being able to allocate 4GB of ram is not enough of a reason to believe that consuming 4GB of virtual memory is irrelevant. it should never happen that adding more than twice the memory causes an "Out Of Memory" for an otherwise working program.

To be clear, the optimization never attempts to allocate 4GB of RAM (though I think you're aware of that). Are virtual address spaces limited to 5GB actually a problem in practice? 4GB, yes (e.g. 386) ; 700MB, yes (e.g. Android). I'm curious to know how you encountered this problem in production.

The underlying problem here is that TestProfile allocates too much (around 2GB)

The underlying problem is that as soon as you have enough virtual space to "book" 4GB, then immediately the available address space drops, which is rather strange from a usage POV. This would be true for any usage of Starlark, not just TestProfile. While 2 GB for a single Starlark execution seems a lot, this problem would arise in any project using Starlark as a scripting back-end as in that case there could be multiple (possibly a lot) running scripts at the same time, easily consuming gigabytes of memory.

We can't change the public API of Int

I understand, and I agree 100%, so, for me, we can close this PR. Maybe a issue is better suited for this kind of discussion?

To be clear, the optimization never attempts to allocate 4GB of RAM (though I think you're aware of that).

Yeah sure. I guess I don't know a better verb other than "allocate" to say "booked a range in the virtual address space", which is rather confusing in this situation.

Are virtual address spaces limited to 5GB actually a problem in practice?

Not in general, but there could be, as you saw with other platforms. I still think that this behavior should be tunable at compile-time or at run-time as it can create problems (it did!) and we don't really know under which condition this piece of code will run.

I think I could summarize the issue I'm facing with:

TestIntFallback is right now broken on any platform that doesn't support that optimization (just run GOARCH=386 go test .: it fails for no reason);
on a supported platform, as soon you have, during startup, enough address space to book 4GB, your available address space will drop. Note that this depends on the package initialization order (which depends on the other modules of your project and is not under the control of the programmer[^1], you only get a "loose" (topological) ordering, if I read the specs right);
if the "allocation" (or mapping? or booking? whatever...) fails, the implementation doesn't switch back to the one you would get on an x32 one, but it will use an even worse one (all big.Ints), so we have 3 behaviors: the "best" one, the "generic" one (compile-time), and the worse one (run-time, when the allocation fails).

[^1]: oh well as per this thread, it actually is under the control of the programmer, using and undocumented feature of cgo only. On the other side, this other discussion seems to be different. Confusing eh? Still doesn't feel like the right thing to do.

The underlying problem is that as soon as you have enough virtual space to "book" 4GB, then immediately the available address space drops, which is rather strange from a usage POV.

I agree it may be surprising, but I wonder whether and how this situation actually occurs in practice. Generally, the limit on the address space is either too small for the optimization (e.g. 4GB or 700MB) or unimaginably vast (2^64 or so). I haven't seen much in the middle, and I can't yet think of any reason why someone would artificially limit their address space (as opposed to max heap size) to a value close to the amount of memory needed by the program. So I'd like to understand this before we try to make a fix.

I'd be happy to add an environment variable to disable the optimization, but of course you'd still have to hit the problem at least once to know to set the var. But that seems fine.

This would be true for any usage of Starlark, not just TestProfile. While 2 GB for a single Starlark execution seems a lot, this problem would arise in any project using Starlark as a scripting back-end as in that case there could be multiple (possibly a lot) running scripts at the same time, easily consuming gigabytes of memory.

If each script or interpreter is running in its own process, then there's no problem, as each process has an independent ulimit on its address space.

I understand, and I agree 100%, so, for me, we can close this PR. Maybe a issue is better suited for this kind of discussion?

An issue is fine, but discussing it here seems fine too.

TestIntFallback is right now broken on any platform that doesn't support that optimization (just run GOARCH=386 go test .: it fails for no reason); on a supported platform, as soon you have, during startup, enough address space to book 4GB, your available address space will drop.

I haven't tested it on 386 recently, but we can certainly tweak (or disable) the tests to make it more portable or less fragile.

Note that this depends on the package initialization order

I'm not worried about initialization order: as you say, it's a topological order, and it is well documented by the spec as a predictable and deterministic function of a given source tree, so I don't think it's an interesting source of unpleasant surprises in practice.

google / starlark-go