marco6 closed this 1 year ago
Thanks for reporting this issue. The underlying problem here is that TestProfile allocates too much (around 2GB), so that if the virtual address space has been artificially limited to 5GB, then there isn't enough left after the optimization to represent the actual memory allocated by the program. I've reported this at #455.
We can't change the public API of Int even if we wanted to, but fortunately it isn't necessary. The appropriate fix is to improve TestProfile, and perhaps also to add an option (e.g. a flag or env var) to allow users to disable the int optimization entirely.
> being able to allocate 4GB of ram is not enough of a reason to believe that consuming 4GB of virtual memory is irrelevant. it should never happen that adding more than twice the memory causes an "Out Of Memory" for an otherwise working program.
To be clear, the optimization never attempts to allocate 4GB of RAM (though I think you're aware of that). Are virtual address spaces limited to 5GB actually a problem in practice? 4GB, yes (e.g. 386); 700MB, yes (e.g. Android). I'm curious to know how you encountered this problem in production.
> The underlying problem here is that TestProfile allocates too much (around 2GB)
The underlying problem is that as soon as you have enough virtual space to "book" 4GB, the available address space immediately drops, which is rather strange from a usage POV. This would be true for any usage of Starlark, not just TestProfile. While 2 GB for a single Starlark execution seems a lot, this problem would arise in any project using Starlark as a scripting back-end, since in that case there could be multiple (possibly many) scripts running at the same time, easily consuming gigabytes of memory.
> We can't change the public API of Int
I understand, and I agree 100%, so, for me, we can close this PR. Maybe an issue is better suited for this kind of discussion?
> To be clear, the optimization never attempts to allocate 4GB of RAM (though I think you're aware of that).
Yeah sure. I guess I don't know a better verb than "allocate" to say "booked a range in the virtual address space", which is rather confusing in this situation.
> Are virtual address spaces limited to 5GB actually a problem in practice?
Not in general, but they can be, as you saw with other platforms. I still think that this behavior should be tunable at compile-time or at run-time, as it can create problems (it did!) and we don't really know under which conditions this piece of code will run.
I think I could summarize the issue I'm facing as follows:

- `TestIntFallback` is right now broken on any platform that doesn't support that optimization (just run `GOARCH=386 go test .`: it fails for no reason);
- on a supported platform, as soon as you have, during startup, enough address space to book 4GB, your available address space will drop;
- when the reservation fails at run time, values fall back to plain `big.Int`s, so we have 3 behaviors: the "best" one, the "generic" one (compile-time), and the worst one (run-time, when the allocation fails).[^1]

[^1]: Oh well, as per this thread, it actually is under the control of the programmer, using an undocumented feature of cgo only. On the other side, this other discussion seems to be different. Confusing, eh? Still doesn't feel like the right thing to do.
> The underlying problem is that as soon as you have enough virtual space to "book" 4GB, then immediately the available address space drops, which is rather strange from a usage POV.
I agree it may be surprising, but I wonder whether and how this situation actually occurs in practice. Generally, the limit on the address space is either too small for the optimization (e.g. 4GB or 700MB) or unimaginably vast (2^64 or so). I haven't seen much in the middle, and I can't yet think of any reason why someone would artificially limit their address space (as opposed to max heap size) to a value close to the amount of memory needed by the program. So I'd like to understand this before we try to make a fix.
I'd be happy to add an environment variable to disable the optimization, but of course you'd still have to hit the problem at least once to know to set the var. But that seems fine.
> This would be true for any usage of Starlark, not just TestProfile. While 2 GB for a single Starlark execution seems a lot, this problem would arise in any project using Starlark as a scripting back-end as in that case there could be multiple (possibly a lot) running scripts at the same time, easily consuming gigabytes of memory.
If each script or interpreter is running in its own process, then there's no problem, as each process has an independent ulimit on its address space.
> I understand, and I agree 100%, so, for me, we can close this PR. Maybe an issue is better suited for this kind of discussion?
An issue is fine, but discussing it here seems fine too.
> `TestIntFallback` is right now broken on any platform that doesn't support that optimization (just run `GOARCH=386 go test .`: it fails for no reason); on a supported platform, as soon as you have, during startup, enough address space to book 4GB, your available address space will drop.
I haven't tested it on 386 recently, but we can certainly tweak (or disable) the tests to make it more portable or less fragile.
> Note that this depends on the package initialization order
I'm not worried about initialization order: as you say, it's a topological order, and it is well documented by the spec as a predictable and deterministic function of a given source tree, so I don't think it's an interesting source of unpleasant surprises in practice.
In this PR I propose a new generic representation for `Int`s which would work for all non-POSIX, non-x64 targets.
Moreover, I propose to add a build tag to completely remove the integer optimizations even for targets like linux x64, since being able to allocate 4GB of RAM is not enough of a reason to believe that consuming 4GB of virtual memory is irrelevant.
For example:
IMHO, it should never happen that adding more than twice the memory causes an "Out Of Memory" for an otherwise working program.
Benchmarking
Here follow some benchmarks, performed on Ubuntu 22.10 with an Intel Core i5.
x86
Current master:
This PR:
This PR is both faster and consumes less memory.
x64
With the optimization on (no memory limits)
Current master:
This PR:
The difference in the metrics seems to vary from run to run; the allocation pattern is the same.
With memory limits (`ulimit -v 2000000`)
Current master:
This PR:
Again this PR is both faster and consumes less memory. Small ints suffer more than big ones, as the fallback must allocate values above 255.