oven-sh / bun

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
https://bun.sh
Other
72.86k stars 2.64k forks source link

os.cpus() is broken #9203

Open argosphil opened 6 months ago

argosphil commented 6 months ago

What version of Bun is running?

1.0.29+a146856d1

What platform is your computer?

Linux 6.1.0-18-arm64 aarch64 unknown

What steps can reproduce the bug?

console.log(os.cpus())

What is the expected behavior?

The expected behavior is that os.cpus() returns a normal JavaScript array containing (useless) information on the hardware CPUs at the time the function is called.

What do you see instead?

The actual behavior is that os.cpus() returns an array with getters/setters and no information at all when logged:

[
  {
    model: [Getter/Setter],
    speed: [Getter/Setter],
    times: [Getter/Setter],
    toJSON: [Function: toJSON],
  }, {
    model: [Getter/Setter],
    speed: [Getter/Setter],
    times: [Getter/Setter],
    toJSON: [Function: toJSON],
  }
]

Additional information

The optimization performed by lazyCpus() never worked, never will, and we shouldn't do it.

os.cpus() is a bad API. os.cpus().length is a very bad way of determining a number which is almost meaningless today.

It should be fairly obvious why this never worked and never will: the information needs to be obtained at the time the os.cpus() function is called, the object returned must be a plain array of plain objects or a Proxy, and it cannot be a Proxy because we don't allow those to override their inspect methods.

It would be nice to optimize os.cpus().length, but it requires escape analysis in the compiler or transpiler and a way to expose that to the Zig code, which is a lot of work. Essentially we'd have to realize that a value is only used for a single property and becomes unreachable afterwards, then call the zig code with a special mechanism informing us that only this single property is relevant. It's a hard optimization to perform, although one that may speed up other APIs. I'm not sure whether JSC supports it at all. It's certainly overkill for this problem.

argosphil commented 6 months ago

os.cpus() is a bad API because it's based on a circa 2000 concept of a CPU: a CPU is a physical chip which allows a single thread of execution and runs at a fixed clock frequency, which determines its speed of execution. Normal users have only one of them, but some expensive system have a small fixed number of identical CPUs.

Today, a CPU is (often) virtual, may offer more than one thread of execution (which interfere in nontrivial ways), runs at a variable clock frequency, becomes available or unavailable while running as virtualization or firmware steals it from the operating system, becomes slower or faster due to power management even when it is running at the same frequency, and there are several different CPU cores on the same system at the same time, designed for efficiency, performance, or latency. The number of CPUs can change at any time. Furthermore, how many of these CPUs should be used by a specific program isn't exposed by os.cpus() at all: maybe my process is limited to a single CPU core by cgroups, maybe it doesn't benefit from SMT and I shouldn't use it, maybe there are many available CPUs which I could use but I shouldn't for some other reason.

On Linux, the current consensus is that the only information you should ever use is the number returned by nproc, and that system administrators should replace that program by one that prints the actually desired number of execution threads. It is nothing to do with the actual number of physical CPUs on any physical machine, and on recent systems is much larger than the number of threads it is useful to spawn (on this laptop, nproc currently prints 20).

Sorry this got a bit long, but in summary, we can't fix bad JavaScript code using a bad API to make a bad guess at the number they actually care for, which is "how many workers should I spawn". That number is almost totally unrelated to the name "hardware concurrency", but there is an API to access it, and it's regrettably called navigator.hardwareConcurrency.

We should define os.cpus = Bun._Os().cpus.

Electroid commented 6 months ago

Is the only reason you're saying it's broken is because the console.log output is non-helpful? If so, we could fix that.

The reason we do this optimization is because it's used in a lot of places and indeed does improve startup time. We also haven't seen any reported issues where it being lazy has caused issues.

argosphil commented 6 months ago

Is the only reason you're saying it's broken is because the console.log output is non-helpful? If so, we could fix that.

No. Right now, it's severely broken: if you modify the array returned by os.cpus(), things will break badly.

That can be fixed. What cannot be fixed is that delazification means the data won't be generated at the right time. If I do:

let oldCpus = os.cpus();
<expensive code here>
let newCpus = os.cpus();
return newCpus[0].times.user - oldCpus[0].times.user;

that will actually return a NEGATIVE number, because newCpus is delazified before oldCpus is. (I see no other way cpus()[i].times could be used, tbh. You take two snapshots and calculate the difference.)

The reason we do this optimization is because it's used in a lot of places and indeed does improve startup time. We also haven't seen any reported issues where it being lazy has caused issues.

I understand that. It's an invalid optimization that happens to work in 90% of cases. A similar example that was just asked about on Discord is why "a".toString() isn't simplified to "a". The answer is that it doesn't work in all cases.

Note that I can't reproduce the alleged Linux issue. As the "current clock speed" of a CPU is useless information, we could always return 0 for it, as we do on my aarch64 VM, or return the information from /proc/cpuinfo without digging into /sys. Both would avoid the performance problem, if it even still exists.

I believe the "overkill" solution is actually worth looking into: if a zig function is called to generate an object which is immediately destructured/derefed so only some of its properties are relevant, we can optimize away the generation of other properties (assuming they happen without side effects). For example, if Array.map() is called and we immediately discard the result, we can fall back to Array.forEach().

Other languages do this, of course, but I haven't been able to find out how it would work with JSC.

Jarred-Sumner commented 6 months ago

Can you point to third-party library code which modifies the array returned by os.cpus()? The data in os.cpus is supposed to mostly never change between runs.

argosphil commented 6 months ago

Can you point to third-party library code which modifies the array returned by os.cpus()?

Apart from the Discord OP's code, there appear to be 7 instances that are easily searchable on GitHub:

https://github.com/search?q=cpus().shift()&type=code https://github.com/search?q=cpus().pop()&type=code

As for the "times" property being used in a way that would break, here's a random example:

https://github.com/diefarbe/node-lib/blob/d17d118320ec16e1f2a9983dd7ec4864d4e9b31b/POC/ts/examples/cpu-meter.ts#L51

That code stores the return value of os.cpus() at one point, then compares it to another call of os.cpus() which happened later. Since both are delazified at the same time, there will be no difference.

I'm not sure how often either of those things happen.

The data in os.cpus is supposed to mostly never change between runs.

The only part that doesn't change for me is the "model" property.

Note that this isn't about making future calls to os.cpus() return different values, just about modifying what's documented to be an array of objects to retrieve the individual objects.

The usage of os.cpus().length to determine how many threads to spawn is explicitly advised against in the node documentation, FWIW.

argosphil commented 6 months ago

Maybe a compromise here would be to make cpus()[i].speed a getter and eagerly populate the rest of the objects? That way, the information would be only slightly inaccurate (clock speed measured at the wrong time), but it wouldn't be as slow as digging through nprocs files in /sys. On recent Linux systems the clock speed is also exported via /proc/cpuinfo so we wouldn't have to use a getter there at all.