nodejs / performance

Node.js team focusing on performance
MIT License
371 stars 7 forks source link

Avoid passing `process.env` if not exist on spawnSync and execSync #164

Open anonrig opened 2 months ago

anonrig commented 2 months ago

This line (ref: https://github.com/nodejs/node/blob/main/lib/child_process.js#L653) passes the whole process.env javascript object even though it is not required. We should get the environment variable from C++ to avoid paying the serialization cost of the extremely large env variable.

cc @nodejs/child_process @nodejs/performance

Ethan-Arrowood commented 2 months ago

I did some further analysis on the highlighted code. I don't know if this will be as big of a performance fix. It will only matter when the user does not specify the env option. Otherwise, there is no avoiding the serialization overhead. We'd need to transfer the logic after that line as well in order to sanitize? the input.

Additionally, it seems like the resulting envPairs object is used and mutated throughout other lib level child_process code. This change may become very complex for little performance gain.

Ethan-Arrowood commented 2 months ago

Before committing to this contribution, I'd want to understand the performance penalty for processing the average process.env object. The process.env on my development machine is 3kB in size: (new TextEncoder().encode(JSON.stringify(process.env)).length)/1024

anonrig commented 2 months ago

@Ethan-Arrowood the easiest way to benchmark is to remove the default value to empty object and test it. Ideally this whole function needs to be moved to C++ to have the highest impact.

If you're interested in more, the another option is to handle "stdio: inherit" case and move that logic to Cpp since it's just an array of object with size 3.

mcollina commented 2 months ago

Something that should be preserved is the fact that folks alter process.env to pass env variables down (typically via 3rd party libraries).