Closed guest271314 closed 2 years ago
Are the child processes you're spawning exiting/closing?
No. There should only be 1 child process spawned, which streams output until the host exits.
Minimal, complete, verifiable example attached: node-native-messaging-host-memory-usage.zip
Source code: https://gist.github.com/guest271314/42ef867085f06b7b66a4fe0c7f289b98
Steps to reproduce (OS *nix; Chromium or Chrome; tested on Chromium 105, dev channel): install the host in ~/.config/chromium/NativeMessagingHosts; if node is not in $PATH, adjust the shebang line to point to the node executable. The ServiceWorker connects to the host, which creates a Uint8Array in a while loop, spreads the TypedArray to an Array, and writes the data to process.stdout.write(); VSZ and RSS increase constantly.
Expected result:
Creating a Uint8Array and sending data using process.stdout.write() in a while loop should not increase RSS from 93MB to 575MB in under 30 seconds. Once the data is sent to stdout, Node.js should not keep the buffer in memory.
Actual result:
RSS increases from 93MB to 575MB in under 30 seconds. See the attached WebM in the .zip file, which captures the Task Manager window during usage of the Node.js Native Messaging host: node-native-messaging-host-memory-usage.webm.zip.
Node.js (well, V8) grows the JS heap when the program creates a lot of garbage (which your program seems to be doing).
The --max_old_space_size=<mb> option limits the maximum size. The default is approx. 1.5 GiB. Try setting it to e.g. 128 or 256.
Another thing: you're pumping out data as fast as it comes in, without applying any backpressure. Try rewriting your script to not use the 'data' event because that's basically a fire hose.
@bnoordhuis
I invested the better part of yesterday into researching this subject matter.
I usually save the leads and threads that lead me to potential solutions.
Unfortunately I lost the bookmarks I had while testing the code at the Gist https://gist.github.com/guest271314/42ef867085f06b7b66a4fe0c7f289b98 when the machine froze. I'm running the machine on RAM alone.
I observed CPU usage increased in proportion to RSS decreasing. When the machine froze on a Web page CPU was at 98%.
I was able to jot down the options passed to node from the frozen screen.
The --max_old_space_size=<mb> limits the maximum size. The default is approx. 1.5 GiB. Try setting it to e.g. 128 or 256.
I did in fact locate the --max_old_space_size=<mb> option during my research; however, that option alone is not sufficient to stop the exponential growth of VSZ and RSS during usage.
What I have so far is:
Options passed to the node executable:
--max-old-space-size=5
--jitless
--expose-gc
--v8-pool-size=1
In the script, before setting the process handlers: process.env.UV_THREAD_POOL = 1; substituting let for const; setting all variables to null after usage; then calling global.gc().
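For reference, those flags combine on one command line as below (a sketch; whether such a small old-space cap is workable depends on the script):

```shell
# Combine the Node.js/V8 flags listed above; --expose-gc makes global.gc
# available so the script can request a collection explicitly.
node --max-old-space-size=5 --jitless --expose-gc --v8-pool-size=1 \
  -e 'console.log(typeof globalThis.gc)'
# prints "function" when --expose-gc is in effect
```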
The above approach, for the actual audio capture code - not the stress-test gist code, which, again, froze the entire machine - appears to tentatively stabilize RSS at ~42MB, VSZ at ~263.5MB, and CPU for the node program at 33-42%, with overall CPU usage at 69-86% on an AMD A8-7410 APU (4 cores, 1145.980 MHz).
I need to make sure that is viable for long-running processes, e.g., capturing 2 or more hours of audio (raw PCM). Those options wind up making node exit within a few minutes. Setting max-old-space-size to 6 stabilizes RSS at ~43MB. CPU usage rises to 80-100%. More testing is needed.
Another thing: you're pumping out data as fast as it comes in, without applying any backpressure. Try rewriting your script to not use the 'data' event because that's basically a fire hose.
What do you suggest to use other than on('data', () => {})? Can't on('readable', () => {}) fire more than once?
Interestingly, given your description of the 'data' event: the target byte count that I read with C++, C, and QuickJS, 1764, which is 441 * 4, to construct Float32Arrays from the Uint8Arrays passed to the browser, is not infrequently not reached by Node.js. The 441 * 4 makes it easier to construct the Uint16Array which gets converted to floats, without storing overflow from the incoming stream that is less than or greater than 441 * 4.
Line 16 in transferableStream.js logs the length of the (JSON) array from the Native Messaging host:
async function handleMessage(value, port) {
if (value.length < 1764) console.log(value.length);
try {
await writer.ready;
await writer.write(new Uint8Array(value));
++writes;
} catch (e) {
console.error(e.message);
}
return true;
}
See screenshot of Node.js host output to client:
An explicit flush()
would help. C++, C, Python, and even QuickJS and derivatives provide such a method for file descriptors/streams.
There really was no roadmap to combine all of those options.
I'll post what I remembered and re-wrote here now, just in case the machine freezes again during further testing:
#!/usr/bin/env -S /path/to/node --max-old-space-size=6 --jitless --expose-gc --v8-pool-size=1
// Node.js Native Messaging host
// https://github.com/simov/native-messaging/blob/master/protocol.js
// https://github.com/simov/native-messaging/blob/master/nodejs/example.js
// Might be good to use an explicit path to node on the shebang line
// in case it isn't in PATH when launched by Chrome
process.env.UV_THREAD_POOL = 1;
process.stdin.on('readable', () => {
let input = [];
let chunk;
while ((chunk = process.stdin.read())) {
input.push(chunk);
}
input = Buffer.concat(input);
let msgLen = input.readUInt32LE(0);
let dataLen = msgLen + 4;
if (input.length >= dataLen) {
let content = input.slice(4, dataLen);
handleMessage(JSON.parse(content.toString()));
input = chunk = msgLen = dataLen = null;
global.gc();
}
});
function sendMessage(msg) {
let buffer = Buffer.from(JSON.stringify(msg));
let header = Buffer.alloc(4);
header.writeUInt32LE(buffer.length, 0);
let data = Buffer.concat([header, buffer]);
process.stdout.write(data);
msg = buffer = header = data = null;
global.gc();
}
process.on('uncaughtException', (err) => {
sendMessage({ error: err.toString() });
});
function handleMessage(input) {
let [command, ...args] = input.split(' '); // let, not const: these are assigned null below, and assigning to a const binding throws a TypeError
let { spawn } = require('child_process');
let child = spawn(command, args);
child.stdout.on('data', (data) => {
sendMessage([...data]);
input = command = args = null;
global.gc();
});
}
Does writing synchronously to stdout with something like fs.writeSync(1, data) instead of process.stdout.write(data) change anything?
Doesn't fs.writeSync() write to a file?
@mscdex FYI: See https://developer.chrome.com/docs/apps/nativeMessaging/#native-messaging-host-protocol
Native messaging protocol
Chrome starts each native messaging host in a separate process and communicates with it using standard input (stdin) and standard output (stdout). The same format is used to send messages in both directions: each message is serialized using JSON, UTF-8 encoded and is preceded with 32-bit message length in native byte order. The maximum size of a single message from the native messaging host is 1 MB, mainly to protect Chrome from misbehaving native applications. The maximum size of the message sent to the native messaging host is 4 GB.
The first argument to the native messaging host is the origin of the caller, usually chrome-extension://[ID of allowed extension]. This allows native messaging hosts to identify the source of the message when multiple extensions are specified in the allowed_origins key in the native messaging host manifest. Warning: In Windows, in Chrome 54 and earlier, the origin was passed as the second parameter instead of the first parameter.
When a messaging port is created using runtime.connectNative Chrome starts native messaging host process and keeps it running until the port is destroyed. On the other hand, when a message is sent using runtime.sendNativeMessage, without creating a messaging port, Chrome starts a new native messaging host process for each message. In that case the first message generated by the host process is handled as a response to the original request, i.e. Chrome will pass it to the response callback specified when runtime.sendNativeMessage is called. All other messages generated by the native messaging host in that case are ignored.
No file is being written to. Native Messaging provides a means to run local applications from the browser, and to communicate between the local host application and the browser. Technically we could write a local file within the application, not stream the raw data as JSON, and stream the file being written to the browser in parallel using Web APIs, e.g., https://github.com/guest271314/captureSystemAudio#stream-file-being-written-at-local-filesystem-to-mediasource-capture-as-mediastream-record-with-mediarecorder-in-real-time. In this case we are streaming real-time audio (PCM) - what you hear output to headphones or speakers - to, for example, record a live press conference or jam session, which we can then simultaneously record and stream to other peers by other means, e.g., WebRTC.
The reason why I wrote the code in https://github.com/guest271314/captureSystemAudio is to capture the audio output of window.speechSynthesis.speak(); neither navigator.mediaDevices.getUserMedia() nor navigator.mediaDevices.getDisplayMedia() captures that, because Chromium and Chrome (and Firefox) do not route the audio output of speech synthesis through the tab.
The question is: without any flags or use of gc(), why is memory usage increasing when I am streaming directly from stdout to the Native Messaging client? The whole point of the Streams API and the Streams Standard is to actually stream data, not store data in internal buffers - particularly after the data has already been written.
It makes no sense for an internal buffer to hold a copy of data that has already been sent via streaming.
Does writing synchronously to stdout with something like fs.writeSync(1, data) instead of process.stdout.write(data) change anything?
This doesn't change anything re memory usage. It does send buffers greater than 1764 to the client more often than on('data', () => {}):
function read() {
let chunk;
while (null !== (chunk = child.stdout.read())) {
sendMessage([...chunk]);
chunk = null;
global.gc();
}
}
child.stdout.on('readable', read)
input = command = args = null;
Are --max-old-space-size=mb, --expose-gc, --v8-pool-size= (not entirely sure what the value is intended to be), and process.env.UV_THREAD the best we have in Node.js to reduce the memory footprint of the nightly executable?
Without any flags or use of gc(), why is memory usage increasing when I am streaming directly from stdout to the Native Messaging client?
I already answered that: because you're drinking from the fire hose (the 'data' event). Try applying back-pressure.
Aside: it's UV_THREADPOOL_SIZE, not UV_THREAD_POOL, and you need to set it upfront. Setting it inside your script isn't guaranteed to do anything.
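The timing point can be checked like this: libuv reads the variable at process startup, so it has to be in the environment before node launches.

```shell
# UV_THREADPOOL_SIZE must already be set when node starts; assigning to
# process.env inside the script comes too late to resize the pool.
UV_THREADPOOL_SIZE=1 node -e 'console.log(process.env.UV_THREADPOOL_SIZE)'
# prints 1
```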
I don't see anything here that suggests a bug in node, so I'll go ahead and convert this to a discussion.
Version
v19.0.0-nightly20220630350a6a8d59
Platform
Linux 5.4.0-42-lowlatency x86_64 x86_64 x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
Create a Node.js Native Messaging https://developer.chrome.com/docs/apps/nativeMessaging/ host.
On *nix with PulseAudio installed, execute parec -d @DEFAULT_MONITOR@ to capture system audio output and stream it to the browser. Observe VSZ and RSS constantly grow during usage in Task Manager.
The C++, C, Python, and QuickJS versions do increase RSS during usage, but only minimally - not remotely comparable to the Node.js version of the same algorithm. To compare, simply adjust the path to the executable in capture_system_audio.json in https://github.com/guest271314/captureSystemAudio/tree/master/native_messaging/capture_system_audio to the languages listed above. Source code: https://github.com/guest271314/captureSystemAudio/blob/master/native_messaging/capture_system_audio/capture_system_audio_node.js
How often does it reproduce? Is there a required condition?
Reproducible consistently.
What is the expected behavior?
VSZ and RSS should not increase by upwards of 10+ MB during usage.
What do you see instead?
VSZ and RSS increasing constantly during usage.
Video showing the constant increase of RSS during usage. Cf. the QuickJS version of the same algorithm in the video, where QuickJS does not drastically increase RSS (memory) consumption during usage: https://thankful-butter-handball.glitch.me/.
Additional information
No response