Closed: achingbrain closed this issue 4 months ago.
It's worth noting that calling the exported cleanup function after the test run makes the process exit successfully.
From what I can see, cleanup comes from RTCWrapper, which closes all currently open RTCPeerConnections and then resets all of their callbacks, which lets the process exit.
Given that you can't reopen an RTCPeerConnection that's been closed, it seems reasonable not to expect any events to fire after it's been closed, so would resetting all the callbacks on close (instead of only on destroy) be OK?
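The cleanup behaviour described above can be sketched in plain JS, assuming a hypothetical stand-in for RTCWrapper: a ConnectionRegistry tracks every open connection, and its cleanup() closes each one and then drops its callbacks, so nothing is left holding references. FakePeer, ConnectionRegistry and their methods are illustrative names only, not node-datachannel's API.

```javascript
// Hypothetical stand-in for a native peer connection; not node-datachannel's API.
class FakePeer {
  constructor() {
    this.closed = false;
    this.callbacks = new Map(); // event name -> handler, like onStateChange
  }
  on(event, handler) {
    this.callbacks.set(event, handler);
  }
  close() {
    this.closed = true;
  }
  resetCallbacks() {
    this.callbacks.clear(); // release references to the JS handlers
  }
}

// Hypothetical registry mirroring what the exported cleanup() is described as doing.
class ConnectionRegistry {
  constructor() {
    this.open = new Set();
  }
  create() {
    const peer = new FakePeer();
    this.open.add(peer);
    return peer;
  }
  cleanup() {
    for (const peer of this.open) {
      peer.close();          // 1. close every currently open connection
      peer.resetCallbacks(); // 2. then drop all of its callbacks
    }
    this.open.clear();       // the registry no longer pins them either
  }
}

const registry = new ConnectionRegistry();
const a = registry.create();
const b = registry.create();
a.on('statechange', () => {});
b.on('statechange', () => {});
registry.cleanup();
```

Resetting callbacks is what matters for process exit: a registered callback is a reference held on the native side, so it keeps both the handler and whatever it closes over reachable.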
Destroying and resetting callbacks amount to the same thing.
But the point is that if you call destroy or reset the callbacks, you will not receive the closed state change, which is not good. That is why I suggested calling cleanup when you no longer need the library.
Calling cleanup isn't a solution for a server-style application that may service hundreds of thousands or millions of connections during its lifetime.
If I run this script with the --trace-gc node flag, I can see that garbage collection occurs, but on my laptop it runs out of memory after a minute or so:
import { RTCPeerConnection } from 'node-datachannel/polyfill'
while (true) {
let conn = new RTCPeerConnection()
conn.close()
conn = null
}
The WebRTC spec says that calling close on an RTCPeerConnection tears down any associated resources, so it should become garbage collectable after that.
If it doesn't, then this is a memory leak.
Hello,
If GC is running, then it should call the destructor and reset all callbacks, etc. But the missing piece is that we are not assigning the underlying object to null:
this.#peerConnection = null;
Could you please check and try this? https://github.com/murat-dogan/node-datachannel/pull/224
With #224 applied the above script still runs out of memory after a minute or so.
Interestingly, I tried to be kinder to the garbage collector by waiting for the connection to close before creating a new one:
import { RTCPeerConnection } from 'node-datachannel/polyfill'
while (true) {
const conn = new RTCPeerConnection()
conn.close()
await new Promise(resolve => {
conn.addEventListener('connectionstatechange', () => {
if (conn.connectionState === 'closed') {
resolve()
}
})
})
}
With this, it runs for a few iterations, then the process exits with code 13 and no output:
% node index.js
% echo $?
13
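For context, Node.js uses exit code 13 for an unsettled top-level await: the module awaited a promise that never settled, and the event loop ran out of work. That would fit the loop above if, on some iteration, the 'closed' state change is never observed by the listener. A minimal standalone reproduction follows, written as a CommonJS script that spawns a child Node process; the one-line module source is an illustrative assumption, not taken from the issue.

```javascript
// Standalone CommonJS script: run a child Node process whose ES module awaits
// a promise that never settles, with nothing else keeping the event loop alive.
const { spawnSync } = require('node:child_process');

const result = spawnSync(
  process.execPath, // the same node binary that runs this script
  ['--input-type=module', '-e', 'await new Promise(() => {})'],
  { encoding: 'utf8' }
);

// Node reports an unsettled top-level await with exit code 13.
console.log('exit code:', result.status);
```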
If I create only a single RTCPeerConnection, #224 does let the process exit, though it takes a long time:
import { RTCPeerConnection } from 'node-datachannel/polyfill'
const conn = new RTCPeerConnection()
conn.addEventListener('connectionstatechange', () => {
if (conn.connectionState === 'closed') {
console.timeEnd('close')
}
})
console.time('exit')
console.time('close')
conn.close()
process.addListener('exit', () => {
console.timeEnd('exit')
process.exit(0)
})
8 seconds to exit!
% node index.js
close: 0.449ms
exit: 8.100s
Applying this diff to #224 makes the closing almost instant:
diff --git a/polyfill/RTCPeerConnection.js b/polyfill/RTCPeerConnection.js
index db3ae16..171c10e 100644
--- a/polyfill/RTCPeerConnection.js
+++ b/polyfill/RTCPeerConnection.js
@@ -61,6 +61,11 @@ export default class _RTCPeerConnection extends EventTarget {
// forward peerConnection events
this.#peerConnection.onStateChange(() => {
+ if (this.connectionState === 'closed') {
+ this.#peerConnection.destroy();
+ this.#peerConnection = null;
+ }
+
this.dispatchEvent(new Event('connectionstatechange'));
});
@@ -227,7 +232,6 @@ export default class _RTCPeerConnection extends EventTarget {
});
this.#peerConnection?.close();
- this.#peerConnection = null;
}
createAnswer() {
That is, instead of nulling this.#peerConnection in the .close() method, we wait for the connection state to change to 'closed', then destroy the C++ object (which resets its callbacks) and remove our reference to it.
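The idea behind the diff, reduced to plain JS: the wrapper keeps forwarding state changes, and when the native side reports 'closed' it destroys the native object and drops its own reference. FakeNativePeer and PeerWrapper are hypothetical stand-ins, and the close is simulated synchronously for brevity; in the real library the state change arrives from the C++ side.

```javascript
// Hypothetical native object: stores a callback and reports state changes.
class FakeNativePeer {
  constructor() {
    this.state = 'new';
    this.stateCallback = null;
    this.destroyed = false;
  }
  onStateChange(cb) {
    this.stateCallback = cb;
  }
  close() {
    // Simulated: here 'closed' is reported synchronously.
    this.state = 'closed';
    this.stateCallback?.(this.state);
  }
  destroy() {
    this.destroyed = true;
    this.stateCallback = null; // destroying also resets callbacks
  }
}

class PeerWrapper extends EventTarget {
  #peer;
  connectionState = 'new';

  constructor(native) {
    super();
    this.#peer = native;
    this.#peer.onStateChange((state) => {
      this.connectionState = state;
      if (state === 'closed') {
        // Only tear down once 'closed' is reported, then release the object.
        this.#peer.destroy();
        this.#peer = null;
      }
      this.dispatchEvent(new Event('connectionstatechange'));
    });
  }

  close() {
    this.#peer?.close(); // the reference is no longer nulled here
  }

  get hasNativePeer() {
    return this.#peer !== null;
  }
}

const native = new FakeNativePeer();
const wrapper = new PeerWrapper(native);
let events = 0;
wrapper.addEventListener('connectionstatechange', () => events++);
wrapper.close();
wrapper.close(); // safe: the optional chain makes a second close a no-op
```

The listener still sees the final 'closed' event, which addresses the earlier objection that destroying up front would swallow it.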
% node index.js
close: 0.436ms
exit: 5.802ms
About 6ms to exit, much faster.
As an alternative to #224, this diff applied against master also solves the problem:
diff --git a/polyfill/RTCPeerConnection.js b/polyfill/RTCPeerConnection.js
index be9913b..08accf1 100644
--- a/polyfill/RTCPeerConnection.js
+++ b/polyfill/RTCPeerConnection.js
@@ -61,6 +61,10 @@ export default class _RTCPeerConnection extends EventTarget {
// forward peerConnection events
this.#peerConnection.onStateChange(() => {
+ if (this.connectionState === 'closed') {
+ this.#peerConnection.destroy();
+ }
+
this.dispatchEvent(new Event('connectionstatechange'));
});
Setting this.#peerConnection to null doesn't appear to be necessary: the only reference to the C++ object is the internal field of the RTCPeerConnection instance, so once that instance has no references, it can be collected along with the underlying C++ object.
Really, I think the root cause is the use of Napi::Persistent in ThreadSafeCallback: its purpose is to prevent the wrapped object from being garbage collected, so we have to tell it when we are done.
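Napi::Persistent creates a strong reference that keeps the wrapped JS value alive until it is explicitly reset, which is why something has to signal "done". A rough plain-JS analogy follows, assuming a hypothetical handle table standing in for the native side: while an entry sits in the table, the garbage collector cannot reclaim the callback or anything it closes over, and only an explicit reset() releases it. This models the ownership rule only, not node-addon-api itself.

```javascript
// Hypothetical analogy for Napi::Persistent: a table of strong references that
// only shrinks when the owner explicitly releases an entry.
class PersistentTable {
  #refs = new Map();
  #nextId = 1;

  // Like Napi::Persistent(value): pin the value and hand back an id.
  persist(value) {
    const id = this.#nextId++;
    this.#refs.set(id, value);
    return id;
  }

  get(id) {
    return this.#refs.get(id);
  }

  // Like resetting the reference: drop it so GC may collect the value.
  reset(id) {
    this.#refs.delete(id);
  }

  get size() {
    return this.#refs.size;
  }
}

const table = new PersistentTable();
const ids = [];
for (let i = 0; i < 3; i++) {
  // Each callback closes over a buffer that stays reachable while pinned.
  const payload = new Uint8Array(1024);
  ids.push(table.persist(() => payload.length));
}
const pinnedBefore = table.size;
for (const id of ids) table.reset(id); // the "we are done" signal
const pinnedAfter = table.size;
```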
@achingbrain Thanks for the detailed investigation.
To summarize: based on this information, we need to pick an option.
I tried it with Chrome. If you call close on a peer connection, that peer does not receive any closed event. The other peer receives a disconnected event.
So it seems we can call the destroy function directly within the close function. What do you say?
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>RTCPeerConnection Example</title>
<script>
let peer1 = new RTCPeerConnection();
let peer2 = new RTCPeerConnection();
peer1.onicecandidate = ({candidate}) => candidate && peer2.addIceCandidate(candidate);
peer2.onicecandidate = ({candidate}) => candidate && peer1.addIceCandidate(candidate);
peer1.onconnectionstatechange = () => console.log('Peer1 State:', peer1.connectionState);
peer2.onconnectionstatechange = () => console.log('Peer2 State:', peer2.connectionState);
let channel = peer1.createDataChannel("mychannel");
channel.onopen = () => console.log('Data channel is open');
channel.onmessage = ({data}) => console.log('Received Message:', data);
peer2.ondatachannel = ({channel}) => {
channel.onmessage = ({data}) => console.log('Received Message:', data);
channel.onopen = () => {
console.log('Data channel is open');
channel.send('Hello from Peer2!');
};
};
peer1.createOffer().then(offer => {
return peer1.setLocalDescription(offer);
}).then(() => {
return peer2.setRemoteDescription(peer1.localDescription);
}).then(() => {
return peer2.createAnswer();
}).then(answer => {
return peer2.setLocalDescription(answer);
}).then(() => {
return peer1.setRemoteDescription(peer2.localDescription);
}).catch(console.error);
setTimeout(() => {
peer1.close();
}, 2000);
</script>
</head>
<body>
</body>
</html>
Indeed, with the Web API, RTCPeerConnection.close() is synchronous and does not fire any event (see specification). I think it makes sense to align the behavior and call the destroy function directly in the close function.
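A sketch of the proposed behaviour, assuming hypothetical stand-ins for the native object: close() sets connectionState to 'closed' synchronously, destroys the native peer (dropping its callbacks) and dispatches no connectionstatechange event, matching the Web API behaviour described above. NativeStub and SyncClosingPeer are illustrative names; this shows the shape of the idea, not the released implementation.

```javascript
// Hypothetical native stand-in that records whether it was destroyed.
class NativeStub {
  destroyed = false;
  callbacks = { onStateChange: () => {} };
  destroy() {
    this.destroyed = true;
    this.callbacks = {}; // destroying resets all callbacks
  }
}

class SyncClosingPeer extends EventTarget {
  #native;
  connectionState = 'new';

  constructor(native) {
    super();
    this.#native = native;
  }

  close() {
    if (this.connectionState === 'closed') return; // idempotent
    this.connectionState = 'closed'; // updated synchronously, no event fired
    this.#native.destroy();          // tear down native resources right away
    this.#native = null;
  }
}

const stub = new NativeStub();
const peer = new SyncClosingPeer(stub);
let fired = 0;
peer.addEventListener('connectionstatechange', () => fired++);
peer.close();
peer.close(); // second call is a no-op
```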
I have just published v0.5.5. Could you please try it?
Sorry for the delay in replying, I've been AFK for a little while.
If I run:
import { RTCPeerConnection } from 'node-datachannel/polyfill'
const conn = new RTCPeerConnection()
conn.addEventListener('connectionstatechange', () => {
if (conn.connectionState === 'closed') {
console.timeEnd('close')
}
})
console.time('exit')
console.time('close')
conn.close()
process.addListener('exit', () => {
console.timeEnd('exit')
process.exit(0)
})
The process now exits almost immediately, so the initially reported issue is fixed, thanks for pushing that release. 🎉 🚀
However this script still causes the process to run out of memory after a minute or so:
import { RTCPeerConnection } from 'node-datachannel/polyfill'
while (true) {
const conn = new RTCPeerConnection()
conn.close()
}
...and this script causes the process to terminate unexpectedly with a non-zero exit code (it should run forever, since no connectionstatechange event is emitted):
import { RTCPeerConnection } from 'node-datachannel/polyfill'
while (true) {
const conn = new RTCPeerConnection()
conn.close()
await new Promise(resolve => {
conn.addEventListener('connectionstatechange', () => {
if (conn.connectionState === 'closed') {
resolve()
}
})
})
}
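If close() no longer emits connectionstatechange, waiting for that event after calling close() can never resolve, and with nothing else keeping the event loop alive Node ends the process with exit code 13 (unsettled top-level await). A sketch of a helper that checks the current state first and only subscribes if it still has to wait; waitForClosed is a hypothetical name, and the demo uses a plain EventTarget in place of RTCPeerConnection.

```javascript
// Hypothetical helper: resolve once the connection is 'closed', whether that
// already happened synchronously (as with close()) or happens later.
function waitForClosed(conn) {
  if (conn.connectionState === 'closed') {
    return Promise.resolve();
  }
  return new Promise((resolve) => {
    const onChange = () => {
      if (conn.connectionState === 'closed') {
        conn.removeEventListener('connectionstatechange', onChange);
        resolve();
      }
    };
    conn.addEventListener('connectionstatechange', onChange);
  });
}

// Demo with a stand-in whose close() is synchronous and fires no event.
const conn = new EventTarget();
conn.connectionState = 'connected';
conn.close = () => {
  conn.connectionState = 'closed';
};

conn.close();
const done = waitForClosed(conn); // settles even though no event fires
```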
// test.js
import { RTCPeerConnection } from 'node-datachannel/polyfill'
let i = 0;
while (i++ < 100 * 1000) {
const conn = new RTCPeerConnection()
conn.close()
}
Run Result
murat@murat-ThinkBook:~/js/node-datachannel$ free -h
               total        used        free      shared  buff/cache   available
Mem:            13Gi       3.7Gi       5.4Gi        90Mi       4.5Gi       9.5Gi
Swap:           14Gi       322Mi        14Gi
murat@murat-ThinkBook:~/js/node-datachannel$ node test.js
murat@murat-ThinkBook:~/js/node-datachannel$ free -h
               total        used        free      shared  buff/cache   available
Mem:            13Gi       3.6Gi       5.4Gi        92Mi       4.5Gi       9.5Gi
Swap:           14Gi       322Mi        14Gi
I get the same result as @achingbrain with while (true), but I'm not sure this is really a problem.
To me, it seems related to GC.
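One way to test the GC theory is to keep the churn loop but yield to the event loop periodically, so work deferred to the loop (which a synchronous while (true) starves) gets a chance to run, and to print process.memoryUsage().rss along the way. A sketch, assuming a caller-supplied factory; with the polyfill you would pass () => new RTCPeerConnection(). The batch size of 1000 is an arbitrary choice.

```javascript
// Yield back to the event loop so deferred work (e.g. native callbacks queued
// onto the loop) can run; a synchronous while (true) never gives it the chance.
const yieldToLoop = () => new Promise((resolve) => setImmediate(resolve));

// Create and immediately close `total` connections, yielding every `batch`.
// `createConn` is supplied by the caller, e.g. () => new RTCPeerConnection().
async function churn(createConn, total, batch = 1000) {
  let created = 0;
  while (created < total) {
    for (let i = 0; i < batch && created < total; i++, created++) {
      const conn = createConn();
      conn.close();
    }
    await yieldToLoop();
    const rssMiB = process.memoryUsage().rss / 1024 / 1024;
    console.log(`${created} created, rss ${rssMiB.toFixed(1)} MiB`);
  }
  return created;
}
```

In an ES module this could be run as await churn(() => new RTCPeerConnection(), 100_000). If rss levels off here but climbs with while (true), that would point to a starved event loop rather than a true leak.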
I have just released v0.9.0 https://github.com/murat-dogan/node-datachannel/releases/tag/v0.9.0
Should we also close this issue?
I am closing the issue. Please feel free to reopen it if you need to.
This is a follow-on to #215; I think the problem still exists, even after node-datachannel@0.5.4.
These references can keep threads alive:
The referenced lines in RTCPeerConnection are all callbacks registered with the C++ object, for example line 63: https://github.com/murat-dogan/node-datachannel/blob/d866015dea085164fa110e34d4e5d86a8cbaa050/polyfill/RTCPeerConnection.js#L63
If I change the .close method of RTCPeerConnection, the problem goes away:
I think this is because doDestroy calls doResetCallbacks, which releases references to the JS callbacks.
To replicate, clone js-libp2p, apply this diff:
Then install, build, and run the Node tests:
You should see one test run, a brief pause, then a list of all the handles keeping the process running:
Ignore the FILEHANDLE and KEYPAIRGENREQUEST entries; it's the ThreadSafeCallback callback entries that are causing the problem.
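To check what is keeping the event loop alive without extra tooling, Node 16.14 / 17.3 and later provide the experimental process.getActiveResourcesInfo(), which returns the type names of resources currently holding the loop open. This is not necessarily the tool used to produce the list above, and the names it reports may differ from that output. A minimal sketch:

```javascript
// Count the resources currently keeping the Node.js event loop alive.
// process.getActiveResourcesInfo() is experimental (Node 16.14 / 17.3+).
function logActiveResources(label) {
  const counts = {};
  for (const type of process.getActiveResourcesInfo()) {
    counts[type] = (counts[type] ?? 0) + 1;
  }
  console.log(label, counts);
  return counts;
}

const before = logActiveResources('before:');
const timer = setTimeout(() => {}, 60_000); // an active timer holds the loop open
const during = logActiveResources('with timer:');
clearTimeout(timer); // release it so the process can exit
```

Calling logActiveResources just before the point where the test suite should finish makes any lingering handle types visible.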