Open savetheclocktower opened 5 days ago
Delay the unloading of the current page until the
stop
method resolves; I've hacked together a proof of concept and it actually works, but it's a bit impractical in the general case.
To elaborate on this:
I had a theory that nsfw
wouldn't crash the renderer if it could guarantee disposal before the page unloaded. Since calling stop
on a watcher is an async operation, this isn't elegant, since the whole page-unload mechanism (with the beforeunload
event) predates promises and async in general.
Still, I tried this:
function unloadDeferralListener(event) {
event.preventDefault();
console.log('Waiting for unload…');
NativeWatcher.waitForStop().then(() => {
console.log('…done.');
window.removeEventListener('beforeunload', unloadDeferralListener);
location.reload();
});
}
class NativeWatcher {
static promises= new Set();
static initialized = false;
static async waitForStop() {
let { promises } = NativeWatcher;
if (promises.size === 0) {
return Promise.resolve();
}
console.log('Waiting for', promises.size, 'promises');
await Promise.allSettled(promises);
}
static initialize() {
if (this.initialized) return;
addEventListener('beforeunload', unloadDeferralListener);
this.initialized = true;
}
// ...
}
The static NativeWatcher.initialize
method is needed because addEventListener
isn't yet available upon initial evaluation of this file.
In the stop
method, I proceed as before, except that I add any pending stop operation to NativeWatcher.promises
just after creation, and remove it from the same set when the promise is fulfilled.
This is hacky, but it works. It's hacky because it'd probably fall apart as a strategy if more than one thing on the page assumed it could use this technique.
But it's good as a best-alternative. If we fail to solve this in a more elegant fashion, I could zhuzh this up by creating a more formal internal API around it so that it could safely be used by more than one thing.
Good news! My horrible idea from the previous comment won't be needed.
The specific crash I've been seeing is the fault of our path-watching strategy:
/foo/bar
and /foo/baz
./foo/bar
directly, but once we see that /foo/baz
wants to be watched as well, we combine them into one watcher that watches /foo
./foo/bar
watcher were stopped, we'd react by creating a new watcher for /foo/baz
specifically. This is one of several ways in which disposing of one watcher can trigger the creation of another. It took me a while to process the fact that the crash seemed to be happening during a call to watcher.start
, not watcher.stop
. I figured that was because the new page was initializing and running into some resources that were still locked or unfinalized or something from the previous run, but eventually I realized that those calls to watcher.start
were happening on the original page while it was being unloaded.
Why? Because we were disposing of watchers en masse during page unload. None of those watchers realized that the point was to disable all watchers, so they tried to start new ones in a terminating renderer environment.
This is fixable simply by setting a flag at the top of PathWatcher::dispose
that watchers can check to know whether they should reattach. If the whole PathWatcher
is being disposed, there's no point in creating new watchers. A more thorough fix would envision that watchers tend to be disposed in batches, meaning we should consider not creating a new watcher before we know if it'll even be needed — but that's a lot harder.
So this is the right answer for now; setting that flag seems to consistently prevent renderer process crashes. Still, it's not great to know that there's a way for the renderer process to crash, even if we're using entirely thread-safe native libraries!
In fact, it happens because Node has no other way of signaling that something's gone wrong at that point:
By default, throwing an exception on a terminating environment (eg. worker threads) will cause a fatal exception, terminating the Node process. This is to provide feedback to the user of the runtime error, as it is impossible to pass the error to JavaScript when the environment is terminating. The
NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS
preprocessor directive can be defined to bypass this behavior, such that the Node process will not terminate.
The ideal situation would be for nsfw
to be able to recognize when it's being asked to watch files in a terminating environment and silently fail. The next-to-ideal situation would be for nsfw
to define the NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS
directive in its binding.gyp
.
I've filed an issue about this with nsfw
, but if they don't decide to do either of these things, then one option for us would be to maintain a “fork“ of nsfw
whose only divergence is defining that directive in binding.gyp
. In my tests, that directive was enough to prevent the crashes even with no other changes.
It's not a high-priority thing, but it might be something we start thinking about if we find that a community package is using atom.watchPath
sloppily.
Good news! nsfw
accepted my PR adding NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS
. I'll keep this open until they put out a new release with this change.
Thanks in advance for your bug report!
What happened?
As far as I can tell, nsfw is the most widely used library for listening to filesystem events. Atom has implemented a couple of their own , and though it's tempting to try to adapt one of those to the exact needs of modern Electron, it's not much fun.
The latest version of
nsfw
is almost perfect, but it still manages to crash Pulsar on window reload about half the time. After some research, this seems to happen because:NativeWatcher
insrc/path-watcher.js
on startupdispose
and callstop
on the innernsfw
watcher, which is an async operation that returns a promise; because of reasons explained below, this often starts a different watcherstart
method outlives the web page that initiated it; when it resolves, it does something that causes a crash, presumably because it's trying to access something that no longer existsI can think of a few ways to handle this:
nsfw
’s maintainer would accept a patch.stop
method resolves; I've hacked together a proof of concept and it actually works, but it's a bit impractical in the general case.AtomEnvironment
instance is being destroyed. (This is what we've implemented for now.)Pulsar version
pulsar-next
Which OS does this happen on?
🍎 macOS
OS details
Sonoma 14.7
Which CPU architecture are you running this on?
Apple M-series
What steps are needed to reproduce this?
On PulsarNext…
Additional Information:
No response