Not possible to know when fs.watch has started on macOS

djcsdy commented 2 months ago

Version

v21.7.3

Platform

Darwin Mac-1713547127951.local 21.6.0 Darwin Kernel Version 21.6.0: Mon Feb 19 20:24:34 PST 2024; root:xnu-8020.240.18.707.4~1/RELEASE_X86_64 x86_64

Subsystem

fs

What steps will reproduce the bug?

By experimentation I have determined that, on Windows and Linux, fs.watch is watching the filesystem as soon as the function call returns.

Unfortunately, on macOS, fs.watch does not start watching the filesystem until some indeterminate amount of time after the function call returns. It is not possible to know when this is.

Consider the following code:

import {open} from "node:fs/promises";
import {watch} from "node:fs";
import {basename} from "node:path";

const path = "test";
const file = await open(path, "w");
await file.close();
const abortController = new AbortController();
const events = [];
const watcher = watch(
    path,
    {signal: abortController.signal},
    (event, path) => void events.push({event, path})
);
watcher.addListener("error", error => void events.push({event: "Error", error}));
// If you insert a delay here, then sometimes the test will pass on macOS.
// The longer the delay, the more likely the test is to pass.
const file2 = await open(path, "w");
await file2.close();
// You can insert a delay here if you want, but it doesn't seem to make any difference.
abortController.abort();
console.log(events);

How often does it reproduce? Is there a required condition?

For me, on macOS on GitHub Actions, the output of the above code is always

[]

and on Windows or Linux on GitHub Actions, the output is always

[ { event: 'change', path: 'test' } ]

However, I would not be surprised if macOS occasionally produces the same output as Windows/Linux by random chance.

What is the expected behavior? Why is that the expected behavior?

The expected output of the above script is:

[ { event: 'change', path: 'test' } ]

This is the actual output on Windows and Linux.

I would expect that most applications that use fs.watch do something like the following:

Call fs.watch to get a stream of events.
Read the initial state of the filesystem.
Re-read any files that are the subject of events emitted by fs.watch.

On Windows and Linux, which behave as I would expect, the application can always safely read the state of the filesystem (step 2) immediately after calling fs.watch. The application can be sure that any changes that occur after it reads the initial state of the filesystem will be included in the stream of events emitted by fs.watch. Therefore it is possible for the application to be reliably in sync with the actual state of the filesystem.

What do you see instead?

On macOS, the output of the above script is:

[]

On macOS, the application cannot know when the stream of events from fs.watch will actually start. Therefore it cannot know when it is safe to read the initial state of the filesystem (step 2 in the above set of steps). If the application reads the initial state of the filesystem too early, then the application will be out of sync with the actual state of the filesystem after changes occur. The application could insert an arbitrary delay before step 2 (in my experimentation, 200ms seems to work well), but this degrades performance and is not a reliable solution.

Additional information

The expected behaviour described above imply that fs.watch performs a certain amount of synchronous IO before it returns. It may not be possible or desirable to implement this behaviour on macOS.

Instead, I suggest that, on all platforms, fs.watch should emit a new "ready" event when it starts actively watching the filesystem. Applications can then wait until they observe this "ready" event before they read the initial state of the filesystem.

juanarbol commented 2 months ago

~Maybe your a hitting one limitation as documented in https://nodejs.org/docs/latest/api/fs.html#availability; personally in my machine I get the events.~

~If the underlying functionality is not available for some reason, then fs.watch() will not be able to function and may throw an exception. For example, watching files or directories can be unreliable, and in some cases impossible, on network file systems (NFS, SMB, etc) or host file systems when using virtualization software such as Vagrant or Docker.~

~I'm 99% that those macOS machines are virtualized by GH. Or possibly that kernel is limited, honestly; I don't think I have a bullet-proof answer for this~

Nevermind, libuv uses gh-action macOS machines to test itself, even tho, they a comment on the test that could be related, tho see https://github.com/libuv/libuv/blob/v1.x/test/test-fs-event.c#L697

juanarbol commented 2 months ago

May be related to this: https://github.com/libuv/libuv/issues/3866

riosje commented 2 months ago

Hello guys, I was not able to repro the behavior you mention here.

I created an action to test this on macOS and it works, @juanarbol shared to me the code to test. https://github.com/riosje/filesystem/actions/runs/8929720211/job/24528200537

IMO, If it is really failing the reason must other than the actions itself.

Will with this one

import {open} from "node:fs/promises";
import {watch} from "node:fs";
import {basename} from "node:path";

const path = "test";
const file = await open(path, "w");
await file.close();
const abortController = new AbortController();
const events = [];
const watcher = watch(
    path,
    {signal: abortController.signal},
    (event, path) => void events.push({event, path})
);
watcher.addListener("error", error => void events.push({event: "Error", error}));
// If you insert a delay here, then sometimes the test will pass on macOS.
// The longer the delay, the more likely the test is to pass.
const file2 = await open(path, "w");
await file2.close();
// You can insert a delay here if you want, but it doesn't seem to make any difference.
abortController.abort();
console.log(events);

ayazhafiz commented 2 months ago

I'm also able to reproduce fs.watch not reporting events after construction on Node.js 20.12.2 with the following script

const {join} = require('path');
const fs = require('fs/promises');
const {watch} = require('fs');

async function main() {
  for (let i = 0; i < 10_000; ++i) {
    console.log(`iter: ${i}`);
    const logfilePath = join(process.cwd(), 'file');
    await fs.writeFile(logfilePath, '');

    let w;
    const recvChange = new Promise((resolve) => { w = watch(logfilePath, () => resolve()) });

    await fs.unlink(logfilePath);

    await recvChange;

    w.close();
  }
}
main()

23.4.0 Darwin Kernel Version 23.4.0: Wed Feb 21 21:44:43 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T6000 arm64 arm Darwin

djcsdy commented 2 months ago

@riosje

Hello guys, I was not able to repro the behavior you mention here.

I created an action to test this on macOS and it works, @juanarbol shared to me the code to test. https://github.com/riosje/filesystem/actions/runs/8929720211/job/24528200537

[ snip ]

I wouldn't expect this code to reproduce the behaviour I'm reporting. Or to put it more accurately, I wouldn't expect to be able to tell if it did.

The problem as I understand it is that there is a short, non-deterministic time window between the time when the call to fs.watch returns, and the time when it actually starts reporting events. If, and only if, you modify a watched file during that window, then fs.watch won't report the corresponding change.

In the code you're using, fs.bash writes to the file every second, and fswatch.js watches the file system for 60s. This means you will definitely see some events reported, because the time window during which events are unreported is definitely much shorter than 60s. You'll probably see approximately 60 events for every run depending on random timing differences, but you won't be able to tell if any events were missed because they all look the same.

There is also the problem that fs.bash and fswatch.js are run in parallel, but fs.bash is run first, and therefore will very likely write to the file before fswatch.js calls fs.watch. In this case it is absolutely expected that write would not be reported. But we can't be sure whether this occurred or not in any given run. Fundamentally, the test itself is non-deterministic.

@ayazhafiz's code is more or less equivalent to mine, except that it runs in a loop to increase the chance of encountering an error, and does reproduce, so I think that proves the point.

djcsdy commented 2 months ago

@juanarbol Although you struck out most of your post, I think it's worth responding with an acknowledgement that, yes, this could be a limitation of the Mac VM of GitHub Actions and not a real problem with node or libuv itself. Unfortunately I don't have access to a modern Mac to test myself, so I have been reliant on GitHub Actions for this.

@ayazhafiz When you tested this and successfully reproduced, was that on a real Mac or GitHub Actions or some other VM?

ayazhafiz commented 2 months ago

I ran this on a real Mac M1; the exact OS details are in my original comment.

djcsdy commented 2 months ago

Thank you for testing! :-)

juanarbol commented 2 months ago

I'm investigating a bit from the libuv side, this seems to be libuv issue.

nodejs / node