Open matklad opened 5 years ago
Oh, that's a great idea! Certainly something I'll use for vNext, but would also be great if it was implemented for the current branch... not sure how feasible. Thanks for the lovely idea!
In the mean time, hotwatch looks pretty cool: https://crates.io/crates/hotwatch (good job @francesca64):
use hotwatch::{Hotwatch, Event};
let mut hotwatch = Hotwatch::new().expect("Hotwatch failed to initialize.");
hotwatch.watch("war.png", |event: Event| {
if let Event::Write(path) = event {
println!("War has changed.");
}
}).expect("Failed to watch file!");
I liked it so much I put it in the readme.
Hotwatch API looks nice (especially due to not exposing mpsc channel), but it doesn't handle the most tricky bit: filtering the watched directories. That's a pretty important bit for rust-analyzer: if we just watch the whole directory with Cargo project, we get flooded (up to 100% CPU) with events for the target
directory when Cargo starts building the project.
Also cc @vemoo, you might be interested in helping out with this :)
Another thought: for this API, it would be important to specify the consistency semantics of the events. Obviously, if I receive Event::Create
, I can't just blindly read the file: it might have been deleted in the next moment! Similarly, I'd rather not have a Rescan
event in the API and instead delegate rescaning to the library.
I think what we want here is quiescent consistency: if there are no changes for a period of time, the last event for each path should correspond to the actual state. In other words, in the sum of the events between two quiescent periods must be equal to the difference between two walkdir traversals.
Agreed on the filtering.
Rescan
is a historical mistake, really. It means two things on two platforms, and has no equivalent on any others. In inotify it's really a fatal error event, and in fsevents it's an indication that something was missed, but fsevents misses things all the time by design (there's a huge discussion elsewhere in issues about fsevents with the tl;dr and conclusion that fsevents should not be used for this library, see #144, #147; alas, time).
Probably what that library should do to achieve such consistency is to attempt to attr
files right after the quiet down and then returns Attrs
(or whatever the struct is called) alongside the path and Op
. If the file is non existent it could emit/modify the event to actually reflect reality, and exposing the Attrs would save the caller from doing a subsequent (and possibly outdated by then) lookup.
I don't know that guaranteeing "the sum of the events between two quiescent periods must be equal to the difference between two walkdir traversals" is practical, because I have low confidence in the actual reliability of these platforms APIs. Appending this guarantee with "in an ideal world" is probably good? No, I am being overly cynical here. Relying only on native events is on the order of 99.9% accurate in 90% of cases. The 10% include things like unsupported filesystems, monitoring huge trees, networked shares, permission issues, and ghosts. As maintainer, I tend to see those 10% a lot more.
Appending this guarantee with "in an ideal world" is probably good?
Yeah, that's what I've meant. Specifically, this clause about "consistency" was about a sublte bug we had in rust-analyzer.
Initially, we did recursive walkdiring on the one thread, and watching on another. We also are interested in the actual "contents" of the files, so both walkdiring thread and watching thread were reading files from disk and sending the results to the single channel, which was the "API".
The problem with this setup is that one thread can read file at time t1, another thread can read the file at t2 > t1, but the events in the channel could be in a swapped order. We fixed this by making sure we always read file contents from the first thread, and made the watching thread to only send requests for reading the files to the first one: that way we guarantee that file contents only progresses forward in time.
That is, besides infinite ways in which native APIs are broken, there's a fair amount of places where we might accidentally break logical "happens before" relation, and we need to think about that :)
Yes, I can try to implement this.
For v4 at least for inotify shouldn't be too hard since it's already using WalkDir
internally. For windows and mac I'll have to take a closer look at the code. It seems they nativelly suport recursive watches, but maybe when a filter is provided instead of creating a recursive watcher we should handle the recursion ourselves.
From taking a quick look at the next v5 code the issue I mention before is more clear thanks to the Capability
but the solution could be the same. What I think is missing is a way of adding watches after the Bakend
is created, that way we can implement the filtering on top of the backend.
In v4 you get a different implementation per platform that exposes the same interface, in v5 you'd never actually interact with Backends yourself. vNext has very strong separation of abstraction between what Backend implementations do and what the consumers' concerns are, so "vNext Backends" are immutable (makes them super easy to write), and "vNext Notify" itself is a layer that manages adding and removing watches (aka creating and deleting backends) and processing events.
In the current design of vNext what you'd do is implement a Processor
that does the recursion and another that does the filtering, and add them in when .recurse()
or .filter(|f| !is_hidden(f))
are called on this hypothetical WatchDir
builder. So I like this API design very much because it's exactly what vNext is made to be! (For more, see this draft of a presentation/announce.)
(The current state is that Processor
s are described and there's a trait, but they're not hooked into the Notify mechanic yet. I keep getting bogged down by async code. I anticipate refining/adjusting how the work/look once or while I get them working.)
I've started working on the feature for v4, and I have a few questions.
I think ideally the filtering should in RecursiveMode
something like:
pub enum RecursiveMode {
Recursive,
NonRecursive,
Filtered(...)
}
But that would be a breaking change, so I guess the best thing is to create a new method to create a watcher, new_with_filter
maybe?
Should we filter files also? It would be a nicer api, but that would mean possibly turning rename events into create or delete if one of the names is filtered, that wouldn't be too bad for debounced events but I don't know how it should be handled for raw events. I was thinking of initially implementing directory filtering only. Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir
In the current inotify implementation for recursive watching, when a directory is created it's walked recursively and the found directories are also watched. But shouldn't we also emit create events for all the found files and directories?
Regarding the v5, I think the backend trait is missing a way to configure the type of watch when creating it, specifically I'm thinking of creating a non recursive folder watch on windows for example. Also if I'm following the code correctly, for recursive watches with backends that don't support it nativelly, it will end up allocating Vec<PathBuf>
each time a directory is created.
vNext is not a viable development target at this time, so don't spend too much
time on it. You bring a good point on the non-recursive-on-windows thing.
More generally, the current trait offers no facility for passing options to
backends, which has other usecases (e.g. event filtering in the kernel). I
do expect that to be resolved by the time it stabilises, but for now the
added complexity was not worth it. Similarly the core likely allocates way
too much at the moment — in that specific case PathBuf
is easier for
prototyping, but I have thoughts around a descriptor enum to support using
references, inodes, or handles directly, for example. There's many other
such details and questions left... I'd like to get the general shape of the
whole thing going before refining; or at least that's the idea.
I'll have a think about the rest.
I'm not opposed to another method but would it want a raw_with_filter()
as well or would the filtering require the debounce?
Also wondering whether adding a variant to that enum would really be a breaking change... and if it is, whether it would be better to just do that, instead of a perhaps worse solution.
On Sat, 9 Feb 2019, 10:31 vemoo <notifications@github.com wrote:
I've started working on the feature for v4, and I have a few questions.
-
I think ideally the filtering should in RecursiveMode something like:
pub enum RecursiveMode { Recursive, NonRecursive, Filtered(...) }
But that would be a breaking change, so I guess the best thing is to create a new method to create a watcher, new_with_filter maybe?
Should we filter files also? It would be a nicer api, but that would mean possibly turning rename events into create or delete if one of the names is filtered, that wouldn't be too bad for debounced events but I don't know how it should be handled for raw events. I was thinking of initially implementing directory filtering only. Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir
In the current inotify implementation for recursive watching, when a directory is created it's walked recursively and the found directories are also watched. But shouldn't we also emit create events for all the found files and directories?
Regarding the v5, I think the backend trait is missing a way to configure the type of watch when creating it, specifically I'm thinking of creating a non recursive folder watch on windows for example. Also if I'm following the code correctly, for recursive watches with backends that don't support it nativelly, it will end up allocating Vec
each time a directory is created. — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/passcod/notify/issues/175#issuecomment-461953748, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJgi3NajM-GDCkvhjj4L0hW4JwFoRKnks5vLey-gaJpZM4aUE6n .
Can you explain what you mean by
Also to make the configuration more complete I was thinking of exposing a
Fn(&Path)->WalkDir
What I mean by "wondering whether adding a variant to that enum would really be a breaking change" is that RecursiveMode
is never returned from the API, so there's no reason to match on it. I think the exception to that would be a library that re-exports RecursiveMode
to their users and then pattern-matches on it rather than passing it straight through to Notify.
I'm weakly leaning more towards bumping the major anyway. I think a large new feature like this is worth it, especially if it means doing it right. Upgrading would be trivial, even for libraries as described above. Adoption would be slower, but the nicer API would be a good incentive.
Directory filtering at least as a first approach sounds good, as that would be most of the value.
shouldn't we also emit create events for all the found files and directories?
I think that would make sense, but I wonder what others do:
I think this could be handled separately, though, so it doesn't block this issue.
Can you explain what you mean by
Also to make the configuration more complete I was thinking of exposing a
Fn(&Path)->WalkDir
So that the user could set the available options in WalkDir
. But now that I think about it the only one that makes sense is follow_links
, since min_depth
and max_depth
would only be usefull relative to the watch root, not each folder on which we well be using the WalkDir
. So I'm thinking the filter could be:
struct RecursionFilter<P>
where
P: Fn(&Path) -> bool,
{
filter_dir: P,
follow_links: bool,
}
Right, sounds good.
I think I found a way to do it here. I've created an abstraction for the watcher implementation: WatcherInternal
and that way I can implement the recursive filtering and keep track of the which nested watches belong to which actual watch in RecursionAdapter
.
It's mosly an idea, I haven't tried to implement it yet for any backends. I'll try inotify first, by implementing WatcherInternal
for inotify::EventLoop
, and see if it works.
Random thought: in rust-analyzer, we've noticed that on mac we can get Crate event when the write is expected. That is, distinguishing between the kind of event reliably seems tricky. So perhaps instead of
Event::Create => (),
Event::Change => (),
Event::Delete => (),
We should have
Event::MaybeChanged => (),
?
I have it mostly working for inotify
(https://github.com/vemoo/notify/tree/watch-filter). There's 1 test (watch_recursive_move_out
) that fails always and some other that fails ocasionally. I'll investigate.
I have yet to add tests for the filtering, but I think the hardest part is done. Adding it to the other watches should be easier since they don't have a recursive implementation that has to be replaced.
This looks good, I like this.
On Mon, 18 Feb 2019 at 09:14, vemoo notifications@github.com wrote:
I have it mostly working for inotify ( https://github.com/vemoo/notify/tree/watch-filter). There's 1 test ( watch_recursive_move_out) that fails always and some other that fails ocasionally. I'll investigate.
I have yet to add tests for the filtering, but I think the hardest part is done. Adding it to the other watches should be easier since they don't have a recursive implementation that has to be replaced.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/passcod/notify/issues/175#issuecomment-464503283, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJgi8Udamae3R-UWcZe1OM0Meheebd5ks5vObgggaJpZM4aUE6n .
I'm slowly making progress.
The test that was failing was a timing issue, I fixed it here: https://github.com/vemoo/notify/commit/51d80bc1ed7f1a7ec88f8b50b009f832fa4f938d
I ended up implementing filtering for files also, because it wasn't that much more work. This is the filter struct: RecursionFilter
and this is the item to filter on: FilterItem
To implement the filtering I needed to know if a rename event was the "from" or the "to" part, to try to add or remove watches. To keep it simple I check if the path exists and if it does I treat it as a rename "to", otherwise as a rename "from". But inotify and windows distinguish between the two events, so I though that it would be usefull to separate the RENAME
op in two.
I also found something in Debounce
that might be a bug. Given this raw events (filtered):
[
(
"/tmp/temp_dir.seaDBoBs6epi/dir1",
CREATE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/file1",
CREATE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/file1",
CLOSE_WRITE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1",
CREATE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1",
CREATE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1",
CLOSE_WRITE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2",
CREATE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2",
CLOSE_WRITE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2",
CREATE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1",
CREATE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1",
CREATE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1",
CLOSE_WRITE,
None
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/non_ignored_dir",
RENAME,
Some(
2384
)
),
(
"/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2",
RENAME,
Some(
2386
)
)
]
the debounced events are:
[
NoticeRemove(
"/tmp/temp_dir.6u6AFtuQQITu/dir1/non_ignored_dir"
),
Create(
"/tmp/temp_dir.6u6AFtuQQITu/dir1"
),
Create(
"/tmp/temp_dir.6u6AFtuQQITu/dir1/file1"
),
Create(
"/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1"
),
Create(
"/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file1"
),
Create(
"/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file2"
),
Create(
"/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1"
),
Create(
"/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1/file1"
),
Create(
"/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2"
)
]
(this is the out put of running the tests https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L68 and https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L84 on linux) I think having separate "from" and "to" events will help fix this.
I've also started working on integrating it in windows, but it's a bit harder than I thought. For mac I won't be able to try it, unless there's a way to do it from linux.
I should have my mac back from the 23rd onwards, so I'll be able to help thenre.
On Mon, 25 Feb 2019, 08:09 vemoo, notifications@github.com wrote:
I'm slowly making progress.
The test that was failing was a timing issue, I fixed it here: vemoo@ 51d80bc https://github.com/vemoo/notify/commit/51d80bc1ed7f1a7ec88f8b50b009f832fa4f938d
I ended up implementing filtering for files also, because it wasn't that much more work. This is the filter struct: RecursionFilter https://github.com/vemoo/notify/blob/watch-filter/src/recursion.rs#L30-L35 and this is the item to filter on: FilterItem https://github.com/vemoo/notify/blob/78c2feb6d9ce440b4ec0da9b1bb7214f3b7734a5/src/recursion.rs#L13-L18
To implement the filtering I needed to know if a rename event was the "from" or the "to" part, to try to add or remove watches. To keep it simple I check if the path exists and if it does I treat it as a rename "to", otherwise as a rename "from". But inotify and windows distinguish between the two events, so I though that it would be usefull to separate the RENAME op in two.
I also found something in Debounce that might be a bug. Given this raw events (filtered):
[ ( "/tmp/temp_dir.seaDBoBs6epi/dir1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/file1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/file1", CLOSE_WRITE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1", CLOSE_WRITE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2", CLOSE_WRITE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1", CLOSE_WRITE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/non_ignored_dir", RENAME, Some( 2384 ) ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2", RENAME, Some( 2386 ) ) ]
the debounced events are:
[ NoticeRemove( "/tmp/temp_dir.6u6AFtuQQITu/dir1/non_ignored_dir" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/file1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file2" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1/file1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2" ) ]
(this is the out put of running the tests https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L68 and https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L84 on linux) I think having separate "from" and "to" events will help fix this.
I've also started working on integrating it in windows, but it's a bit harder than I thought. For mac I won't be able to try it, unless there's a way to do it from linux.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/passcod/notify/issues/175#issuecomment-466806460, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJgixE11O0NTrdT2RmxyXgEsCbTgj-fks5vQuNogaJpZM4aUE6n .
It seem this approach doesn't work for windows. Because windows doesn't allow to delete the watched directory neither renaming the parent directory, and in this approach we create a non recursive watch for each directory. There are some tests that I didn't see until after I hit this issue: https://github.com/vemoo/notify/blob/dc2e46b6bd4c3c9d3f9edaa6bcfc705b11c24137/tests/watcher.rs#L1117 https://github.com/vemoo/notify/blob/dc2e46b6bd4c3c9d3f9edaa6bcfc705b11c24137/tests/watcher.rs#L1461 that are ignored for windows for these reason.
Hi!
We've been using notify in rust-analyzer, and one thing I've noticed that a task like "watch
src/**.rs
glob" requires quite a bit of manual implementation. Specifically, I think two bits are complex:rescan
event which requires repeating the recursive logic againI feel like there's some higher-level API missing here... Ideally I'd love to use something like walkdir: