Closed arlyon closed 12 months ago
Note, it is possible to work around this by modifying the walk glob manually to match the behaviour I am describing, however globs are user supplied and so I cannot reasonably advise them to modify their globs.
- let glob = make_glob("crates/**/*.rs");
+ let glob = make_glob("crates/{target,**/*.rs}");
Patterns are never implicitly exhaustive. Just as crates/**/target
will never positively match paths with names or components beyond target
(for example, the path crates/wax/target/debug/
does not match), a negation will not negatively match paths with names or components beyond target
either. As a negation, crates/**/target
only discards paths that terminate in a component matching target
anywhere beneath a component matching crates
. The negation crates/**/target/**
is probably what you want here if I understand correctly. I would expect that pattern to avoid the spurious reads.
I think the most common kind of negations that folks want must terminate in a tree wildcard **
like this. I've considered making this implicit, but that would force all negations to behave this way and I can imagine a need to avoid more specific paths, among other issues.
Thanks for the quick reply! Using doublestar on the negation (crates/**/target/**
) doesn't make a difference in my tests. I have attached some chrome traces of the various combinations. (not to scale, the importance is that big gap in the middle where we are neither yielding nor skipping folders).
crates/**/*.rs - crates/**/target/**
crates/**/*.rs - crates/**/target
crates/*/{target,**/*.rs} - crates/**/target/**
crates/*/{target,**/*.rs} - crates/**/target
The negation
crates/**/target/**
is probably what you want here if I understand correctly. I would expect that pattern to avoid the spurious reads.
You are partially correct, that does prevent rs
files in target
from being yielded (yay, part of the issue solved) but, in terms of performance, the issue is that if the Walk
doesn't yield the target
folder to the FilterTree
then we don't get the chance to skip it and so we end up recursively walking the folder. If we end up finding a .rs
file, the FilterTree will correctly filter it out but we still waste time walkin a folder we don't need to.
the issue is that if the
Walk
doesn't yield thetarget
folder to theFilterTree
then we don't get the chance to skip it
You're absolutely right! Filtering is not composed correctly and FilterTree
only operates on the subset of items it receives from Walk
. I'm working on a fix now.
After a bit of manual testing, I believe that I have a preliminary fix (currently on the filter
branch). It introduces SeparatingFilter
s, which allow monotonic (one-way) filtering of a separation with special support for hierarchical tree iterators (like those reading directory trees from a file system) where all combinators can apply filter-mapping functions to the entire feed (both its filtrate and its residue).
This has introduced a significant amount of code, so I've reorganized the walk
module and now expose it in the public API rather than re-exporting some of its items into the crate root. The walk
module is now more substantial, but it preserves the current combinator API and its good ergonomics. 🎉
Hello! Thank you for taking this on, you saved me a decent chunk of work :) I will pull down the branch and have a look.
I've landed a fix in 76afaa1 on master
. :tada:
There are definitely some gaps in testing here, but this unit test should detect the underlying problem that caused this issue.
Consider the code below
https://github.com/olson-sean-k/wax/blob/1afcda8318201afc04ebed06fea907d54fc1bf8c/src/walk.rs#L749-L767
Given the globs above, the walker will end up traversing into every single
target
folder and never trigger a skip. The 'skip tree' optimisation will never trigger because it is only able to skip a folder after it is yielded to the negation viaself.input.next()
, and mosttarget
folders will a) not have rust files to yield and b) if they do, will not match the exclude case because they are a file and not the folder itself. That means that we will not correctly detect when we canTreeIterator::skip_tree
. Seems like we need to invert the order here so that theFilterTree
is wrapped by theWalk
so that it may reject items first. Alternatively we could modify theWalk
pattern to include the excluded globs, so that they are yielded to theFilterTree
for exclusion but that seems a little inelegant...I believe that this will have no impact on the behaviour, but it will make walking with negations much more efficient.