Open lukeify opened 8 years ago
@jprichardson Is this expected behavior?
@jprichardson Is this expected behavior?
Good question. It seems that it should NOT operate this way. But since this was never clearly documented, I think we should fix it and bump major to just be extra cautious.
SGTM, PR welcome.
This is actually very closely related to https://github.com/jprichardson/node-klaw/issues/17. When the Walker stream is constructed the source dir is set as the only item in this.paths
. When the stream is first read this first directory is lstat()
ed and pushed without the filter function ever being applied to it. If it were applied in this case - because the filter function prevents even processing the path - the result would be an empty array.
I think an interesting compromise would be for this.paths
to become an array of objects containing a path and an lstat()
result. The filter function could be passed this object (definitely a major version bump). On read, instead of calling lstat()
we consult the lstat()
result in the object. This would allow the lstat()
result be used by the filter function to optionally retain subdirectories in the search - meaning the user would not have to create a duplicate call themselves. The constructor could also call the filter function on the source directory and if it were not retained emit an error.
The most important part of this is to more clearly document the purpose and effect of the filter function - the most important reason for it to exist is to prune branches from the directory tree that we are not interested in traversing at all. The performance savings of not lstat()
ing a simple file are expected to be negligible - saving lstat()
, readdir()
and recursion in to many subdirectory trees could be very significant.
Conversely, the type of filter described in this issue is not intended to filter out subdirectories and can be implemented more effectively by pipe()
ing to a through2 stream.
As an alternative to breaking changes, another solution would be to introduce a new option called prune
which behaves roughly as I described above, but only is called for directories. filter
can then be applied only to non-directory files before they are push()
ed. I can attempt to implement this alternative as a pull request if you'd like.
Any updates on this issue?
I am walking a directory,
./source
, and even though I have specified a filter to include only markdown files viapath.extname()
, I am still receiving the root directory as an item in my final array.I expected
['foo.md', 'bar.md', 'baz.md'];
I actually got
['foo.md', 'bar.md', 'baz.md', 'source'];