Open boneskull opened 5 years ago
I do think this is a good idea but as you point out glob does have 6 dependencies, including minimatch
so it's unclear to me how we would vendor it.
globs are not supported by bash v5 on Fedora 30. Running some-cli-app **/*.foo.js
will produce the same result as some-cli-app */*.foo.js
- the double-star is not supported.
As for pulling glob into the core I'm not sure. I think it might be more appropriate to first ask if minimatch
functionality belongs in the core. If so I think it would be important to evaluate pros/cons of multiple modules with similar functionality to determine which one would be best for the core (minimatch and micromatch each have about 15 million downloads per week).
Glob support would be great precisely because it’s not consistent across platforms, but is needed for things like git/npm/eslint ignore, eslint overrides, etc.
As often as globbing is used, and since most developers really don't realize how many edge cases and platform-specific issues need to be considered to produce a consistent experience, IMHO it would be great to add native support.
If Node.js does decide to include this functionality, I would be willing to put some time into helping (assuming that's of any interest). Either way, my recommendation is that you look at the matching and fs operations as completely separate chunks of logic.
If it's decided that this functionality should be re-written directly in Node.js, I'd be happy to take a stab at contributing that logic, starting with just the matching. I think @isaacs has more experience and knowledge of the fs logic and cross-platform considerations than I do. But I'd be willing to help with that as well.
If you decide to use external libs for any of this, I would of course also be willing and interested in collaborating on making any adjustments, improvements, or changes to our libs that are necessary to make them meet your requirements.
I would recommend also taking a look at fast-glob
which is both faster and better maintained than glob
.
// @mrmlnc
Let's not throw subjective shade at other packages, though.
sorry just saw your comment @ljharb (after I did thumbs up)
We're meeting tomorrow and this will be on the agenda. To everyone offering comments: please attend if you're available!
@sindresorhus It sounds like you'd be in support of pulling fast-glob into core, then?
@boneskull I'd be in support of that decision too :)
@ljharb It was not my intention to "throw a subjective shade". I'll be better at justifying my opinions in the future. I just wanted to mention an alternative.
Here are benchmarks comparing glob
and fast-glob
: https://gist.github.com/mrmlnc/f06246b197f53c356895fa35355a367c#file-fast_glob_benchmark_server-txt
By "better maintained", I meant more actively maintained. Poor word use on my part. glob
has almost not had any real code commits in 3 years.
For context, I built globby
(15M weekly downloads now) some years ago to provide a subjectively nicer API for glob
with a Promise API, support for multiple patterns, negated patterns, etc. Then fast-glob
came along which fixed a lot of bugs found in glob
and I eventually switched globby
over to use fast-glob
. In my subjective opinion, fast-glob
has a better API and codebase.
It sounds like you'd be in support of pulling fast-glob into core, then?
Yes
OK, we discussed this at today's meeting
path.match()
fs
, and we can cross that bridge latergimme feedback
I would be thrilled to hand glob
over to node core.
That being said, there are a few rough edges that ought to be polished down first, so here I go brutally trashing my own work :)
The sync/async management is (overy) clever. It works fairly well, but it's not my favorite approach. Since it somewhat predates proper Classes in JavaScript, glob uses a different approach than most of my other more recent libraries that walk over trees. (Eg, tar, npm-packlist, etc.) The code is a bit brittle and hard to work with. It's also very (overly) optimized for performance, and meticulously designed to match Bash 4's glob behavior.
None of which would in itself be a problem, if glob was actually a good cross-platform general-purpose file system globber. We could call it done and let it age gracefully. However, it has some issues with UNC paths on Windows, and lacks proper promise support, both of which are very challenging to fix in its current state.
I'd love to give it a rewrite, and may someday, but honestly, if it was a priority, I probably would've done it by now. Maybe the best approach is to use it as a reference implementation, but rewrite it, fixing all the bugs and preserving the tests and actual glob resolution behavior. It works well enough for a lot of people right now (including me!), but caveat emptor: if you pull it in as-is, you might be welcoming a very stinky camel into your tent. Make sure it's not too hard to replace, if you do so.
By "better maintained", I meant more actively maintained. Poor word use on my part. glob has almost not had any real code commits in 3 years.
"Better maintained" is actually a fair and accurate assessment. No shade taken :)
I'm working to document this initiative, so here's a rough draft of what I have, including a comparison of existing solutions. If I'm wrong here, please say so. I'll send a PR after it's had a bit of review, but wanted to keep everything in this issue for now.
See nodejs/tooling#38.
TL;DR: Runtime handling of Globs is needed for better portability
Globs (henceforce capitalized) are natively supported in shells like Bash, which means you can do this, which will recursively find files matching *.foo.js
, from your current working directory:
$ some-cli-app **/*.foo.js
Bash expands the above before execution of some-cli-app
into a list of files matching the Glob.
But we get in to trouble when our scripts (e.g., in package.json
) use Globs, because Windows won't necessarily expand these, and the scripts will fail (or otherwise exhibit unexpected behavior).
To support Globs in a portable way, any CLI app would need to pull in glob or a similar module, and pass the Glob as a (double-)quoted string in the arguments:
$ some-cli-app "**/*.foo.js"
The utility of glob and others like it has been well-established, and it's used in enough CLI apps that the functionality belongs in Node.js core.
The first thing that needed is a matching implementation. A Glob implementation (which uses the matcher to compare filenames) would then follow.
We expect the new function(s) would be added to the path module. That said, we must be careful to not conflate filepaths and Globs, because they are not the same.
An underlying matching implementation is a prerequisite to Glob support in Node.js core. To determine what we should be implementing, we'll compare existing solutions.
Of particular concern is how path separators are handled. While Globs are not filepaths, developers often treat them as such. The output of
path.join()
will be system-dependent, and if its output is passed into a matching implementation, the result may be unexpected. Please see micromatch's notes on the topic.
We'll compare the featureset of the following:
We've identified the latter two as being most popular packages which provide the underlying matcher to a Glob implementation.
We will consider the following relevant features:
*.js
)**.js
, a/**/*
)!a/*.js
)+(x|y)
, !(a|b)
)foo-[12345].js
)foo-[1-5].js
)[[:alpha:][:digit:]]
)bar/{a,b,c}.js
)foo/{1..5}.md
; equivalent to foo/[1-5].md
)#*.js
)foo.?s
)For purposes of this best-effort comparison, we'll consider Bash's full capabilities instead of its default capabilities. All comparisons should reflect the latest version (as of this writing) of the respective software.
If a user can toggle the specific behavior, that's noted with "Flag."
Wildcards | Globstar | Negation | extglob | Char set | Char ranges | Char class | Brace Expansion | Brace Ranges | Case | Comments | Single-Char | Dotfile | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bash | Yes | Flag | Yes | Flag | Yes | Flag | Yes | Yes | Nob | Flag | Yes | Yes | No |
gitignore | Yes | Yes | Flaga | No | Yes | Yes | Yes | Yes | No | No | Yes | Yes | No |
micromatch | Yes | Flag | Flag | Flag | Flag | Flag | Flag | Flag | Flag | Flag | Noc | Yes | Flag |
minimatch | Yes | Flag | Flag | Flag | Yes | Yes | Yes | Flag | Flag | Flag | Flag | Yes | Flag |
\!
at the beginning of a pattern for a literal !
{1..5}
) as sequences which can be used to create multiple patterns at once, but are not patterns themselves.My key takeaway: if you expected the implementation to be simple, you will be disappointed. :smile: I don't think we can get away with not offering a handful of flags, either...
c: If micromatch does support comments, it's undocumented.
Correct, micromatch doesn't support comments.
My key takeaway: if you expected the implementation to be simple, you will be disappointed. 😄 I don't think we can get away with not offering a handful of flags, either...
Indeed. I recommend also testing nested braces and extglobs, as well as braces and extglobs with nested negation patterns. Nested patterns are used more often than one might guess.
Regarding bash, both minimatch and micromatch have (intentionally) different behavior in some cases.
Also, it's important to note that .gitignore
has completely different matching rules than glob. Both minimatch and micromatch can be used for matching gitignore patterns, but neither library has native support. FWIW I haven't seen any node.js libraries do this correctly yet.
Another thought.... joining paths to globs is not trivial, but it's totally possible to do it correctly, consistently, in a cross-platform compatible way.
@jonschlinkert
Another thought.... joining paths to globs is not trivial, but it's totally possible to do it correctly, consistently, in a cross-platform compatible way.
In your opinion, does micromatch do this correctly?
Also, it's important to note that .gitignore has completely different matching rules than glob. Both minimatch and micromatch can be used for matching gitignore patterns, but neither library has native support. FWIW I haven't seen any node.js libraries do this correctly yet.
It seems it's implemented in its internal wildmatch.c. There's a JS port of wildmatch but I can't vouch for its accuracy. It also hasn't been published in 5 years.
@jonschlinkert I'm curious. If you were going to pare down micromatch to its essence (there are a lot of options, as you'll probably agree), what would you keep? What would you drop?
(Since we're aiming to determine our "MVP", as it were)
IMO, in our implementation, I could definitely do without:
.gitignore
file? multiline globs don't make much sense in-and-of themselves unless newlines are stripped anyway, right?{1..5}
); too confusing in this context. in minimatch, this is equivalent to [1-5]
. micromatch seems to have more granular control over the behavior. it veers into "shell expansion" territory. I'm not sure if this is needed at all, but I'm happy to be convinced otherwisepls note that on Friday we have a meeting and everybody is invited. bring a +1 if you want
Same question as in #19: what can node core do better than userland? I'm not opposed, but I do think node core can aim higher than "users won't have to install glob" and that those goals should be more clearly formulated before diving into specifics.
In your opinion, does micromatch do this correctly?
Micromatch doesn't do any kind of joining, it's all left to the user to figure this out - same as minimatch and node-glob. However, I did start implementing this behavior somewhere, and I've given examples in many issues over the past few years, and I'd be happy to discuss over code when the time comes.
In essence, it boils down to ensuring that all slashes are converted to posix slashes in a non-glob file path before converting it to a glob pattern. There are other nuances that would warrant a longer discussion.
If you were going to pare down micromatch to its essence (there are a lot of options, as you'll probably agree), what would you keep? What would you drop?
My first reaction is to drop brace expansion, like you mentioned. Especially since globs with braces can be expanded before passing them to the matcher. (as a side note micromatch expands {1..5}}
to [1-5]
, minimatch expands {1..5}
to ['1', '2', '3', '4', '5']
). But these are contrived examples. Real world patterns are closer to things like account-S{001..990}
, which would expand to account-S(0{0,2}[1-9]|0?[1-9][0-9]|[1-8][0-9]{2}|9[0-8][0-9]|990)
with micromatch, and would be an array of 990 globs with minimatch).
Overall, as you think about which features to include or drop, I think the most important consideration is how globs are used. With CLI tools like eslint, end-users don't have the ability to pass options to the matcher, or customize behavior by, for example, expanding braces beforehand. Removing a bunch of features or options will just make globbing useless or painful to a lot of end-users, or cause implementors to find hacky workarounds. I found this out the hard way over the past few years. Globbing is hard to do correctly, and implementors always seem to underestimate the number of edge cases their users will experience with it (especially because of Windows paths, and the general misconception that globs are just fancy file paths). This isn't an argument for not having native glob support in Node, it's the opposite. I think we can smooth this out and create consistency for users.
If we look at this from 20,000 feet, what we think of as "globs" in Node.js is actually a combination of several different matching concepts, some from shell expansion, others from elsewhere:
*
, ?
, and "globstars": **
{a,b,c}
=> ['a', 'b', 'c']
[:alpha:]
!(foo|bar)
Beyond features, there are a handful of things to consider in determining behavior:
there are a lot of options, as you'll probably agree
I do. I really, really do. lol. When I created the parser for the latest micromatch, I had to make decisions about how the potentially-conflicting rules of each matching concept should work together. Instead, I avoided making those decisions by allowing the user to do it via options.
If micromatch is pulled into Node, I would be more than happy to reduce options or hard-card certain behavior. This is not expected of course.
Almost forgot, picomatch is the matcher behind micromatch.
(edit: I initially said that picomatch doesn't have brace support, but I forgot that it does in the latest release. However, micromatch adds support for more complex brace expansion features).
@vweevers That depends what you mean by "better," I suppose. Node.js doesn't provide a "better" way than userland to build services out-of-the-box; but its net
and httpx
modules enable userland to build better services. While yours is a valid question, it's not the only question, and adding something to core should not hinge on the answer.
Another question to ask is "what's essential?" Recursive mkdir and rmdir--as well as argument parsing in #19--are essential for tools which operate on files, which is quite a lot of them.
So, glob support is essential, because:
glob
/fast-glob
/globby
if they want more consistent cross-platform behavior anyway, which they do (or if they don't, they should).fs
module. I don't want to speculate on what this'd look like, but it'd open up the opportunity.FWIW, Python's glob support is pretty barebones (it looks like they added globstar support in v3.5). I haven't used it myself, so don't know how Python tooling authors are getting along with it, but I'd sure like to know.
IMO, in our implementation, I could definitely do without:
- implicit dotfile support (a flag that essentially ignores the leading dot of a filename when matching)
What about modules like gulp
where it might be desirable to match dot files (or files inside dot directories). For gulp.src
it's probably easy enough to explicitly add patterns for dot-files and dot-directories which need to be copied but dot: true
is easier than managing the extra glob patterns.
- comments--what is the use case for this outside of a script or
.gitignore
file? multiline globs don't make much sense in-and-of themselves unless newlines are stripped anyway, right?
The one exception could be to work around the fact that JSON does not support comments. Completely ignoring any pattern matching /^#/
could allow comments in JSON glob array fields, for example:
{
"files": [
"**/*.js",
"# reason for not publishing file.js",
"!file.js"
]
}
I know this could be accomplished by pre-processing the array before passing to the matcher function but the idea here is consistency, for globs to eventually work the same everywhere. I don't see any value in supporting inline comments, for example "!file.js # reason for this ignore"
.
- brace expansion ranges (
{1..5}
); too confusing in this context. in minimatch, this is equivalent to[1-5]
. micromatch seems to have more granular control over the behavior. it veers into "shell expansion" territory. I'm not sure if this is needed at all, but I'm happy to be convinced otherwise
If this is supported should it automatically expand range as per the example at https://www.npmjs.com/package/micromatch#optionsexpandrange? Seems confusing that micromatch.makeRe('{10..15}')
produces /^(?:[10-25]\.js)$/
by default instead of /^(?:(1[0-9]|2[0-5])\.js)$/
.
it's probably easy enough to explicitly add patterns for dot-files and dot-directories which need to be copied but dot: true is easier than managing the extra glob patterns.
Agreed. For users, it's much more difficult to exclude unwanted dotfiles when they are included by default, than the other way around. Also, every globbing implementation I'm aware of excludes dotfiles by default. In bash, for example, you need to set dotglob
to match hidden files.
It's also really common for users (or implementors) to use patterns like *
to match all of the files in a directory, forgetting that some files are hidden (like the .DS_Store
file in every directory on MacOS).
Seems confusing that
micromatch.makeRe('{10..15}')
produces/^(?:[10-25]\.js)$/
by default instead of/^(?:(1[0-9]|2[0-5])\.js)$/
.
Agreed. I've been planning to change that to the default. Thanks for bringing it up.
Python's glob support is pretty barebones
That depends on what you mean by bare bones lol. It supports globstars, ranges like [1-5]
, and even environment variable expansion. As a side note, I did this on a branch on picomatch, after a discussion with the jest core team They support basic path variables, and it's hard to expand them correctly. It turned out to be really useful, and fast, since it only adds 5 or 6 lines of code to the parser.
If we move forward on this, I'd suggest using node-glob
's test suite, and picking a version of Bash to use as a reference implementation (node-glob uses Bash 4.4.), and re-implementing from first principles (perhaps by porting or importing the bash C implementation, though that might be really challenging). Globstar and extglob should be enabled by default, but it'd be nice to be able to turn them off.
The reason is that "globbing" has been extended and implemented in a variety of different ways, and no two implementations are going to be 100% identical. There is no specification that covers all the cases users expect. So, it's not strictly accurate that all "posix shells" implement "globbing" (at least for some values of "posix shell" and "globbing").
@coreyfarrell **
works on any Bash v4, but I think in the environment you tested, globstar
was probably not on by default. shopt -s globstar
should enable it.
Also: note that ignore-file style globbing is somewhat different from shell style globbing (though there are similarities). See https://github.com/npm/ignore-walk
We discussed this in the meeting today.
Looking at it holistically, the result is that those of us in the meeting felt like this is biting off more than we can chew. Personally, my excitement has dimmed to the point of not wanting to champion the effort (#19 will be taxing enough, and I do want to help see that one forward).
If anyone else here wants to take up the cause, nobody else here is stopping you--and I think we're happy to work with anybody who does want to move it forward--but it didn't sound like any of those in attendance have the requisite, uh, gumption.
going to close this. somebody reopen if they want to champion.
I expect that I'll have some time to pick this up and champion it after npm v7 and tap v15 both ship. Glob's next on my list. Probably near the end of 2020, or early 2021, so most likely it'll be a node v15 feature.
I believe that the goal of an implementation should be to match the behavior of some specific and recent version of Bash, as it's the most widely used thing that people glob with. (But only as a reference implementation -- my explorations in the past showed that the actual code in bash tends to intermingle globbing with other string parsing features in a way that is very performant for a shell, but very bad for reusability in a platform like Node. We probably don't want to support command or environment substitution, for example, but should make it straightforward to add in userland.)
I'm going ahead and reopening this, since it sounds like @isaacs is a potential champion -- it's fine if things take a while :+1:
A PR introducing fs.glob
and fsPromises.glob
has been merged: https://github.com/nodejs/node/pull/51912
I'm curious if anyone thinks adding glob support to core would be a good idea?
Globs (globspecs? what's the right term?) are natively supported in POSIX (?) shells, which means you can do this:
This is expanded by the shell before execution of
some-cli-app
into a list of files matching the glob.But we get in to trouble when our scripts (e.g., in
package.json
) use globs, because they will not be expanded, and the scripts will fail (or otherwise exhibit unexpected behavior) on Windows.To support globs in a portable way, any CLI app would need to pull in
glob
or a similar module, and pass the glob as a quoted string in the arguments:(On Windows, you must use double-quotes above, but using single-quotes in POSIX shells means env variables [e.g.,
$FOO
] will be treated as literals; env variables are another can of worms)The utility of
glob
has been well-established, and it's used in enough CLI apps that it may be a good candidate for vendoring or some other implementation. I haven't looked at theglob
source, so it's unclear whether vendoring would be the best route, but I do know it contains several other dependencies which would also need to be pulled in (right @iansu?).Thoughts?
cc @isaacs