yarnpkg / berry

πŸ“¦πŸˆ Active development trunk for Yarn βš’
https://yarnpkg.com
BSD 2-Clause "Simplified" License
7.47k stars 1.12k forks source link

Support glob patterns in the shell #22

Open arcanis opened 5 years ago

arcanis commented 5 years ago

Describe the user story

I'm a Yarn user and want to use glob patterns in my scripts. For example:

{
  "scripts": {
    "grep": "grep -F \"$1\" packages/*/sources"
  }
}

Describe the solution you'd like

The shell should support a glob syntax (at least * and **).

Describe the drawbacks of your solution

Writing a good globing mechanism is difficult. The simplest would be to reuse an existing library such as glob, but I wonder how that would fit in the cross-platform story.

Describe alternatives you've considered

We simply could decide not to add support for glob patterns, and let users be more explicit when they wish to use them (grep -F \"$1\" $(bash -c 'echo packages/*/sources.js')). I guess them not being supported would be a surprise to most users, though.

Additional context

Ideally the syntax and the glob mechanism should be separate - the shell should expose a new glob option that would accept an async function that would return the match entries for a given argument and cwd, and default this option to whatever implementation we choose (likely the glob package). This way the shell consumers will be free to choose a different implementation if they want to, and it keeps the shell dependencies on the fs module low.

LaRuaNa commented 5 years ago

I just tested glob works both on windows and unix with the limitation that you always need to use forward-slashes and no backslashes. Also described here.

arcanis commented 5 years ago

Nice πŸ‘ Given that the shell implementation is meant to provide a way to write portable scripts, restricting to forward slashes should be perfectly fine

milesj commented 5 years ago

I'd suggest fast-glob over glob. https://github.com/mrmlnc/fast-glob

LaRuaNa commented 5 years ago

Checked that one also glob seems to be more 'stable' after seeing this issue https://github.com/mrmlnc/fast-glob/issues/159 :] Though we can definitely consider πŸ‘

jonschlinkert commented 5 years ago

Checked that one also glob seems to be more 'stable' after seeing this issue

Fwiw, it seems like some assumptions are being made. Hopefully, this will clear them up.

  1. Micromatch (used by fast-glob) has been downloaded 627,718,336 times, and was first released 4 years ago.
  2. Minimatch (used by node-glob) has been downloaded 823,201,530 times, and was first release 8 years ago.
  3. Micromatch passes nearly all of the bash unit tests. Minimatch fails more than 30% of them.
  4. Micromatch passes nearly all of the extglob unit tests. Minimatch fails more than 40% of them.
  5. Micromatch passes nearly all of the wildmatch (git wildcard matching) unit tests. Minimatch fails a lot of them.
  6. Micromatch passes the Korn shell, Kleene star and Kleene plus tests. Minimatch fails a bunch.
  7. Micromatch can handle regex, non-extglob parentheses, nested extglobs, and lots of other advanced syntaxes. Minimatch simply escapes them and can't handle them at all. If you get successful matches with these things, it's a coincidence.
  8. Micromatch has more than 30,000 unit tests. Minimatch has a couple hundred. If you do happen to find something that minimatch does correctly and micromatch doesn't, then that would be a rare exception to the rule.
  9. The issue you pointed to is, in fact, not an exception to the rule, as node-glob is incorrect.

Here is what you linked to. You can run these for yourself.

Minimatch (node-glob)

This is what minimatch returns for the following:

const minimatch = require('minimatch');
const minimatchRegex = minimatch.makeRe('{file.txt,directory/**}');
console.log('minimatch');

console.log(minimatchRegex.test('file.txt')); // true
console.log(minimatchRegex.test('directory/test.txt')); // true
console.log(minimatchRegex.test('directory/.test.txt')); // true (WRONG)
console.log(minimatchRegex.test('directory/dir/test.txt')); // true
console.log(minimatchRegex.test('directory/dir./test.txt')); // true (WRONG)

console.log(minimatchRegex.test('file.txt')); // true
console.log(minimatchRegex.test('directory/test.txt')); // true
console.log(minimatchRegex.test('directory/.test.txt')); // true (WRONG)
console.log(minimatchRegex.test('directory/dir/test.txt')); // true
console.log(minimatchRegex.test('directory/dir./test.txt')); // true (WRONG)

Micromatch (fast-glob)

Here is what micromatch returns.

const micromatchRegex = micromatch.makeRe('{file.txt,directory/**}');
console.log('micromatch');

console.log(micromatchRegex.test('file.txt')); // true
console.log(micromatchRegex.test('directory/test.txt')); // true
console.log(micromatchRegex.test('directory/.test.txt')); // false (CORRECT)
console.log(micromatchRegex.test('directory/dir/test.txt')); // true
console.log(micromatchRegex.test('directory/dir./test.txt')); // false (CORRECT)

console.log(micromatchRegex.test('file.txt')); // true
console.log(micromatchRegex.test('directory/test.txt')); // true
console.log(micromatchRegex.test('directory/.test.txt')); // false (CORRECT)
console.log(micromatchRegex.test('directory/dir/test.txt')); // true
console.log(micromatchRegex.test('directory/dir./test.txt')); // false (CORRECT)

At a glance, it seems to me that micromatch is correct on all of the tests. I might be wrong, and that's why we use bash and other very well-established conventions to help us agree on matching behavior (unfortunately there is no official specification for globs yet - but I'm working on one slowly).

Jeez, what's a lib gotta do to be considered stable? ;)

arcanis commented 5 years ago

@jonschlinkert Thanks for the explanation! I'd be interested to look into integrating fast-glob - the main problem I have (which is the same one as for node-glob) is that shell globing is slightly different from pure fs globbing πŸ€”

For instance foo/bar{.js,.mjs} would return two results even if those files don't actually exist. With fast-glob it doesn't work like this - if the foo folder doesn't exist we get an exception, and if it exists we just get no results.

Maybe we should do in two passes - first the brace expansion within the shell (since it isn't meant to access the filesystem), and then send the result to fast-glob.

jonschlinkert commented 5 years ago

For instance foo/bar{.js,.mjs} would return two results even if those files don't actually exist.

Can you clarify? Are you saying the expanded brace patterns themselves would be returned if no matching files exist? AFAIK, that behavior would need to be enabled in most shells, using something like nullglob option with shopt.

if the foo folder doesn't exist we get an exception, and if it exists we just get no results.

I'd like to get more information about what you're trying to accomplish, maybe I can help.

I very well may be missing something or need more info, but it seems like your problem could be resolved just by escaping the glob characters in the patterns defined in your config, either using backslashes or by wrapping them in quotes, so that they aren't "expanded" before they are passed to the globbing library.

For example:

{
  "scripts": {
    "grep": "grep -F \"$1\" packages/\\*/sources"
  }
}

Otherwise, what's probably happening is that your pattern, foo/bar.{mjs,js} is being expanded by the shell to [ 'foo/bar.js', 'foo/bar.mjs'], and those (non-glob) file paths are being passed to fast-glob. My hunch is that this is what's causing errors. Since no glob characters exist at that point, fast-glob is not able to find those exact matches. But again, just a hunch.

arcanis commented 5 years ago

Can you clarify? Are you saying the expanded brace patterns themselves would be returned if no matching files exist?

No, I mean that foo/bar{.js,.mjs} will always return foo/bar.js and foo/bar.mjs. Within shells (afaik) it's not a glob pattern, it's a syntax construct. It's useful for doing things like this:

$ mv foo/bar{.js,.mjs}

Or for things completely unrelated to file and directories:

$ echo "Hello "{Lucie,Tim,Alice,Bob}

I very well may be missing something or need more info, but it seems like your problem could be resolved just by escaping the glob characters in the patterns defined in your config, either using backslashes or by wrapping them in quotes, so that they aren't "expanded" before they are passed to the globbing library.

I haven't implemented it yet, I'm anticipating :p basically brace expansion is meant to be done before expanding the variables. So for example the following:

$ foo="a,b"
$ echo {$foo,$foo}

Is meant to become a,b and a,b rather than a, b, a, b. Writing it I realize it doesn't really matter if I do the variable expansion before, as long as I escape the glob characters in $foo as you mentioned πŸ€”

bertho-zero commented 4 years ago

Will it be possible later to make loops in a script?

"scripts": {
  "test": "for file in test/*.test.js; do node \"$file\"; done"
}
command not found: for
Internal Error: Unbound variable "file"

EDIT: done with

"scripts": {
  "test": "bash -c 'for file in test/*.test.js; do node \"$file\"; done'"
}
arcanis commented 4 years ago

Better use Bash or even a Node script; this is out of scope for our script language.

yacinehmito commented 4 years ago

I haven't implemented it yet, I'm anticipating :p basically brace expansion is meant to be done before expanding the variables.

Yup, and it seems pretty straightforward to do with braces (which is also in the micromatch family).

wlarch commented 8 months ago

For reference, I had this problem using the rimraf package for a yarn clean script. Our previous script in package.json was rimraf {lib,tsconfig.tsbuildinfo} and would throw the error mentioning that glob patterns are not implemented. It could be fixed with rimraf --glob '{dist,storybook-static,tsconfig.tsbuildinfo}'