olson-sean-k / wax

Opinionated and portable globs that can be matched against paths and directory trees.
https://glob.guide
MIT License

Drive letter on Windows? #34

Open melMass opened 1 year ago

melMass commented 1 year ago

Hi,

I'm using nushell, which relies on this crate for its glob feature. Unfortunately, it fails to resolve Windows drive letters. After reading Wax's README, I guess this has to do with the repetition token?

Is there a way to escape these?

Here are a few non-working samples (in nu): nushell/#7125

I'm a bit short on time lately, but if you have pointers on how to solve this, I can propose a PR at some point.

Thanks

olson-sean-k commented 1 year ago

Thanks for the information and context!

it fails to resolve windows drive letters

Wax does not support Windows path prefixes such as drive letters and other volume specifiers by design. For applications like Nushell's glob CLI, I recommend bridging the gap with a mechanism for specifying a tree using a native path. For example, glob could accept an option for this like so:

> glob --tree=\\server\share '**/*.txt'

An option like this would only be necessary when a platform-specific file system feature is needed (such as a Windows UNC path in this example).

virtualritz commented 1 year ago

This is really a bummer, as Wax is IMHO most useful for cross-platform CLI utilities on Windows. On macOS and Linux, the shells most people use have some sort of globbing support anyway.

It would be great if at least UNC paths with forward slashes could be supported. For example: //./C:/foo/bar/**.

bobhy commented 1 year ago

Late to the party, but I'm the most recent maintainer of glob in nushell and was in the process of extending other nushell file system commands to use wax when I ran into this.

The idea of manually splitting the pattern and path on the command line might work for glob, which has only one pattern. But it would be pretty awkward for a command like cp <pat1> <pat2> ... <dest>.
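Splitting the path from the pattern does not necessarily have to be manual. As a rough, stdlib-only Rust sketch (the helper `split_literal_prefix` and its metacharacter set are hypothetical, not Wax's or Nushell's API), a command could split each argument on its own at the first `/`-separated component that contains a glob metacharacter, and treat everything before that component as a native path:

```rust
/// Hypothetical helper: splits `pattern` at the first `/`-separated component
/// that contains a glob metacharacter. Returns (literal prefix, remaining glob).
fn split_literal_prefix(pattern: &str) -> (String, String) {
    // Assumed metacharacter set; Wax's actual list may differ.
    const META: &[char] = &['*', '?', '[', ']', '{', '}', '<', '>', '(', ')'];
    let components: Vec<&str> = pattern.split('/').collect();
    // Index of the first component containing a metacharacter,
    // or the end if every component is literal.
    let split_at = components
        .iter()
        .position(|c| c.contains(META))
        .unwrap_or(components.len());
    let prefix = components[..split_at].join("/");
    let glob = components[split_at..].join("/");
    (prefix, glob)
}
```

Because the split happens per argument, it would apply to each `<pat>` of a `cp`-like command independently. It is only a heuristic: a literal directory name that happens to contain a metacharacter ends the prefix early.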

So I'm looking at workarounds in nushell that might salvage the rooted pattern scenario. Here's what I'm playing around with right now: a bit of preprocessing of the user-specified pattern, on Windows only:

  1. replace all \ with /
  2. if pattern starts with <driveletter>:, escape it as <driveletter>\:
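The two steps above are small enough to sketch directly. This stdlib-only Rust version is illustrative only (the function name is made up) and assumes the caller has already decided the input should be treated as a Windows pattern:

```rust
/// Sketch of the two preprocessing steps (hypothetical function name).
fn preprocess_windows_pattern(pattern: &str) -> String {
    // Step 1: replace every `\` with `/`.
    let mut out = pattern.replace('\\', "/");
    // Step 2: if the pattern starts with `<driveletter>:`,
    // escape the colon as `<driveletter>\:`.
    let bytes = out.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        // Index 1 is a valid char boundary because byte 0 is ASCII.
        out.insert(1, '\\');
    }
    out
}
```

Step 1 is what removes `\` as an escape character, which is the trade-off mentioned further down.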

So a rooted glob like `glob 'C:\Users\<me>\test/**'` becomes `glob 'C\:/Users/<me>/test/**'` and works for me.
Since Windows already accepts forward slashes in UNC paths, I think this will work for UNC paths too.

It means the user cannot escape a metacharacter with \, but that seems a "small" loss. The user might be able to work around this by using a one-character class, so test/\* could be written as test/[*].
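A minimal sketch of that workaround, assuming Wax reads `*` and `?` literally inside a character class (worth verifying against Wax's documentation). The helper name is hypothetical, and it deliberately handles only `*` and `?`:

```rust
/// Hypothetical helper: rewrites `*` and `?` in a literal string as
/// one-character classes, e.g. `test/*` -> `test/[*]`.
fn quote_as_classes(literal: &str) -> String {
    let mut out = String::with_capacity(literal.len());
    for c in literal.chars() {
        match c {
            // Wrap the metacharacter in a class so that it (presumably)
            // matches only itself.
            '*' | '?' => {
                out.push('[');
                out.push(c);
                out.push(']');
            }
            // Everything else passes through unchanged.
            _ => out.push(c),
        }
    }
    out
}
```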

It's interesting to note that the .NET globbing functions strictly require separating the root directory from the pattern: https://learn.microsoft.com/en-us/dotnet/core/extensions/file-globbing#get-all-matching-files. I'm guessing they couldn't come up with a more elegant solution...

I'm off to code up this workaround in nu and see how it goes...

olson-sean-k commented 1 year ago

Thanks for sharing, @bobhy! I'm really curious to see how this goes and appreciate seeing what you're working on! I have a few thoughts about this.

Wax glob expressions are designed to be as portable as possible. Windows path prefixes are fairly complex and definitely not portable, which is one of the main reasons that they are explicitly not supported.

the .NET globbing functions strictly require separating the root directory from the pattern

My inkling is that the developers of these APIs wanted to punt on some of the same issues I've thought about that can occur within Windows path prefixes. For example, what happens if a pattern occurs within a prefix? What does \\*\*\*.txt mean? Some parts of that pattern may be possible to implement, but there are many different error cases (I don't think \\* is actually possible). How do verbatim paths interact with patterns? What does \\.\** mean? Rejecting all of these may be a bit surprising, and I think mixing native paths with globbing patterns muddies the waters conceptually.
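To make the problem concrete, here is a deliberately conservative, stdlib-only Rust sketch that only *detects* the prefix forms mentioned above, so that a tool can refuse them up front (for example, by pointing the user at a native-path option). The function names and returned labels are hypothetical. Rust's own `std::path::Component::Prefix` is only produced on Windows targets, so a check that behaves the same on every platform has to inspect the text itself:

```rust
/// Hypothetical helper: reports which kind of Windows path prefix, if any,
/// starts `expr`. Both `\` and `/` spellings are checked.
fn windows_prefix_kind(expr: &str) -> Option<&'static str> {
    let b = expr.as_bytes();
    // Order matters: verbatim and device prefixes also start with `\\`.
    if expr.starts_with(r"\\?\") || expr.starts_with("//?/") {
        Some("verbatim")
    } else if expr.starts_with(r"\\.\") || expr.starts_with("//./") {
        Some("device")
    } else if expr.starts_with(r"\\") || expr.starts_with("//") {
        // Conservative: a leading `//` has its own meaning on POSIX,
        // but it is flagged here too.
        Some("UNC")
    } else if b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':' {
        // Drive letter such as `C:` (including drive-relative `C:foo`).
        Some("drive")
    } else {
        None
    }
}

/// Rejects expressions with a Windows prefix instead of guessing what
/// wildcards inside the prefix would mean.
fn reject_windows_prefix(expr: &str) -> Result<&str, String> {
    match windows_prefix_kind(expr) {
        Some(kind) => Err(format!(
            "{kind} prefix is not supported in a glob; pass it as a native path"
        )),
        None => Ok(expr),
    }
}
```

Here `\\*\*\*.txt` is simply classified as UNC, with no attempt to decide whether each `*` is a server name or a wildcard; refusing it outright sidesteps exactly that ambiguity.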

it would be pretty awkward for a command like cp <pat1> <pat2> ... <dest>

I agree! I'd also like to caution that mass file operations like this are tricky and dangerous. I factored Wax out of Nym, which attempts to do this kind of thing (it's very incomplete; I've been writing various libraries to improve it and haven't looped back to it yet). This is one of the main reasons that variance and exhaustiveness queries exist in Wax. Most users won't care about this at all, but it turns out that these sorts of properties are important for doing this safely and correctly and I've been spending a lot of time refactoring Wax to provide correct (or at least conservative) answers to these queries.

IMO, accepting multiple independent patterns like this should probably be avoided in basic commands. One way to do this is to remove globbing support from commands and instead rely solely on pipelines (as I suggested in a Nushell issue that references this one). So this example becomes something more like glob <pat> | cp <dest>. That's a big departure from what most (all?) other shells do, though (where globbing is often provided by the shell itself).

If multiple independent patterns with varying prefixes are a must, then I think preprocessing like this is a reasonable approach. I'd recommend a Nushell syntax that explicitly separates a native path prefix from the pattern, such as <path>|<pat>. In your example, we'd get something like C:\Users\<me>|test/**. These prefixed patterns could specify only a path, only a pattern, or both. I can even imagine the shell syntax highlighting this, so when a Unix user copies and pastes some command line from a Windows user on the Internet, they can immediately see and modify the platform-specific parts.
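A possible parser for that `<path>|<pat>` syntax, sketched in stdlib-only Rust. The struct and function names are hypothetical. Splitting at the first `|` assumes the native path never contains one (true for Windows paths, not guaranteed on Unix), and that the argument reaches the command as a single quoted string, since `|` is also Nushell's pipe:

```rust
/// A prefixed argument of the hypothetical form `<path>|<pat>`.
#[derive(Debug, PartialEq)]
struct PrefixedPattern<'a> {
    /// Native, platform-specific path (e.g. `C:\Users\<me>`), if any.
    path: Option<&'a str>,
    /// Portable glob expression (e.g. `test/**`), if any.
    pattern: Option<&'a str>,
}

/// Treats an empty side of the `|` as absent.
fn non_empty(s: &str) -> Option<&str> {
    if s.is_empty() { None } else { Some(s) }
}

/// Splits at the first `|`; an argument without `|` is a pattern only.
fn parse_prefixed(arg: &str) -> PrefixedPattern<'_> {
    match arg.split_once('|') {
        // `<path>|<pat>`, `<path>|` (path only), or `|<pat>` (pattern only).
        Some((path, pattern)) => PrefixedPattern {
            path: non_empty(path),
            pattern: non_empty(pattern),
        },
        None => PrefixedPattern { path: None, pattern: non_empty(arg) },
    }
}
```

The `path` side would then be handed to the directory walker unchanged as the tree root, much like the `--tree` option suggested earlier, while only the `pattern` side is compiled as a Wax glob.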

bobhy commented 1 year ago

One thing I've learned from this and the conversation back at https://github.com/nushell/nushell/issues/10498 is that globbing at the command line is different from globbing in "code", and probably needs to be. Wax seems comfortably positioned in the "coding" space: a powerful tool that has some sharp edges. Nushell has an internal globbing library that works pretty well at the command line (for Windows and other OSes). I can fix what was bugging me by extending that library to support '{}', and that could be the end of the story. But that would leave nushell with a custom glob library to maintain.

If you were interested in positioning Wax in the command line arena, I could propose some extensions, and maybe implement them, too. Caveat: I haven't looked at the code at all, so I'm pretty much talking through my hat here. But with that in mind: I think the big game is to support native Windows paths, including Rust-designated "verbatim" paths, without having to quote colons, backslashes and question marks.