Open melMass opened 1 year ago
Thanks for the information and context!
it fails to resolve windows drive letters
Wax does not support Windows path prefixes such as drive letters and other volume specifiers by design. For applications like Nushell's `glob` CLI, I recommend bridging the gap with a mechanism for specifying a tree using a native path. For example, `glob` could accept an option for this like so:
> glob --tree=\\server\share '**/*.txt'
An option like this would only be necessary when a platform-specific file system feature is needed (such as a Windows UNC path in this example).
This is really a bummer, as `wax` IMHO is most useful for cross-platform CLI utilities on Windows. On macOS and Linux the shells most people use have some sort of globbing support anyway.
It would be great if at least UNC paths with forward slashes could be supported, e.g. `//./C:/foo/bar/**`.
Late to the party, but I'm the most recent maintainer of `glob` in nushell and was in the process of extending other nushell file system commands to use `wax` when I ran into this.
The idea of manually splitting the pattern and path on the command line might work for `glob`, which has only one pattern. But it would be pretty awkward for a command like `cp <pat1> <pat2> ... <dest>`.
So I'm looking at workarounds in nushell that might salvage the rooted pattern scenario. Here's what I'm playing around with right now: a bit of preprocessing of the pattern specified by the user, for Windows only:
Replace `\` with `/`.
For `<driveletter>:`, escape it as `<driveletter>\:`.
So a rooted glob like: glob C:\Users\<me>\test/** becomes glob C\:/Users/
Windows already accepts forward slashes for UNC paths, so I think this will work for UNC paths too.
It means the user cannot quote a metacharacter with `\`, but that seems a "small" loss. The user might be able to work around this by making it into a one-character class, so `test/\*` could be specified as `test/[*]`.
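The preprocessing described above can be sketched roughly as follows. This is only an illustration, not the code that went into nushell: the function name `preprocess_windows_pattern` is made up, and the check that the drive letter sits at the very start of the pattern is an assumption.

```rust
/// Hypothetical helper (not part of wax or nushell): rewrites a Windows-style
/// rooted pattern into a form a portable glob parser can read.
fn preprocess_windows_pattern(pattern: &str) -> String {
    // Step 1: every backslash becomes a forward slash. After this step `\`
    // can no longer be used to quote a metacharacter.
    let slashed = pattern.replace('\\', "/");

    // Step 2 (assumption: only a leading drive letter is handled): detect
    // `<driveletter>:` at the start of the pattern.
    let has_drive = {
        let bytes = slashed.as_bytes();
        bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':'
    };

    if has_drive {
        // The first two bytes are ASCII, so slicing at 1 and 2 is safe.
        // The colon is escaped as `\:` so it is read as a literal.
        format!("{}\\:{}", &slashed[..1], &slashed[2..])
    } else {
        slashed
    }
}
```

With this sketch, a UNC-style input such as `\\server\share\*.txt` simply comes out with forward slashes, and `test/\*` turns into `test//*`, which shows the lost ability to quote with `\` mentioned above.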
Interesting to note that the .NET globbing functions strictly require separating the root directory from the pattern: https://learn.microsoft.com/en-us/dotnet/core/extensions/file-globbing#get-all-matching-files. I'm guessing they couldn't come up with a more elegant solution...
I'm off to code up this workaround in nu and see how it goes...
Thanks for sharing, @bobhy! I'm really curious to see how this goes and appreciate seeing what you're working on! I have a few thoughts about this.
Wax glob expressions are designed to be as portable as possible. Windows path prefixes are fairly complex and definitely not portable, which is one of the main reasons that they are explicitly not supported.
the .NET globbing functions strictly require separating the root directory from the pattern
My inkling is that the developers of these APIs wanted to punt on some of the same issues I've thought about that can occur within Windows path prefixes. For example, what happens if a pattern occurs within a prefix? What does `\\*\*\*.txt` mean? Some parts of that pattern may be possible to implement, but there are many different error cases (I don't think `\\*` is actually possible). How do verbatim paths interact with patterns? What does `\\.\**` mean? Rejecting all of these may be a bit surprising, and I think mixing native paths with globbing patterns muddies the waters conceptually.
it would be pretty awkward for a command like `cp <pat1> <pat2> ... <dest>`
I agree! I'd also like to caution that mass file operations like this are tricky and dangerous. I factored Wax out of Nym, which attempts to do this kind of thing (it's very incomplete; I've been writing various libraries to improve it and haven't looped back to it yet). This is one of the main reasons that variance and exhaustiveness queries exist in Wax. Most users won't care about this at all, but it turns out that these sorts of properties are important for doing this safely and correctly and I've been spending a lot of time refactoring Wax to provide correct (or at least conservative) answers to these queries.
IMO, accepting multiple independent patterns like this should probably be avoided in basic commands. One way to do this is to remove globbing support from commands and instead rely solely on pipelines (as I suggested in a referencing Nushell bug). So this example becomes something more like `glob <pat> | cp <dest>`. That's a big departure from what most (all?) other shells do, though (where globbing is often provided by the shell itself).
If multiple independent patterns with varying prefixes are a must, then I think preprocessing like this is a reasonable approach. I'd recommend a Nushell syntax that explicitly separates a native path prefix from the pattern, such as `<path>|<pat>`. In your example, we'd get something like `C:\Users\<me>|test/**`. These prefixed patterns could specify only a path, only a pattern, or both. I can even imagine the shell syntax highlighting this, so when a Unix user copies and pastes some command line from a Windows user on the Internet, they can immediately see and modify the platform-specific parts.
One thing I've learned from this and the conversation back at https://github.com/nushell/nushell/issues/10498 is that globbing at the command line is different from globbing in "code", and probably needs to be.
`Wax` seems comfortably positioned in the "coding" space -- a powerful tool that has some sharp edges. Nushell has an internal globbing library that works pretty well at the command line (for windows and other OSes). I can fix what was bugging me by extending that library to support '{}', and that could be the end of the story.
But that would leave nushell with a custom glob library to maintain.
If you were interested in positioning `wax` in the command line arena, I could propose some extensions, and maybe implement them, too. Caveat: I haven't looked at the code at all, so I'm pretty much talking through my hat here. But with that in mind:
I think the big game is to support native Windows paths including rust-designated "verbatim" paths without having to quote colons, backslashes and question marks.
`wax::Glob`: consider implementing as an alternate pattern with all the same methods, e.g. `ArgGlob`.
Accept `:` as a literal except within `<>`. This wouldn't break any currently correct `wax` patterns, so it's arguably a benign fix.
Accept `(` as a literal when not followed by `?<option>)`. (Needed for `C:\Program Files (x86)`.) This also doesn't invalidate currently correct patterns, but it does approach the edge of accepting a nonsense pattern with a typo and doing something unexpected.
`//?/` "verbatim" path (why couldn't rust devs just call it "windows extended path", like the rest of the world?): `wax` could accept `?` as a literal unless within metacharacter braces or adjacent to an active metacharacter. This does break some currently correct patterns. But wax's own doc says a standalone `?` is rare -- it gains power in combination with other pattern contexts, so maybe this is no big loss? Or maybe only accept `?` as literal if pattern starts with `\?` (or `//?`)?
Accept `\` as a literal, except within metachar brackets: `[]`, `<>` and `{}`. This is a breaking change. A user who needs to quote a metachar outside brackets would use `[c]` instead. All I can say in defense is that this is the rule nushell's internal glob uses and it's well-accepted within that user community.
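As a rough illustration of the colon rule only, here is a sketch that approximates it by preprocessing rather than by changing the parser. The function name is invented, and it makes two simplifying assumptions: `\` still escapes the next character (today's behaviour, not the proposed one), and a `<` or `>` inside a character class is not special-cased.

```rust
/// Hypothetical preprocessing step: escapes every `:` that is not inside a
/// `<...>` repetition, so that the colon is read as a literal.
fn escape_colons_outside_repetitions(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len() + 4);
    let mut depth = 0usize; // nesting level of `<...>`
    let mut chars = pattern.chars();

    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                // Already an escape sequence: copy it and the escaped
                // character through unchanged.
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            '<' => {
                depth += 1;
                out.push(c);
            }
            '>' => {
                depth = depth.saturating_sub(1);
                out.push(c);
            }
            // Outside any repetition the colon is treated as a literal.
            ':' if depth == 0 => out.push_str("\\:"),
            _ => out.push(c),
        }
    }
    out
}
```

A pattern such as `C:/<[0-9]:2>` would keep the colon inside the repetition and only escape the one after the drive letter.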
Hi,
I'm using nushell, which relies on this crate for the glob feature. Unfortunately it fails to resolve windows drive letters. After reading wax's readme I guess it has to do with the repetition token? Is there a way to escape these?
Here are a few non-working samples (in nu): nushell/#7125
I'm a bit short on time lately but if you have pointers to solve this I can also propose a PR at some point
Thanks