olson-sean-k / wax

Opinionated and portable globs that can be matched against paths and directory trees.
https://glob.guide
MIT License
112 stars, 10 forks

Is case-insensitive the default now with 0.6.0? #48

Open fdncred opened 11 months ago

fdncred commented 11 months ago

It seems like case sensitivity has changed between 0.5.0 and 0.6.0. I'm fairly confident that previously in nushell, our glob command (which uses wax for globbing) would return only the lowercase c matches for glob "c*", but now it returns both upper- and lowercase matches. In fact, the glob command's help text has a dedicated example showing how to make a case-insensitive match with glob '(?i)c*'. I'm just wondering whether I missed an update that explains this or whether it's a bug. Thanks!

I'm testing on Windows, if that's helpful. It looks like case sensitivity may be turned off for Windows paths; I'm wondering if that's the issue. BTW, Windows can have case-sensitive paths, but it's pretty rare.

olson-sean-k commented 11 months ago

There has been no (expected!) change regarding casing in version 0.6.0.

> I'm testing on Windows, if that's helpful.

If you were previously experimenting on a Unix platform, then that likely explains the difference in behavior. From the README:

> By default, glob expressions use the same case sensitivity as the target platform's file system APIs (case-sensitive on Unix and case-insensitive on Windows), but i can be used to toggle this explicitly as needed.

This is the one example of platform-specific behavior in glob expressions. It behaves a bit more intuitively and also means that literals are invariant by default. For example, on Windows the expression literal is invariant (i.e., resolves no differently than the native path literal), but the expression (?-i)literal is variant (i.e., behaves differently than the native path literal).
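To make the quoted rule concrete, here is a small self-contained Rust model. It is not wax's matcher or API: the hypothetical glob_match takes the platform default as a parameter, understands only a leading (?i) or (?-i) flag plus * and literal characters, and matches a single name rather than a path.

```rust
// Hypothetical model of the README rule; NOT wax's actual implementation.
// Supports only a leading (?i) / (?-i) flag, `*`, and literal characters.

/// Splits off a leading casing flag. Returns (case_insensitive, remaining expression).
fn parse_flags(expr: &str, default_insensitive: bool) -> (bool, &str) {
    if let Some(rest) = expr.strip_prefix("(?i)") {
        (true, rest)
    } else if let Some(rest) = expr.strip_prefix("(?-i)") {
        (false, rest)
    } else {
        (default_insensitive, expr)
    }
}

/// Recursively matches `pattern` (only `*` and literals) against `text`.
fn wildcard_match(pattern: &[char], text: &[char], insensitive: bool) -> bool {
    match pattern.split_first() {
        None => text.is_empty(),
        // `*` may consume any number of characters, including none.
        Some((&'*', rest)) => (0..=text.len()).any(|i| wildcard_match(rest, &text[i..], insensitive)),
        Some((p, rest)) => match text.split_first() {
            Some((t, text_rest)) => {
                let same = if insensitive {
                    p.to_lowercase().eq(t.to_lowercase())
                } else {
                    p == t
                };
                same && wildcard_match(rest, text_rest, insensitive)
            }
            None => false,
        },
    }
}

fn glob_match(expr: &str, name: &str, default_insensitive: bool) -> bool {
    let (insensitive, body) = parse_flags(expr, default_insensitive);
    let pattern: Vec<char> = body.chars().collect();
    let text: Vec<char> = name.chars().collect();
    wildcard_match(&pattern, &text, insensitive)
}

/// Platform default as described in the README quote.
fn platform_default_insensitive() -> bool {
    cfg!(windows)
}

fn main() {
    let names = ["cargo.toml", "Cargo.lock", "CHANGELOG.md", "src"];
    for expr in ["c*", "(?i)c*", "(?-i)c*"] {
        let hits: Vec<&str> = names
            .iter()
            .copied()
            .filter(|n| glob_match(expr, n, platform_default_insensitive()))
            .collect();
        println!("{expr:>8} -> {hits:?}");
    }
}
```

In this model, c* on Windows also picks up Cargo.lock and CHANGELOG.md, while (?-i)c* keeps only cargo.toml on every platform. That is the variant form described above.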

Windows paths may be resolved in a case-sensitive manner, but I think this is much less common and I believe it requires using prefixed paths (with suitable APIs) or using low level interfaces (the file system has no notion of casing).

fdncred commented 11 months ago

Thanks for your answers. I think you're right that I was checking Windows one day and Linux/MacOS on another.

Any Windows folder can be configured to be case sensitive as shown below, but I wouldn't recommend that anyone actually do it. It's fine to play around with one folder, but if you did your entire drive this way, things would probably stop working.

fsutil.exe file SetCaseSensitiveInfo C:\folder\path enable

Or recursively from PowerShell, like this:

(Get-ChildItem -Recurse -Directory).FullName | ForEach-Object {fsutil.exe file setCaseSensitiveInfo $_ enable}

For our use case in nushell, I'd prefer consistency over platform specific behavior. It just helps with script portability. I'm just not sure how to code around this now to have similar behavior on Mac/Linux/Windows.

In my example today, I was doing glob 'c*' to get all lowercase items that begin with c. I'm not sure how to do that on Windows now, since it returns both lowercase and uppercase matches. Is there a glob that, on Windows, returns only the items with the case I specify?

olson-sean-k commented 11 months ago

> Any Windows folder can be configured to be case sensitive

Oof, I wasn't aware of this. Thanks for bringing it to my attention! It looks like this was added in Windows 10 Build 17093 back in 2018 for WSL. This means that textual variance on Windows can only be determined semantically with respect to a path and the attributes of all directory components in that path. And it is per-component. Brutal. Said another way, there is no reliable notion of a logical pattern match on Windows. (I believe this was technically already the case prior to 17093, but I think these file attributes are both more likely to matter and can actually be handled, unlike choices about which Windows file system APIs to standardize on.)

I think this means that to properly support Windows and platforms like it, Pattern cannot claim to be strictly logical and instead CandidatePaths must decide. On Unix, CandidatePath can remain largely the same as it is today, but on Windows its representation must support per-component casing. In the public API, this probably means that CandidatePath provides more explicit construction that allows for this per-component casing somehow (either from the ether as a somewhat logical construction or from reading from the file system as a semantic construction). In its simplest form, this probably looks like CandidatePath::logical and CandidatePath::semantic constructors, but I think providing a way to explicitly provide component metadata (without necessarily reading from the file system) is important.
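A rough sketch of what per-component casing could look like, assuming a candidate path is just a list of named components that each carry their own casing. Everything here (Casing, Component, CandidatePath, logical, with_casing, matches_literal) is hypothetical and is not wax's API; with_casing stands in for "explicitly provide component metadata without reading from the file system".

```rust
// Hypothetical sketch of per-component casing metadata; NOT wax's API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Casing {
    Sensitive,
    Insensitive,
}

#[derive(Debug)]
struct Component {
    name: String,
    // How this component's name is compared. On Windows this would come from
    // the case-sensitivity attribute of the directory that contains it.
    casing: Casing,
}

#[derive(Debug)]
struct CandidatePath {
    components: Vec<Component>,
}

fn split_path(path: &str) -> Vec<&str> {
    path.split(|c: char| c == '/' || c == '\\')
        .filter(|s| !s.is_empty())
        .collect()
}

impl CandidatePath {
    /// "Logical" construction: one casing for every component, no file system access.
    fn logical(path: &str, casing: Casing) -> Self {
        let components = split_path(path)
            .into_iter()
            .map(|name| Component { name: name.to_string(), casing })
            .collect();
        CandidatePath { components }
    }

    /// Explicit construction: the caller supplies casing per component, e.g.
    /// metadata read from the file system earlier or synthesized in a test.
    fn with_casing<'a>(parts: impl IntoIterator<Item = (&'a str, Casing)>) -> Self {
        let components = parts
            .into_iter()
            .map(|(name, casing)| Component { name: name.to_string(), casing })
            .collect();
        CandidatePath { components }
    }

    /// Compares a literal path against this candidate one component at a time,
    /// using each candidate component's own casing.
    fn matches_literal(&self, literal: &str) -> bool {
        let parts = split_path(literal);
        parts.len() == self.components.len()
            && parts.iter().zip(&self.components).all(|(p, c)| match c.casing {
                Casing::Sensitive => *p == c.name,
                Casing::Insensitive => p.to_lowercase() == c.name.to_lowercase(),
            })
    }
}

fn main() {
    // Uniform casing, as a plain Windows path would have by default.
    let uniform = CandidatePath::logical(r"C:\Work\Main.rs", Casing::Insensitive);
    println!("{}", uniform.matches_literal("c:/work/main.rs")); // true

    // "work" is compared case-insensitively, but the directory holding "Main.rs"
    // was marked case-sensitive (e.g. with fsutil), so "Main.rs" must match exactly.
    let mixed = CandidatePath::with_casing([
        ("work", Casing::Insensitive),
        ("Main.rs", Casing::Sensitive),
    ]);
    println!("{}", mixed.matches_literal("WORK/Main.rs")); // true
    println!("{}", mixed.matches_literal("work/main.rs")); // false
}
```

The point of the sketch is that no single pattern-wide flag can reproduce the mixed case: the answer depends on which component is being compared.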

> For our use case in nushell, I'd prefer consistency over platform specific behavior.

I think this applies to globs in wax too. A default case sensitivity in glob expressions that is indifferent to platform is probably the better choice. Perhaps CandidatePaths should have platform-specific casing defaults in the most straightforward API usages instead (rather than glob expressions using different defaults).
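One way to read this split, sketched in Rust: the expression carries no platform default (it has only an optional explicit flag), the candidate's metadata decides otherwise, and the platform default applies only to candidates built without metadata. The precedence order, in particular that an explicit (?i) or (?-i) still wins over candidate metadata, is an assumption the discussion doesn't settle, and all names here are hypothetical rather than wax's API.

```rust
// Hypothetical precedence sketch; NOT wax's API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Casing {
    Sensitive,
    Insensitive,
}

/// What a glob expression says about casing. None means no (?i) / (?-i) flag,
/// on every platform: the expression itself carries no platform default.
fn expression_flag(expr: &str) -> Option<Casing> {
    if expr.starts_with("(?i)") {
        Some(Casing::Insensitive)
    } else if expr.starts_with("(?-i)") {
        Some(Casing::Sensitive)
    } else {
        None
    }
}

/// Platform-specific default, applied only to candidate paths built without metadata.
fn candidate_default() -> Casing {
    if cfg!(windows) {
        Casing::Insensitive
    } else {
        Casing::Sensitive
    }
}

/// Effective casing for one candidate component (assumed order):
/// explicit expression flag, else the candidate's own metadata, else the platform default.
fn effective_casing(expr: &str, candidate: Option<Casing>) -> Casing {
    expression_flag(expr)
        .or(candidate)
        .unwrap_or_else(candidate_default)
}

fn main() {
    let cases = [
        ("c*", None),
        ("c*", Some(Casing::Sensitive)),
        ("(?i)c*", Some(Casing::Sensitive)),
    ];
    for (expr, meta) in cases {
        let casing = effective_casing(expr, meta);
        println!("{expr:>7} with {meta:?} -> {casing:?}");
    }
}
```

Under this reading, the same expression text means the same thing everywhere, and any remaining platform difference comes only from how candidate paths are constructed.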

I have a lot of thoughts right now on poor interactions between case sensitivity (and these file attributes), semantic literals, and partitioning that I haven't mentioned here yet. I'll mention briefly though that I think recommending partitioning to semantically interpret components like .. interacts with this, isn't a very cross-platform approach, and just isn't great either.

fdncred commented 11 months ago

Thanks @olson-sean-k for the conversation and willingness to listen to me whine about things. 😆

Each time I respond here, I have to do a dozen web searches to see if what I'm remembering is actually true or something I'm making up. LOL. I think your 17093 link is right, but it looks to me like the issue goes back to Windows 95. I believe that with Windows 95, Microsoft chose to make the file system case preserving, which is why on Windows you can have file.CSV and file.csv and some APIs won't see the capitalized one. You can see this a little above the highlighted text here.

Of course, no one runs Windows 95 anymore, but I'd be willing to bet that this "case preserving" functionality is still in Windows 11.

I'm anxious to see what comes of this. Let me know if you need us to test something that you come up with.