Open remram44 opened 5 years ago
I'm not sure there is anyone actively working on this crate, and fixing this bug probably requires non-trivial changes. You might consider using globset
instead.
Oh, sorry about the noise :sweat_smile: It showed up in the "GitHub Explore" mailing-list "based on your public repository contributions".
Thanks for the pointer.
globset seems to have the same problem, Glob::new()
takes a string. This works fine for Americans and on UTF-8 machines but make it impossible to pass valid Path
values to it.
[edit: and globset doesn't have a dedicated issue tracker]
No, globset does not suffer from the same problem. globset does require valid UTF-8 for the pattern, but it's matching function can accept any path.
[edit: and globset doesn't have a dedicated issue tracker]
It might some day, but you can just file bugs against ripgrep with globset mentioned in the issue.
If you have a use case for specifying a pattern that isn't valid UTF-8, then I would certainly be happy to have a bug for that, but please provide a real example, because it's not actually clear what the intended semantics should be.
Also, the focus on "Americans" is completely unnecessary. UTF-8 isn't specific to Americans.
On a ISO-8859-15 system, I might want to match Glob::new(b"d\xE9cembre 2018 - *.jpg")
. Or I might want to pass arguments from args_os()
(without panicking on non-UTF-8 filenames) or clap.
I might also just want to build a pattern from an existing Path
, for example to find rsync-style temp files (.<filename>.randomstring
e.g. .cv_rémi.pdf.GV4H3
). Path.file_name()
correctly returns &OsStr
which I can easily manipulate, but I can't feed it back to Glob::new()
.
Thanks for the example. I filed an issue and turned your example into code: https://github.com/BurntSushi/ripgrep/issues/1250
Paths on UNIX are bytes, not unicode. Using
str
means that all of them can't be represented.In particular, on systems set with a different locale, or when using disks that were created in a different locale, or simply because of some glitch somewhere, files using non-UTF-8 byte sequence can totally exist.
Not only is it not that much of a corner-case, but the standard library already dealt with that by providing the
Path
type.AsRef<Path>
can also be used to allow users to still pass in string slices and string literals.Related to #23 which is not about the interface but a bug in the internals.