Store each path as a Path instead of a String (which is utf8-only)

bew commented 3 years ago

Comment on: https://github.com/marcusbuffett/pipe-rename/blob/5658581f1f3b62ef086a18931d41999c09ee93ed/src/main.rs#L168-L175

I'm thinking, the filter_map exists only because the conversion from an OsString to a String could fail if the path is not UTF-8 and we might get an Err(...) instead of an Ok(String).

Edit: the filter_map also exists because each entry in from read_dir can be an error, my bad. My observation is still valid though and is on: .into_string().ok() which will return None for a non-utf8 path, and discard it.

I don't think that's right, the tool shouldn't skip non-utf8 paths.

@marcusbuffett I'm wondering if we could change something in the entire codebase: --> Instead of storing each path in a String (utf8), we could simply keep it as a Path (which can contain non-utf8). For the occasional cases where we need to print a path (like for the diff), we could use a replacement char for the non-utf8 chars. With a note for the user at the end of the diff if at least one path is non-utf8, the detection / change could be done before the diff to not over complicate it.

NOTE: I'd be interested to take a stab at this if you're ok with this

marcusbuffett commented 3 years ago

@bew i was thinking along the same lines when I updated clap, since you can parse arguments as PathBuf instead of string, but then I ended up having to unwrap stuff to match everything else. I agree we should be working with PathBufs when possible, would love for you to take a stab at it

marcusbuffett commented 3 years ago

Or OsString or whatever, not totally sure, I leave it up to you

mtimkovich commented 3 years ago

That OsString documentation scared me. I'm curious how vim deals with the issue.

On Fri, Apr 30, 2021, 5:56 PM Marcus Buffett @.***> wrote:

Or OsString or whatever, not totally sure, I leave it up to you

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/marcusbuffett/pipe-rename/issues/30#issuecomment-830465185, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE66RMWEZAGOLQFRU3QTJLTLM7TLANCNFSM435NM3AQ .

assarbad commented 2 years ago

Just thinking out loud: wouldn't a change like this:

cause issues across the Windows/non-Windows boundary because of OsString on Windows being 16-bit code units and 8-bit on other systems? OsString to String and back should be lossless for Windows. For other systems this would only be guaranteed for UTF(-8)-based locales, though; i.e. $LANG.
move the onus of dealing with the encoding to the invoked command/editor (which may be hard to convey in a generic fashion for anything that's not UTF-8)?

marcusbuffett / pipe-rename

Store each path as a Path instead of a String (which is utf8-only) #30