marcusbuffett / pipe-rename

Rename your files using your favorite text editor
https://crates.io/crates/pipe-rename
MIT License
397 stars 12 forks source link

Crashing when editor closes, after editing a given line #55

Closed assarbad closed 2 years ago

assarbad commented 2 years ago

When on Ubuntu 20.04 with UTF-8 as code page I am seeing the following:

$ env RUST_BACKTRACE=1 renamer Wirecard\ -\ Die\ Milliarden-Lüge.mp4 Zölibat\ -\ Der\ katholische\ Leidensweg\ \[1080464
5\].mp4
thread 'main' panicked at 'byte index 2 is not a char boundary; it is inside 'ö' (bytes 1..3) of Zölibat - Der katholische Leidensweg.mp4', library/core/src/str/mod.rs:127:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: core::str::slice_error_fail_rt
   3: core::ops::function::FnOnce::call_once
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
   4: core::intrinsics::const_eval_select
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/intrinsics.rs:2372:5
   5: core::str::slice_error_fail
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/str/mod.rs:86:9
   6: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
   7: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   8: renamer::main
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.

For the issue to surface I need to:

I am still collecting more information on this on my end, but I suspect some sort of normalization issue. This happens on an NTFS drive (which may play a role) with the commercial version of the Paragon NTFS driver (may also play a role). Could also have to do with NTFS as storage (which uses UTF-16 internally and may use other Unicode normalization rules).

I'll have to try to narrow this down further, but wanted to already bring this to your attention to the extent to which I have gathered information.

What would be great to see, however, would be the real bytes underneath. I mean I trust Rust to figure out if we end up in the middle of a surrogate or so, but since environmental factors like (potentially) normalization on part of the editor, the file system etc could play a role, it would be great to see input and output as raw bytes in case of this sort of panic.

assarbad commented 2 years ago

The following .tar file contains the file names that cause the error renamer-error.tar.gz.

This should hopefully ensure that the file names appear on the target system the same as on mine. All files have been truncated to zero size for convenience.

The issue appears to be around the second code point in the file name starting with Z.

After unpacking this .tar file on a n ext4 and btrfs partition, respectively, I am seeing the same issue with renamer, so it's not specific to NTFS or the ufsd (commercial) driver from Paragon.

For reference, this is the renaming I am attempting:

From

$ xxd -g 1 before-renaming.txt
00000000: 44 69 65 20 49 6e 76 65 73 74 6d 65 6e 74 62 61  Die Investmentba
00000010: 6e 6b 20 42 61 72 69 6e 67 73 20 42 61 6e 6b 20  nk Barings Bank
00000020: 5b 31 37 31 30 31 38 5f 62 61 72 69 6e 67 73 5f  [171018_barings_
00000030: 69 6e 66 5d 2e 6d 70 34 0a 44 69 65 20 49 6e 76  inf].mp4.Die Inv
00000040: 65 73 74 6d 65 6e 74 62 61 6e 6b 20 4c 65 68 6d  estmentbank Lehm
00000050: 61 6e 20 42 72 6f 74 68 65 72 73 20 5b 31 37 31  an Brothers [171
00000060: 30 31 38 5f 6c 65 68 6d 61 6e 5f 69 6e 66 5d 2e  018_lehman_inf].
00000070: 6d 70 34 0a 5a c3 b6 6c 69 62 61 74 20 2d 20 44  mp4.Z..libat - D
00000080: 65 72 20 6b 61 74 68 6f 6c 69 73 63 68 65 20 4c  er katholische L
00000090: 65 69 64 65 6e 73 77 65 67 20 5b 31 30 38 30 34  eidensweg [10804
000000a0: 36 34 35 5d 2e 6d 70 34 0a                       645].mp4.

To

$ xxd -g 1 after-renaming.txt
00000000: 44 69 65 20 49 6e 76 65 73 74 6d 65 6e 74 62 61  Die Investmentba
00000010: 6e 6b 20 42 61 72 69 6e 67 73 20 42 61 6e 6b 20  nk Barings Bank
00000020: 5b 31 37 31 30 31 38 5f 62 61 72 69 6e 67 73 5f  [171018_barings_
00000030: 69 6e 66 5d 2e 6d 70 34 0a 44 69 65 20 49 6e 76  inf].mp4.Die Inv
00000040: 65 73 74 6d 65 6e 74 62 61 6e 6b 20 4c 65 68 6d  estmentbank Lehm
00000050: 61 6e 20 42 72 6f 74 68 65 72 73 20 5b 31 37 31  an Brothers [171
00000060: 30 31 38 5f 6c 65 68 6d 61 6e 5f 69 6e 66 5d 2e  018_lehman_inf].
00000070: 6d 70 34 0a 5a c3 b6 6c 69 62 61 74 20 2d 20 44  mp4.Z..libat - D
00000080: 65 72 20 6b 61 74 68 6f 6c 69 73 63 68 65 20 4c  er katholische L
00000090: 65 69 64 65 6e 73 77 65 67 2e 6d 70 34 0a        eidensweg.mp4.

The respective part (Zölibat) seems not to change at all, yet it's the part renamer complains about:

As far as I could ascertain the c3 b6 translates fine to ö (in HTML &ouml;).

assarbad commented 2 years ago

Found it. rust-gdb was a real help.

Okay, so the problem was introduced by #48. I am still trying to reason about it, but the author - @mtimkovich - is probably better equipped to help out here. It looks to me as if the code makes undue assumptions about what the (renamed) string ought to look like.


The underlying issue

Preparations

  1. Update to master (currently at 56055f4271b2571e6928109dd9c4bfd0fd7c2526)
  2. Append the following lines to Cargo.toml (especially [profile.dev] is relevant)
  3. Run cargo build
  4. Go into the folder where you unpacked renamer-error.tgz
  5. Run env RUST_BACKTRACE=1 rust-gdb -ex "b Rename::new" -ex run --args /path/to/target/debug/renamer *
  6. Edit interactively and save and quit
  7. Use n (for next) when back at the GDB prompt (note: after issuing a command once, you can hit Enter and it will get repeated, so instead of n plus Enter again you can simply hit Enter to run next again)

Lines to append to Cargo.toml

# xref: https://bitshifter.github.io/rr+rust/#11
# The development profile, used for `cargo build`
[profile.dev]
opt-level = 0  # Controls the --opt-level the compiler builds with
debug = true   # Controls whether the compiler passes `-g`

# The release profile, used for `cargo build --release`
[profile.release]
opt-level = 3
debug = false

Observing the panic

I am cutting out some redundant text output for brevity. Replaced lines have been substituted by a [...]:

$ env RUST_BACKTRACE=1 rust-gdb --args ../target/debug/renamer *
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
[...]
Reading symbols from ../target/debug/renamer...
[...]
Breakpoint 1 at 0x4f908: file src/main.rs, line 55.
[...]
Starting program: /home/oliver/pipe-rename/target/debug/renamer Die\ Investmentbank\ Barings\ Bank\ \[171018_barings_inf\].mp4 Die\ Investmentbank\ Lehman\ Brothers\ \[171018_lehman_inf\].mp4
Zölibat\ -\ Der\ katholische\ Leidensweg\ \[10804645\].mp4
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after vfork from child process 34368]

Breakpoint 1, renamer::Rename::new (original="Zölibat - Der katholische Leidensweg [10804645].mp4", new="Zölibat - Der katholische Leidensweg.mp4") at src/main.rs:55
55              let mut new = new.to_string();
(gdb) n
56              if let Ok(home) = env::var("HOME") {
(gdb)
57                  if &new[..2] == "~/" {
(gdb)
thread 'main' panicked at 'byte index 2 is not a char boundary; it is inside 'ö' (bytes 1..3) of `Zölibat - Der katholische Leidensweg.mp4`', library/core/src/str/mod.rs:127:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
   2: core::str::slice_error_fail_rt
   3: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
   4: core::intrinsics::const_eval_select
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/intrinsics.rs:2696:5
   5: core::str::slice_error_fail
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/str/mod.rs:86:9
   6: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::RangeTo<usize>>::index
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/str/traits.rs:294:21
   7: core::str::traits::<impl core::ops::index::Index<I> for str>::index
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/str/traits.rs:65:9
   8: <alloc::string::String as core::ops::index::Index<core::ops::range::RangeTo<usize>>>::index
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/string.rs:2324:10
   9: renamer::Rename::new
             at /home/oliver/pipe-rename/src/main.rs:57:17
  10: renamer::find_renames::{{closure}}
             at /home/oliver/pipe-rename/src/main.rs:147:22
  11: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:290:13
  12: core::iter::traits::iterator::Iterator::find_map::check::{{closure}}
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/traits/iterator.rs:2732:32
  13: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/traits/iterator.rs:2238:21
  14: core::iter::traits::iterator::Iterator::find_map
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/traits/iterator.rs:2738:9
  15: <core::iter::adapters::filter_map::FilterMap<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/adapters/filter_map.rs:61:9
  16: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/vec/spec_from_iter_nested.rs:26:32
  17: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/vec/spec_from_iter.rs:33:9
  18: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/vec/mod.rs:2648:9
  19: core::iter::traits::iterator::Iterator::collect
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/traits/iterator.rs:1836:9
  20: renamer::find_renames
             at /home/oliver/pipe-rename/src/main.rs:140:27
  21: renamer::main
             at /home/oliver/pipe-rename/src/main.rs:351:28
  22: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[Inferior 1 (process 34363) exited with code 0145]

What this shows us is that this is the problematic line.

In particular the issue seems to be that the slice created is a slice of bytes and therefore attempts to cut a Unicode codepoint right in its middle, which makes the underlying code barf.

example.tar.gz (Rust Playgound link%3B%0A%7D%0A%0Afn%20main()%20%7B%0A%20%20%20%20let%20new%3A%20%26str%20%3D%20%22Z%C3%B6libat%22%3B%20%2F%2F%20input%20as%20per%20line%2053%0A%20%20%20%20print_type_of(%26%22new%22%2C%20%26new)%3B%0A%20%20%20%20let%20new%20%3D%20new.to_string()%3B%20%2F%2F%20corresponds%20to%20line%2055%20(just%20without%20mut)%0A%20%20%20%20print_type_of(%26%22mut%20new%22%2C%20%26new)%3B%0A%20%20%20%20print_type_of(%26%22'~%2F'%22%2C%20%26%22~%2F%22)%3B%0A%20%20%20%20let%20first_two%20%3D%20%26new%5B..2%5D%3B%20%2F%2F%20the%20problematic%20part%20of%20line%2057%20(!!!)%0A%20%20%20%20print_type_of(%26%22first_two%22%2C%20%26first_two)%3B%0A%20%20%20%20println!(%22%7B%7D%22%2C%20new)%3B%0A%7D%0A)) shows the issue a little better. Perhaps String::from_utf8() could be used to make the types in that condition on line 57 more compatible without having to care too much about what exactly is in the string at runtime?!

fn print_type_of<T>(name: &str, _: &T) {
    println!("{}: {}", name, std::any::type_name::<T>());
}

fn main() {
    let new: &str = "Zölibat"; // input as per line 53
    print_type_of(&"new", &new);
    let new = new.to_string(); // corresponds to line 55 (just without mut)
    print_type_of(&"mut new", &new);
    print_type_of(&"'~/'", &"~/");
    let first_two = &new[..2]; // the problematic part of line 57 (!!!)
    print_type_of(&"first_two", &first_two);
    println!("{}", new);
}

Debug with rust-gdb -ex 'b main.rs:11' -ex run --args ./target/debug/teststr

This answer appears to explain it even better and more succinctly than I ever could.

assarbad commented 2 years ago

The answer appears to be to use if new.starts_with("~/") { instead of that slice of bytes.

Rust Playground link%3B%0A%7D%0A%0Afn%20main()%20%7B%0A%20%20%20%20let%20new%3A%20%26str%20%3D%20%22Z%C3%B6libat%22%3B%20%2F%2F%20input%20as%20per%20line%2053%0A%20%20%20%20print_type_of(%26%22new%22%2C%20%26new)%3B%0A%20%20%20%20let%20mut%20new%20%3D%20new.to_string()%3B%20%2F%2F%20corresponds%20to%20line%2055%0A%20%20%20%20print_type_of(%26%22mut%20new%22%2C%20%26new)%3B%0A%20%20%20%20print_type_of(%26%22'~%2F'%22%2C%20%26%22~%2F%22)%3B%0A%20%20%20%20if%20new.starts_with(%22~%2F%22)%20%7B%0A%20%20%20%20%20%20%20%20new%20%3D%20new.replacen(%22~%22%2C%20%22whatever%22%2C%201)%3B%0A%20%20%20%20%7D%0A%20%20%20%20println!(%22%7B%7D%22%2C%20new)%0A%7D%0A)

I think the tests could use some more wacky codepoints, too. Be it German umlauts, Icelandic characters (Þ,Ð ...), Cyrillic or Chinese characters 😉 ... just thinking out loud here.

mtimkovich commented 2 years ago

Good idea with getting weirder with the test cases.