wellbehavedsoftware / btrfs-dedupe

MIT License

Crash due to file name #7

Open cromulentbanana opened 4 years ago

cromulentbanana commented 4 years ago

While running btrfs-dedupe with the following invocation:

btrfs-dedupe /mnt/foo/escher/

I encountered the following crash:

thread 'main' panicked at 'byte index 106 is not a char boundary; it is inside '·' (bytes 105..107) of `Scanning filesystem... "/mnt/foo/escher/snaphome-20160908_093512/Music/Annie Fischer/BBC Legends. Brahms · Bartok · Liszt · Dohnanyi/21 Trois Etudes de Concert, S144. No.3 Un Sospiro. Allegro affettuoso.m4a"`', src/libcore/str/mod.rs:2154:5

cromulentbanana commented 4 years ago

hi @jamespharaoh is this sufficient detail for you or would you like me to provide you with the file that caused this panic?

Thanks for your contributions to this tool!

jamespharaoh commented 4 years ago

Hi, sorry I haven't really been working on this for a long time. In any case, I can explain what the problem is:

To simplify development and the file format, I assume that all filenames are valid utf-8. This is very often the case, since filenames that are not valid utf-8 cause problems in all sorts of places, and most Linux systems make similar assumptions, or at least have the same policy.

To fix this, we'd need to treat filenames as OsString or binary, and encode them suitably for JSON storage in the state. I don't have any plan to implement this right now, but I'm happy to consider any contributions.
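
For anyone picking this up, here is a rough sketch of what "encode them suitably for JSON storage" could look like. It is not taken from the btrfs-dedupe source: the function names `encode_filename` and `decode_filename` are hypothetical, and percent-escaping is just one possible encoding (base64 or hex would work too). It relies only on the Unix-specific `OsStrExt` and `OsStringExt` traits from the standard library.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical helper: turn raw filename bytes into an ASCII-only string
/// that can be stored as a JSON string value without further escaping.
/// Printable ASCII passes through, except '%', '"' and '\\';
/// every other byte is written as %XX.
fn encode_filename(name: &OsStr) -> String {
    let mut out = String::new();
    for &b in name.as_bytes() {
        match b {
            b'%' | b'"' | b'\\' => out.push_str(&format!("%{:02X}", b)),
            0x20..=0x7E => out.push(b as char),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Hypothetical helper: reverse of `encode_filename`.
/// Returns `None` if a '%' is not followed by two hex digits.
fn decode_filename(encoded: &str) -> Option<OsString> {
    let bytes = encoded.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hex = bytes.get(i + 1..i + 3)?;
            if !hex.iter().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            let hex = std::str::from_utf8(hex).ok()?;
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    Some(OsString::from_vec(out))
}
```

With something like this, a name containing arbitrary bytes round-trips through the state file unchanged, at the cost of non-ASCII names being less readable in the stored JSON.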

cromulentbanana commented 4 years ago

hi @jamespharaoh thank you for the insight. I'll take a look at the code and see whether I might take a crack at it.

Just to clarify one point: I'm confident that this filename is indeed valid utf-8 (see below). Did you mean that you assume all filenames are ascii?

❯❯ iconv -f utf-8 <(echo $(ls ~/Music/Annie\ Fischer/BBC\ Legends.\ Brahms\ ·\ Bartok\ ·\ Liszt\ ·\ Dohnanyi/21\ Trois\ Etudes\ de\ Concert,\ S144.\ No.3\ Un\ Sospiro.\ Allegro\ affettuoso.m4a)) -t utf-8
/home/dlevin/Music/Annie Fischer/BBC Legends. Brahms · Bartok · Liszt · Dohnanyi/21 Trois Etudes de Concert, S144. No.3 Un Sospiro. Allegro affettuoso.m4a

iconv confirms that this filename is valid utf-8
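
A note on the panic message itself: "byte index 106 is not a char boundary" is what Rust reports when a `&str` is sliced at a byte offset that falls inside a multi-byte character, which can happen with perfectly valid utf-8 such as the two-byte '·'. The sketch below assumes (without having checked the btrfs-dedupe source) that the "Scanning filesystem..." status line is being cut to a fixed byte width. The function name `truncate_at_char_boundary` is made up for illustration; it backs off to the nearest boundary instead of slicing blindly.

```rust
/// Hypothetical helper: return the longest prefix of `s` that is at most
/// `max_bytes` long and ends on a UTF-8 character boundary.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Walk backwards until `end` sits on a boundary.
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

By contrast, a direct `&s[..max_bytes]` panics whenever `max_bytes` lands in the middle of a character, which matches the message in the original report.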