kljensen / semiuniq

A uniq-like tool for removing nearby repeated lines in a file

Add option to limit line size used for hash #2

Open kljensen opened 3 years ago

kljensen commented 3 years ago

For many applications, only the first N characters of a line are relevant, so we needn't send the rest of the line to the hashing function. I should add a flag for specifying that N and also a flag for setting a custom buffer size.
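A minimal sketch of what the two flags could control, assuming input is read as raw bytes. `max_len` (a hypothetical name for N) caps how many bytes of each line are fed to the hasher, and `buf_size` (also hypothetical) is passed to `BufReader::with_capacity`. The "nearby" check, a fixed-length queue of the last `window` hashes, is an assumption for illustration only; semiuniq's real windowing may differ. N counts bytes here, which equals characters only for ASCII input.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};
use std::io::{self, BufRead, BufReader, Read, Write};

/// Hypothetical helper: hash at most `max_len` bytes of `line`.
/// `None` means "hash the whole line" (the current behaviour).
fn prefix_hash(line: &[u8], max_len: Option<usize>) -> u64 {
    let end = max_len.map_or(line.len(), |n| n.min(line.len()));
    let mut hasher = DefaultHasher::new();
    line[..end].hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical driver: copy `input` to `output`, dropping any line whose
/// prefix hash matches one of the last `window` hashes that were printed.
/// `buf_size` sets the read buffer capacity (the second proposed flag).
fn dedup_nearby<R: Read, W: Write>(
    input: R,
    mut output: W,
    window: usize,
    max_len: Option<usize>,
    buf_size: usize,
) -> io::Result<()> {
    let mut reader = BufReader::with_capacity(buf_size, input);
    let mut recent: VecDeque<u64> = VecDeque::with_capacity(window);
    let mut line = Vec::new(); // reused for every line
    loop {
        line.clear();
        if reader.read_until(b'\n', &mut line)? == 0 {
            break; // end of input
        }
        // Hash without the trailing newline so a final unterminated line
        // can still match an earlier terminated one.
        let content = line.strip_suffix(b"\n").unwrap_or(&line[..]);
        let h = prefix_hash(content, max_len);
        if recent.contains(&h) {
            continue; // repeated within the window: skip it
        }
        output.write_all(&line)?;
        if window > 0 {
            if recent.len() == window {
                recent.pop_front(); // forget the oldest hash
            }
            recent.push_back(h);
        }
    }
    Ok(())
}
```

Called as, for example, `dedup_nearby(io::stdin(), io::stdout(), 100, Some(80), 1 << 16)`, lines that agree on their first 80 bytes are treated as repeats. `DefaultHasher` is only used for in-process comparison here; its algorithm is not guaranteed to stay the same across Rust releases, so the hashes should not be persisted.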

See also https://dev.to/mineichen/writing-a-better-line-iterator-in-rust-443m
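The general problem here is that `BufRead::lines()` allocates a fresh `String` for every line and validates it as UTF-8, neither of which a hashing dedup tool needs. Below is one std-only way around that, sketched under assumptions and not necessarily what the linked post proposes: a hypothetical `ByteLineReader` that reuses a single `Vec<u8>` and hands out borrowed `&[u8]` lines. Because each slice borrows the reader's buffer, it cannot implement the standard `Iterator` trait, so it exposes a `next_line` method instead.

```rust
use std::io::{self, BufRead};

/// Hypothetical reader that yields each line as a borrowed byte slice.
/// One Vec<u8> is reused for every line, so there is no per-line
/// allocation once it has grown, and no UTF-8 validation.
struct ByteLineReader<R> {
    reader: R,
    buf: Vec<u8>,
}

impl<R: BufRead> ByteLineReader<R> {
    fn new(reader: R) -> Self {
        ByteLineReader { reader, buf: Vec::new() }
    }

    /// Returns the next line without its trailing "\n" (or "\r\n"),
    /// or Ok(None) at end of input. The slice stays valid only until
    /// the next call, which is why this is not an Iterator.
    fn next_line(&mut self) -> io::Result<Option<&[u8]>> {
        self.buf.clear();
        if self.reader.read_until(b'\n', &mut self.buf)? == 0 {
            return Ok(None);
        }
        let mut line = &self.buf[..];
        if let Some(rest) = line.strip_suffix(b"\n") {
            line = rest;
            // Also drop a carriage return from Windows-style endings.
            if let Some(rest) = line.strip_suffix(b"\r") {
                line = rest;
            }
        }
        Ok(Some(line))
    }
}
```

Typical use is `while let Some(line) = r.next_line()? { /* hash line */ }`, which keeps memory use bounded by the longest line seen rather than growing with the number of lines.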

kljensen commented 3 years ago

Also see https://crates.io/crates/bytelines

kljensen commented 3 years ago

And https://crates.io/crates/linereader

kljensen commented 3 years ago

And https://docs.rs/bstr/0.2.14/bstr/