srithon / unicode-prettytable

Rust table formatting library using Unicode Box-drawing characters.
2 stars 1 forks source link

Unicode characters cause a panic #2

Open euank opened 2 years ago

euank commented 2 years ago

The following code panics currently:

use unicode_prettytable::TableBuilder;

#[test]
fn test_cases() {
    let rows =  vec![vec!["πŸ’©πŸ’©πŸ’©πŸ’©", "πŸ•΄οΈπŸ•΄οΈπŸ•΄οΈπŸ•΄οΈ"]];
    println!("{}", TableBuilder::default().rows(&rows).build().unwrap());
}

The output with backtrace is:

thread 'test_cases' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 63, error_len: Some(1) }', unicode_prettytable/src/util.rs:81:61
stack backtrace:
   0: rust_begin_unwind
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/std/src/panicking.rs:516:5
   1: core::panicking::panic_fmt
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/core/src/panicking.rs:93:14
   2: core::result::unwrap_failed
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/core/src/result.rs:1599:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/core/src/result.rs:1281:23
   4: <unicode_prettytable::util::StringBuffer as core::fmt::Display>::fmt
             at ./src/util.rs:81:13
   5: <T as alloc::string::ToString>::to_string
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/alloc/src/string.rs:2379:9
   6: <unicode_prettytable::table::Table<T> as core::fmt::Display>::fmt
             at ./src/table.rs:293:25
   7: core::fmt::write
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/core/src/fmt/mod.rs:1117:17
   8: std::io::Write::write_fmt
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/std/src/io/mod.rs:1667:15
   9: std::io::stdio::print_to::{{closure}}::{{closure}}
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/std/src/io/stdio.rs:1183:25
  10: core::option::Option<T>::map
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/core/src/option.rs:823:29
  11: std::io::stdio::print_to::{{closure}}
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/std/src/io/stdio.rs:1182:13
  12: std::thread::local::LocalKey<T>::try_with
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/std/src/thread/local.rs:399:16
  13: std::io::stdio::print_to
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/std/src/io/stdio.rs:1178:12
  14: std::io::stdio::_print
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/std/src/io/stdio.rs:1205:5
  15: integ::test_cases
             at ./tests/integ.rs:6:5
  16: integ::test_cases::{{closure}}
             at ./tests/integ.rs:4:1
  17: core::ops::function::FnOnce::call_once
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/core/src/ops/function.rs:227:5
  18: core::ops::function::FnOnce::call_once
             at /rustc/492723897e9b4db6701b3a75b72618d08a7d5319/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test test_cases ... FAILED

Removing the util StringBuffer stuff was enough to fix it for me. Specifically, I fixed it over in this commit: https://github.com/euank/unicode-prettytable/commit/ef7039b4e1e19ab4da1bb7aed86721e974e77ef7

If you want I can make a PR with that; I wasn't sure if you might want a different fix or such.

srithon commented 2 years ago

Hey @euank, thanks for putting in the effort to file this issue. This was partially a toy project to get a library crate on crates.io, and as a result the code wasn't particularly robust. All of the StringBuffer code was made for the case I originally wanted to use this for, which was putting regular ASCII text from a timetable in pretty-ish Unicode tables. I wanted to fill the buffer with spaces to begin with so that blank spaces wouldn't need to be overwritten, and then effectively "fill in" the content. Methods like push_bytes_fixed_width were made to work with table fields with a fixed width; rather than writing a bunch of spaces, it would just write the content and then skip the index forward to the next field. In retrospect, I should have justified the existence of StringBuffer to avoid this kind of confusion. When the text is centered, that functionality is used improperly (see line 296). Rather than writing only what is necessary, a new String is created with the spaces already built-in, which means doing a copy/allocation unnecessarily. This should ideally be addressed at some point.

Your code doesn't use an equivalent of push_bytes_fixed_width, and so the following example doesn't run properly.

euank/unicode_prettytable on ξ‚  HEAD (ef7039b) is πŸ“¦ v0.3.1 via πŸ¦€ v1.56.0 took 16ms
at 10:22 am ❯ cargo run --example separate_header
   Compiling unicode-prettytable v0.3.1 (/home/sridaran/Development/Rust/SelfPublished/unicode-prettytable/forks/euank/unicode_prettytable)
    Finished dev [unoptimized + debuginfo] target(s) in 1.32s
     Running `/home/sridaran/Development/Rust/SelfPublished/unicode-prettytable/forks/euank/target/debug/examples/separate_header`
╒══════════╀═══════╀═══════════════╀═══════════════╕
β”‚   name   β”‚  age  β”‚  height (in)  β”‚  weight (lb)  β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•ͺ═══════β•ͺ═══════════════β•ͺ═══════════════║
β”‚Daveβ”‚14β”‚72β”‚164β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚Joshuaβ”‚24β”‚65β”‚141β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚Stacieβ”‚22β”‚61β”‚124β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚Catherineβ”‚30β”‚68β”‚130β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚Jackβ”‚8β”‚58β”‚78β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

You could remedy this by creating a right-padded String when centered_text is false (line 249 in Table), but in my opinion this is needlessly inefficient.

I later found more featureful crates like comfy-table and prettytable-rs, both of which (I assume) have support for Unicode. I switched over my motivating project from using this crate to comfy-table, which offered a lot more customizations than what I put in this crate. In its current state, this crate offers comparatively efficient table creation.

I think one other approach to fixing Unicode is to allow the underlying StringBuffer to resize, which would mean changing the implementation to use Vec methods instead of raw write. Because there are significantly more customizable alternatives, I'd like to keep this crate as fast as possible so that it can serve some sort of purpose. If you have any questions, feel free to ask me.

euank commented 2 years ago

Thanks for the detailed response, @srithon! Indeed, you're right that I misunderstood how padding was happening and got that wrong.

I also appreciate suggestion of alternative creates; I'll probably switch over to one of those.

That solves my immediate problem, but I guess it's worth figuring out if there's something else to do. I ended up here because this crate showed up more easily in whatever searches I did than comfy-table or prettytable.

I'll suggest that we have the following options:

  1. Add a readme and a note to the readme which clarifies something like "this crate is fast and only supports ascii, you may be interested in these other projects too"
  2. Fix this crate's unicode support properly (which I'm not convinced I'll find time to do, but maybe?)
  3. Do nothing

There might be other reasonable options, but I feel like those are the logical next steps from here. Do you have a strong preference for any of those options?

srithon commented 2 years ago

Thanks for the points, @euank! I think the way to go is 1, and then if we ever get time to do so, address 2. It would be nice to have some benchmarks on how fast the crate actually is, but in the process of writing the benchmarks I saw that the required input of a Vec<Vec<T>> rather than just an Iterator<Item=Iterator<Item=T>> was a pain to work with. I definitely thought of this while writing the code originally, but for some reason didn't go through with it. 1 can certainly be done without the benchmarks, but they would definitely be nice to have.

When I get some time, I'll work on adding a README to the project root. Thanks for the feedback!