tafia / quick-csv

Quick rust csv reader
MIT License

Parsing error with files with BOM and quotes #4

msiemens closed this issue 8 years ago

msiemens commented 8 years ago

When trying to parse a file that is UTF-8 encoded, starts with a byte order mark (\xef\xbb\xbf), and has quoted entries, quick-csv fails with an UnexpextedQuote error.

Repro:

#[test]
fn utf8_bom_quotes() {
    let mut d = Csv::from_reader(&b"\xef\xbb\xbf\"abc\",\"xyz\""[..]);
    let r = d.next().unwrap().unwrap();
    let c = r.bytes_columns().collect::<Vec<_>>();
    assert_eq!(c, vec![b"abc", b"xyz"]);
}

Error:

---- test::utf8_bom_quotes stdout ----
        thread 'test::utf8_bom_quotes' panicked at 'called `Result::unwrap()` on an `Err` value: UnexpextedQuote', ../src/libcore\result.rs:788
stack backtrace:
   0:        0x1400126ac - std::rt::lang_start::h162055cb2e4b9fe7
   1:        0x140011c20 - std::rt::lang_start::h162055cb2e4b9fe7
   2:        0x140003d1d - std::panicking::rust_panic_with_hook::hd7b83626099d3416
   3:        0x140014ceb - rust_begin_unwind
   4:        0x140004cff - std::panicking::begin_panic_fmt::h30280d4dd3f149f5
   5:        0x14001492c - rust_begin_unwind
   6:        0x1400176a5 - core::panicking::panic_fmt::h2d3cc8234dde51b4
   7:        0x13ff7e9ba - unwrap_failed<quick_csv::error::Error>
                        at C:\bot\slave\stable-dist-rustc-win-msvc-64\build\src\libcore\macros.rs:29
   8:        0x13ff94ad4 - unwrap<quick_csv::Row,quick_csv::error::Error>
                        at C:\bot\slave\stable-dist-rustc-win-msvc-64\build\src\libcore\result.rs:726
   9:        0x13ffbc61b - utf8_bom_quotes
                        at C:\Users\markus\Documents\Coding\rust\quick-csv\src\test.rs:218
  10:        0x13ffe27da - test::stats::Summary::new::h8ad295300b5787a1
  11:        0x13ffe5642 - test::stats::Summary::new::h8ad295300b5787a1
  12:        0x140014d81 - _rust_maybe_catch_panic
  13:        0x13ffe5a8c - test::stats::Summary::new::h8ad295300b5787a1
  14:        0x140010373 - std::sys::thread::Thread::new::hf784252a6356c9b9
  15:         0x76c059cc - BaseThreadInitThunk

tafia commented 8 years ago

Thanks for reporting the issue! I won't have time to work on this before next week. If you have a fix I'll be happy to merge.

On 24 Sep 2016 8:14 p.m., "Markus Siemens" notifications@github.com wrote:


msiemens commented 8 years ago

No problem! I'd love to provide a PR, but I'm unsure how best to solve this. I think I'll submit a PR tomorrow and then iterate based on your feedback :) BTW, thank you for writing quick-csv (and quick-xml too!); they are both amazing libraries to work with :)
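
One possible direction, sketched with the standard library only: peek at the start of the input and consume the UTF-8 BOM before the parser sees the first quote. The helper name `skip_utf8_bom` is hypothetical and not part of quick-csv, and this is not necessarily how the library ended up handling the problem. It also assumes the first `fill_buf` call returns at least three bytes when the input has that many, which holds for byte slices and typically for a `BufReader` over a file.

```rust
use std::io::{self, BufRead};

/// The three bytes of the UTF-8 byte order mark.
const UTF8_BOM: &[u8] = b"\xef\xbb\xbf";

/// Hypothetical helper (not part of quick-csv): consumes a leading UTF-8 BOM
/// from `reader` if one is present, and reports whether it did so.
///
/// Assumption: the first `fill_buf` call yields at least three bytes whenever
/// the input is that long. A reader that hands out one byte at a time would
/// need a loop here instead.
fn skip_utf8_bom<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    // Peek at the buffered bytes without consuming them.
    let has_bom = reader.fill_buf()?.starts_with(UTF8_BOM);
    if has_bom {
        // Drop exactly the BOM so the next read starts at the first field.
        reader.consume(UTF8_BOM.len());
    }
    Ok(has_bom)
}
```

With the repro input, calling `skip_utf8_bom(&mut input)` on a `&[u8]` leaves `"abc","xyz"` as the remaining bytes. Those could then be handed to `Csv::from_reader`, assuming it accepts any `BufRead` as the repro's use of a byte slice suggests. Doing the same check inside the reader itself would avoid pushing this step onto every caller.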