infinyon / fluvio

Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
https://www.fluvio.io/
Apache License 2.0

[Bug]: Consuming raw bytes with CLI return incorrect data #3967

Status: Open. nacardin opened this issue 6 months ago.

nacardin commented 6 months ago

Steps to reproduce:

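  1. Create a file test.bin containing the bytes 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0xF0, 0x90, 0x80, 0x57, 0x6F, 0x72, 0x6C, 0x64. These are the "Incorrect bytes" example from https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy: "Hello ", then the incomplete UTF-8 sequence 0xF0, 0x90, 0x80, then "World".
  2. Produce the file to the bin-test topic with --raw:
    $ fluvio produce bin-test --raw -f test.bin
  3. Consume the topic with raw output, redirected to a file:
    $ fluvio consume -B bin-test -O raw > test-output.bin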
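The test file from step 1 can be created with a short Rust program instead of a hex editor. This is only a convenience sketch: the constant name `TEST_BYTES` is illustrative, and the program simply writes the fourteen bytes listed above and checks that they are not valid UTF-8.

```rust
use std::fs;
use std::io;

/// The "Incorrect bytes" input from the `String::from_utf8_lossy` docs:
/// "Hello ", the truncated 4-byte sequence F0 90 80, then "World".
const TEST_BYTES: [u8; 14] = [
    0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, // "Hello "
    0xF0, 0x90, 0x80, // incomplete UTF-8 sequence (a 4-byte lead with only 2 continuation bytes)
    0x57, 0x6F, 0x72, 0x6C, 0x64, // "World"
];

fn main() -> io::Result<()> {
    // Write the raw bytes to test.bin for `fluvio produce bin-test --raw -f test.bin`.
    fs::write("test.bin", TEST_BYTES)?;

    // Read the file back and confirm it round-trips byte-for-byte.
    let written = fs::read("test.bin")?;
    assert_eq!(written, TEST_BYTES);

    // The content is deliberately not valid UTF-8.
    assert!(std::str::from_utf8(&written).is_err());
    Ok(())
}
```

After running it, `test-output.bin` from step 3 can be compared against `test.bin` byte by byte.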

In test-output.bin, the bytes are 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0xEF, 0xBF, 0xBD, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x0A. The three bytes 0xF0, 0x90, 0x80 are replaced by 0xEF, 0xBF, 0xBD (U+FFFD REPLACEMENT CHARACTER encoded as UTF-8), and a trailing newline (0x0A) is appended.

I believe this is caused by the inappropriate use of from_utf8_lossy in https://github.com/infinyon/fluvio/blob/29d1a11cdbb7976fc3e0c6934d2d981eb3af49ce/crates/fluvio-cli/src/client/consume/record_format.rs#L77
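The suspected cause is easy to confirm in isolation: passing the issue's input through `String::from_utf8_lossy` yields exactly the corrupted bytes seen in test-output.bin, apart from the trailing 0x0A. The sketch below does not reproduce Fluvio's formatter; `lossy_roundtrip` is a hypothetical helper that only mimics a decode-then-re-encode step.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a text-oriented formatter: decode the record as
/// UTF-8 (invalid sequences become U+FFFD), then turn it back into bytes.
fn lossy_roundtrip(record: &[u8]) -> Vec<u8> {
    let text: Cow<'_, str> = String::from_utf8_lossy(record);
    text.into_owned().into_bytes()
}

fn main() {
    let input: &[u8] = b"Hello \xF0\x90\x80World";
    let output = lossy_roundtrip(input);

    // The incomplete sequence F0 90 80 collapses into one U+FFFD,
    // which is encoded in UTF-8 as EF BF BD.
    assert_eq!(output, b"Hello \xEF\xBF\xBDWorld");
    assert_eq!(String::from_utf8_lossy(input), "Hello \u{FFFD}World");

    // The original three bytes are gone; the conversion is not reversible.
    assert_ne!(output, input);
    println!("{:02X?}", output);
}
```

Because the replacement is lossy, no later step can recover the original bytes once this conversion has run.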

This is binary data, so it should not be interpreted as UTF-8.
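One way to avoid the problem is for raw output to copy the record's bytes straight to the writer, with no UTF-8 decoding at all. This is a minimal sketch under that assumption, not Fluvio's actual code: `write_raw_record` is a hypothetical function name, and it writes into an in-memory `Vec<u8>` so the result can be checked.

```rust
use std::io::{self, Write};

/// Hypothetical raw formatter: copies the record's bytes to `out`
/// unchanged, with no UTF-8 decoding step.
fn write_raw_record<W: Write>(out: &mut W, record: &[u8]) -> io::Result<()> {
    out.write_all(record)
}

fn main() -> io::Result<()> {
    // The same bytes as test.bin in the reproduction steps.
    let record: &[u8] = b"Hello \xF0\x90\x80World";

    // Any `Write` implementor works; a Vec<u8> makes the result easy to inspect.
    let mut buf: Vec<u8> = Vec::new();
    write_raw_record(&mut buf, record)?;

    // The invalid sequence F0 90 80 survives untouched.
    assert_eq!(buf, record);
    Ok(())
}
```

In a CLI the writer would typically be a locked stdout redirected to a file, as in step 3. Whether raw output should append a separator such as 0x0A after each record is a separate design choice; this sketch appends nothing, so a single record round-trips byte for byte.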

github-actions[bot] commented 4 months ago

Stale issue message